System Oscar+ working
Vision

Sharper Eyes For Vision+Language: AI research shows improved image and text matching.

Models that interpret the interplay of words and images tend to be trained on richer bodies of text than images. Recent research worked toward giving such models a more balanced knowledge of the two domains.
Different data related to the phenomenon called underspecification
Vision

Facing Failure to Generalize: Why some AI models exhibit underspecification.

The same models trained on the same data may show the same performance in the lab, and yet respond very differently to data they haven’t seen before. New work finds this inconsistency to be pervasive.
Art pieces with subjective commentary regarding their emotional impact
Vision

How Art Makes AI Feel: How an AI model feels about art.

An automated art critic spells out the emotional impact of images. Led by Panos Achlioptas, researchers at Ecole Polytechnique, King Abdullah University of Science and Technology, and Stanford University trained a deep learning system to generate subjective interpretations of art.
Detection of a digitally altered image of a frog holding a violin
Vision

Fighting Fakes: Six algorithms that help news sites spot deepfakes.

A supergroup of machine learning models flags manipulated photos. Jigsaw, a tech incubator owned by Alphabet, released a system that detects digitally altered images.
Graphs and data related to visualized tokens (or vokens)
Vision

Better Language Through Vision: Study improved BERT performance using visual tokens.

For children, associating a word with a picture that illustrates it helps them learn the word’s meaning. Research aims to do something similar for machine learning models. Researchers improved a BERT model’s performance on some language tasks by training it on a large dataset of image-word pairs.
Series of images and graphs related to cancer detection
Vision

Shortcut to Cancer Treatment: AI determines breast cancer treatment from H&E stains.

Doctors who treat breast cancer typically use a quick, inexpensive tumor-tissue stain test to diagnose the illness and a slower, more costly one to determine treatment. A new neural network could help doctors go straight from diagnosis to treatment.
Examples of InstaHide scrambling images
Vision

A Privacy Threat Revealed: How researchers cracked InstaHide for computer vision.

With access to a trained model, an attacker can use a reconstruction attack to approximate its training data. A method called InstaHide recently won acclaim for promising to make such examples unrecognizable to human eyes while retaining their utility for training.
Facebook service describing a photo on Instagram
Vision

Every Picture Tells a Story: Facebook expands automated alternative text.

Facebook expanded a system of vision, language, and speech models designed to open the social network to users who are visually impaired. A Facebook service that describes photos in a synthesized voice now recognizes 1,200 visual concepts — 10 times more than the previous version.
Data related to adversarial learning
Vision

Adversarial Helper: Adversarial learning can improve vision and NLP.

Models that learn relationships between images and words are gaining a higher profile. New research shows that adversarial learning, usually a way to make models robust to deliberately misleading inputs, can boost vision-and-language performance.
Gun detecting system working and alerting the police
Vision

Draw a Gun, Trigger an Algorithm: These AI-enabled security cameras automatically ID guns.

Computer vision is alerting authorities the moment someone draws a gun. Several companies offer deep learning systems that enable surveillance cameras to spot firearms and quickly notify security guards or police.
Rebag app working on a cellphone
Vision

How Much For That Vintage Gucci?: An AI system that appraises luxury handbags.

Computer vision is helping people resell their used designer handbags. Rebag, a resale company for luxury handbags, watches, and jewelry, launched Clair AI, an app that automatically appraises second-hand bags from brands like Gucci, Hermes, and Prada.
AI-generated images with the model DALL-E
Vision

Tell Me a Picture: OpenAI's two new multimodal AI models, CLIP and DALL·E

Two new models show a surprisingly sharp sense of the relationship between words and images. OpenAI, the for-profit research lab, announced a pair of models that have produced impressive results in multimodal learning: CLIP and DALL·E.
Covid Fast Fax operating
Vision

The Fax About Tracking Covid: A deep learning system for sorting critical Covid-19 cases.

A pair of neural networks is helping to prioritize Covid-19 cases for contact tracing. The public health department of California’s Contra Costa County is using deep learning to sort Covid-19 cases reported via the pre-internet technology known as fax.
Tree farm dataset
Vision

Representing the Underrepresented: Many important AI datasets contain bias.

Some of deep learning’s bedrock datasets came under scrutiny as researchers combed them for built-in biases. Researchers found that popular datasets impart biases against socially marginalized groups to trained models due to the ways the datasets were compiled, labeled, and used.
Robotaxi in different angles
Vision

Robotaxi Reimagined: Zoox reveals an electric robotaxi.

A new breed of self-driving car could kick the autonomous-vehicle industry into a higher gear. Zoox unveiled its first product, an all-electric, driverless taxi designed fully in-house.
