Image recognition examples

Smaller Models, Bigger Biases: Compressed face recognition models have stronger bias.

Compression methods like parameter pruning and quantization can shrink neural networks for use in devices like smartphones with little impact on accuracy — but they also exacerbate a network’s bias.
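To make the two compression methods concrete, here is a minimal sketch using PyTorch's built-in pruning and quantization utilities. The toy model, the 50 percent sparsity level, and the int8 target are illustrative assumptions, not the setup used in the research.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy classifier standing in for a face recognition network.
model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Parameter pruning: zero out the 50% smallest-magnitude weights.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the pruning mask in

# Quantization: store weights as 8-bit integers for inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Both steps shrink the model with little change in aggregate accuracy, which is exactly why any disparate impact across demographic groups is easy to miss.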
Series of images showing how single trained network generates 3D reconstructions of multiple scenes

One Network, Many Scenes: Combining NeRF with a VAE to generate 3D scenes.

To reconstruct the 3D world behind a set of 2D images, machine learning systems usually require a dedicated neural network for each scene. New research enables a single trained network to generate 3D reconstructions of multiple scenes.
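The core idea can be sketched briefly: one shared NeRF-style network conditioned on a per-scene latent code, sampled VAE-style. The layer sizes and names below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionalNeRF(nn.Module):
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        # 3D position + view direction + scene latent -> density + RGB
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (density, r, g, b)
        )

    def forward(self, xyz, view_dir, z):
        z = z.expand(xyz.shape[0], -1)  # one latent shared by all samples
        return self.mlp(torch.cat([xyz, view_dir, z], dim=-1))

# One network, many scenes: swap the latent code to switch scenes.
net = ConditionalNeRF()
mu, logvar = torch.zeros(1, 64), torch.zeros(1, 64)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # VAE reparameterization
out = net(torch.rand(1024, 3), torch.rand(1024, 3), z)
print(out.shape)  # torch.Size([1024, 4])
```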
Few-shot Learning with a Universal Template (FLUTE)

Pattern for Efficient Learning: A training method for few-shot learning in computer vision.

Getting high accuracy out of a classifier trained on a small number of examples is tricky. You might train the model on several large-scale datasets prior to few-shot training, but what if the few-shot dataset includes novel classes? A new method performs well even in that case.
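FLUTE's central trick can be sketched in a few lines: a shared convolutional "template" whose only dataset-specific parts are cheap batch-norm parameters. This is a minimal sketch under assumed sizes, not the full method, which also learns how to blend these parameters for a novel few-shot task.

```python
import torch
import torch.nn as nn

class TemplateBlock(nn.Module):
    def __init__(self, channels=32, num_datasets=8):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared
        # One BatchNorm per training dataset: the small, swappable part.
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(channels) for _ in range(num_datasets)]
        )

    def forward(self, x, dataset_id):
        return torch.relu(self.bns[dataset_id](self.conv(x)))

block = TemplateBlock()
x = torch.randn(4, 32, 16, 16)
print(block(x, dataset_id=3).shape)  # shared conv, dataset-specific BN
```

Because only the batch-norm parameters vary, adapting the template to a new dataset is far cheaper than retraining the whole backbone.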
Graphs and pictures showing how computer vision classifies pottery

Sorting Shattered Traditions: Archaeologists use machine learning to classify pottery.

Computer vision is probing the history of ancient pottery. Researchers at Northern Arizona University developed a machine learning model that identifies different styles of Native American painting on ceramic fragments and sorts the shards by historical period.
Two images showing the process of turning handwriting into text

The Writing, Not the Doodles: An AI model that detects handwriting on messy pages.

Systems designed to turn handwriting into text typically work best on pages with a consistent layout, such as a single column unbroken by drawings, diagrams, or extraneous symbols. A new system removes that requirement.
Architecture of vision-language tasks

One Model for Vision-Language: A general-purpose AI for vision and language tasks.

Researchers have proposed task-agnostic architectures for image classification and for language. New work proposes a single architecture that handles vision-language tasks.
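One common way to unify the two modalities, sketched below, is to embed image patches and text tokens into a shared space and run one transformer over the concatenated sequence. The sizes, names, and two-layer depth are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class UnifiedVisionLanguage(nn.Module):
    def __init__(self, dim=256, vocab=1000, patch_dim=3 * 16 * 16):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, dim)   # image patches
        self.token_embed = nn.Embedding(vocab, dim)    # text tokens
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patches, tokens):
        seq = torch.cat([self.patch_embed(patches),
                         self.token_embed(tokens)], dim=1)
        return self.encoder(seq)  # one model attends over both modalities

model = UnifiedVisionLanguage()
patches = torch.randn(2, 196, 3 * 16 * 16)   # a 14x14 grid of 16x16 patches
tokens = torch.randint(0, 1000, (2, 12))     # a short caption
print(model(patches, tokens).shape)  # torch.Size([2, 208, 256])
```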
Model identifying erroneous labels in popular datasets

Labeling Errors Everywhere: Many deep learning datasets contain mislabeled data.

Key machine learning datasets are riddled with mistakes: on average, 3.4 percent of the examples in 10 commonly used benchmark datasets are mislabeled, and the detrimental impact of such errors rises with model size.
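One simple way to surface suspect labels, sketched below, is to trust a trained model's confident disagreements with the dataset. The threshold and toy data are illustrative assumptions, not the researchers' exact method.

```python
import numpy as np

def flag_suspect_labels(pred_probs, given_labels, threshold=0.95):
    """Return indices where the model confidently predicts a class
    other than the one recorded in the dataset."""
    pred_classes = pred_probs.argmax(axis=1)
    confidence = pred_probs.max(axis=1)
    disagrees = pred_classes != given_labels
    return np.where(disagrees & (confidence >= threshold))[0]

# Toy example: 4 examples, 3 classes; example 2 looks mislabeled.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.20, 0.70, 0.10],
                  [0.01, 0.98, 0.01],
                  [0.30, 0.30, 0.40]])
labels = np.array([0, 1, 0, 2])
print(flag_suspect_labels(probs, labels))  # [2]
```

Flagged examples can then be checked by human annotators rather than corrected blindly.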
Data related to SElf-supERvised (SEER), an image classifier pretrained on unlabeled images

Pretraining on Uncurated Data: How unlabeled data improved computer vision accuracy.

It’s well established that pretraining a model on a large dataset improves performance on fine-tuned tasks. In sufficient quantity and paired with a big model, even data scraped from the internet at random can contribute to the performance boost.
Taxonomy of deep learning architectures using self-attention for visual recognition and images from the COCO dataset

Vision Models Get Some Attention: Researchers add self-attention to convolutional neural nets.

Self-attention is a key element in state-of-the-art language models, but it struggles to process images because its memory requirement rises rapidly with the size of the input. New research addresses the issue with a simple twist on a convolutional neural network.
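One common form of that twist, sketched below, is to swap a convolution in a late, low-resolution block for multi-head self-attention over spatial positions, where the quadratic memory cost stays manageable. The shapes and placement are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)     # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)      # memory grows as O((H*W)^2)
        return out.transpose(1, 2).reshape(b, c, h, w)

layer = SpatialSelfAttention()
feats = torch.randn(2, 256, 14, 14)            # late-stage feature map
print(layer(feats).shape)  # torch.Size([2, 256, 14, 14])
```

At 14x14 resolution the sequence is only 196 positions long, so the quadratic cost that rules out attention on full-resolution images is no longer a problem.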
Sequence showing a training step that uses different perspectives of the same patient to enhance unsupervised pretraining

Same Patient, Different Views: Contrastive pretraining improves medical imaging AI.

When you lack labeled training data, pretraining a model on unlabeled data can compensate. New research pretrained a model three times to boost performance on a medical imaging task.
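The stage illustrated above, where two images of the same patient form a positive pair, can be sketched as a standard contrastive step. The encoder, batch size, and temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, temperature=0.1):
    """NT-Xent-style loss: row i of z_a should match row i of z_b."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(z_a.shape[0])       # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Stand-in encoder; in practice this would be a pretrained CNN.
encoder = torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 64 * 64, 128))
view_a = torch.randn(8, 3, 64, 64)  # one image of each patient
view_b = torch.randn(8, 3, 64, 64)  # a different image of the same patients
loss = contrastive_loss(encoder(view_a), encoder(view_b))
loss.backward()  # standard pretraining update
```

Because the positive pairs come from real clinical variation rather than synthetic augmentation, the representation learns invariances that matter for diagnosis.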
Graphs and data related to ReLabel, a technique that labels any random crop of any image.

Good Labels for Cropped Images: AI technique adds text labels to random image crops.

In training an image recognition model, it’s not uncommon to augment the data by cropping original images randomly. But if an image contains several objects, a cropped version may no longer match its label. Researchers developed a way to make sure random crops are labeled properly.
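The idea can be sketched briefly: keep a dense, per-location label map for each image, then pool the region a random crop covers to get that crop's label. The map resolution and toy data below are illustrative assumptions, not the method's exact details.

```python
import torch

def crop_label(label_map, top, left, height, width):
    """label_map: (num_classes, H, W) score map; returns a soft label."""
    region = label_map[:, top:top + height, left:left + width]
    pooled = region.mean(dim=(1, 2))           # average scores in the crop
    return pooled / pooled.sum()               # normalize to a distribution

# Toy map: class 0 dominates the left half, class 1 the right half.
label_map = torch.zeros(2, 8, 8)
label_map[0, :, :4] = 1.0
label_map[1, :, 4:] = 1.0
print(crop_label(label_map, top=0, left=0, height=8, width=4))  # ~[1.0, 0.0]
print(crop_label(label_map, top=0, left=2, height=8, width=4))  # ~[0.5, 0.5]
```

A crop that covers only one object keeps a hard label, while a crop straddling two objects gets a soft label reflecting both.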
Graphs and data related to ImageNet performance

ImageNet Performance, No Panacea: ImageNet pretraining won't always improve computer vision.

It’s commonly assumed that models pretrained to achieve high performance on ImageNet will perform better on other visual tasks after fine-tuning. But is it always true? A new study reached surprising conclusions.
The Oscar+ system in action

Sharper Eyes For Vision+Language: AI research shows improved image and text matching.

Models that interpret the interplay of words and images tend to be trained on richer bodies of text than images. Recent research worked toward giving such models a more balanced knowledge of the two domains.
Data related to the phenomenon called underspecification

Facing Failure to Generalize: Why some AI models exhibit underspecification.

Identical models trained on the same data may show the same performance in the lab, yet respond very differently to data they haven't seen before. New work finds this inconsistency to be pervasive.
Art pieces with subjective commentary regarding their emotional impact

How Art Makes AI Feel: An AI model interprets the emotional impact of art.

An automated art critic spells out the emotional impact of images. Led by Panos Achlioptas, researchers at Ecole Polytechnique, King Abdullah University, and Stanford University trained a deep learning system to generate subjective interpretations of art.
