De-Facing ImageNet: Researchers blur all faces in ImageNet.

ImageNet now comes with privacy protection. What’s new: The team that manages the machine learning community’s go-to image dataset blurred all the human faces pictured in it and tested how models trained on the modified images perform on a variety of image recognition tasks.
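
The blurb doesn’t specify the team’s tooling, so the sketch below only illustrates the general idea: detect faces with an off-the-shelf detector and blur the detected regions. The Haar-cascade detector and kernel size are illustrative assumptions, not the researchers’ actual pipeline or settings.

```python
# Minimal face-blurring sketch (illustrative; not the ImageNet team's pipeline).
# Assumes opencv-python is installed; the cascade file ships with it.
import cv2

def blur_faces(image_path: str, output_path: str) -> None:
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Blur only the detected face region; the kernel size is an arbitrary choice.
        img[y:y + h, x:x + w] = cv2.GaussianBlur(img[y:y + h, x:x + w], (51, 51), 0)
    cv2.imwrite(output_path, img)

blur_faces("photo.jpg", "photo_blurred.jpg")
```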

Spotlight on Unreproducible Results: Papers Without Code collects unreproducible AI research.

A new website calls out AI research that may not lend itself to being reproduced. Papers Without Code maintains a directory of AI systems that researchers tried but failed to reproduce.

ImageNet Performance, No Panacea: ImageNet pretraining won't always improve computer vision.

It’s commonly assumed that models pretrained to achieve high performance on ImageNet will perform better on other visual tasks after fine-tuning. But is it always true? A new study reached surprising conclusions.
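
The recipe under scrutiny is the familiar one: load ImageNet-pretrained weights, swap the classification head, and fine-tune on the target task. Below is a minimal sketch with torchvision; the ResNet-50 backbone and class count are illustrative choices, not the study’s setup.

```python
# Typical ImageNet-pretraining recipe whose benefit the study examines.
# Backbone and number of target classes are illustrative assumptions.
import torch.nn as nn
from torchvision import models

def build_model(num_classes: int, pretrained: bool = True) -> nn.Module:
    # Newer torchvision versions use the `weights=` argument instead of `pretrained=`.
    model = models.resnet50(pretrained=pretrained)  # ImageNet weights if True
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new task head
    return model

# Compare fine-tuning from ImageNet weights against training from scratch.
finetuned = build_model(num_classes=10, pretrained=True)
from_scratch = build_model(num_classes=10, pretrained=False)
```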

Cutting Corners to Recognize Faces: Research finds flaws in face recognition datasets.

Datasets for training face recognition models have ballooned in size while slipping in quality and respect for privacy. In a survey of 130 datasets compiled over the last four decades, researchers traced how the need for increasing quantities of data led dataset builders to relax their standards.

Facing Failure to Generalize: Why some AI models exhibit underspecification.

Identically configured models trained on the same data may show the same performance in the lab, yet respond very differently to data they haven’t seen before. New work finds this inconsistency to be pervasive.
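
One simple way to see the effect, illustrative rather than the paper’s protocol, is to train several identically configured models that differ only in random seed and compare their accuracy on held-out data versus shifted data.

```python
# Illustrative check for underspecification (not the paper's experimental setup):
# identical pipelines that differ only in random seed can agree in-distribution
# yet diverge under a distribution shift.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

# Crude stand-in for a distribution shift: add noise to the test features.
rng = np.random.default_rng(0)
X_shift = X_test + rng.normal(scale=2.0, size=X_test.shape)

for seed in range(5):
    model = SGDClassifier(random_state=seed).fit(X_train, y_train)
    in_dist = accuracy_score(y_test, model.predict(X_test))
    shifted = accuracy_score(y_test, model.predict(X_shift))
    print(f"seed {seed}: in-distribution {in_dist:.3f}, shifted {shifted:.3f}")
```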

Language Models Want to Be Free: How EleutherAI is developing a GPT-3 clone.

A grassroots research collective aims to make a GPT-3 clone that’s available to everyone. EleutherAI, a loose-knit group of independent researchers, is developing GPT-Neo, an open source, free-to-use version of OpenAI’s gargantuan language model.
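
EleutherAI publishes GPT-Neo checkpoints on the Hugging Face Hub, so a released model can be tried through the transformers library. A minimal sketch, assuming the 1.3 billion-parameter checkpoint (EleutherAI/gpt-neo-1.3B), follows; other sizes are available under the same organization.

```python
# Minimal sketch: loading a published GPT-Neo checkpoint via Hugging Face
# transformers. The model name assumes the 1.3B-parameter release.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
result = generator("Open source language models are", max_length=40, do_sample=True)
print(result[0]["generated_text"])
```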

Quake Watch: AI model detects earthquakes and estimates epicenters.

Detecting earthquakes is an important step toward warning surrounding communities that damaging seismic waves may be headed their way. A new model detects tremors and provides clues to their epicenter.
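
The blurb doesn’t describe the model’s architecture. Detectors of this kind often classify fixed-length seismogram windows, and the hypothetical 1D convolutional network below sketches what such a classifier might look like; every layer size is an assumption for illustration.

```python
# Hypothetical 1D CNN for classifying seismic waveform windows as
# earthquake vs. noise (illustrative; not the architecture from the article).
import torch
import torch.nn as nn

class QuakeDetector(nn.Module):
    def __init__(self, channels: int = 3):  # e.g., three seismometer components
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(channels, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1)
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(waveform))  # logit: quake vs. noise

# A batch of 8 three-component windows, 3,000 samples each (e.g., 30 s at 100 Hz).
logits = QuakeDetector()(torch.randn(8, 3, 3000))
```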

U.S. New Year’s Resolutions for AI: All the AI programs authorized in the 2021 NDAA.

U.S. lawmakers authorized a slew of national programs that promote artificial intelligence research, development, and deployment, and support efforts to make sure the results are ethical and trustworthy.

Representing the Underrepresented: Many important AI datasets contain bias.

Some of deep learning’s bedrock datasets came under scrutiny as researchers combed them for built-in biases. They found that popular datasets impart biases against socially marginalized groups to trained models due to the ways the datasets were compiled, labeled, and used.

Prosperity of the Commons: Tools from MLCommons for improved model development.

A new consortium of companies, schools, and research labs is building open tools for next-generation machine learning. MLCommons aims to foster innovation in machine learning by developing new benchmarks, datasets, and best practices.

Unsupervised Prejudice: Image classification models learned bias from ImageNet.

Social biases are well documented in decisions made by supervised models trained on ImageNet’s labels. But they also crept into the output of unsupervised models pretrained on the same dataset.

Battling Bias in Synthetic Data: How synthetic data startups are working to avoid bias.

Generating synthetic datasets for training machine learning systems is a booming business, but synthetic data can inherit flaws from the real-world data it’s based on. Startups are working on solutions.

GANs for Smaller Data: Training GANs on small data without overfitting.

Trained on a small dataset, generative adversarial networks (GANs) tend to generate either replicas of the training data or noisy output. A new method spurs them to produce satisfying variations.
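
The blurb doesn’t name the method. One widely used remedy for small-data GAN training, which may or may not be the approach covered here, is to apply the same random, differentiable augmentations to both real and generated images before they reach the discriminator, as in the sketch below.

```python
# Sketch of one common remedy for GAN training on small datasets
# (illustrative; not necessarily the method described in the article):
# augment real and generated batches alike so the discriminator never
# sees unaugmented data and cannot simply memorize the training set.
import torch

def augment(images: torch.Tensor) -> torch.Tensor:
    # Random horizontal flip plus a per-image brightness offset;
    # both operations are differentiable with respect to the images.
    if torch.rand(1).item() < 0.5:
        images = torch.flip(images, dims=[-1])
    offset = 0.2 * (torch.rand(images.size(0), 1, 1, 1, device=images.device) - 0.5)
    return images + offset

def discriminator_inputs(real: torch.Tensor, fake: torch.Tensor):
    return augment(real), augment(fake)

# Example with dummy batches of shape (batch, channels, height, width).
d_real, d_fake = discriminator_inputs(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
```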

The Geopolitics of Data: How data impacts military uses of AI.

Some politicians and pundits believe that, in the AI era, the military with the most data wins. A new analysis disputes this notion. A report by Georgetown University’s Center for Security and Emerging Technology examines the relative strengths of the Chinese and U.S. militaries in AI.

Tiny Images, Outsized Biases: Why MIT withdrew the Tiny Images dataset.

MIT withdrew a popular computer vision dataset after researchers found it rife with social bias. They found racist, misogynistic, and demeaning labels among the nearly 80 million pictures in Tiny Images, a collection of 32-by-32-pixel color photos.
