Rightsizing Neural Nets

How much data do we want? More! How large should the model be? Bigger! How much more and how much bigger? New research estimates the impact of dataset and model sizes on neural network performance.
X Marks the Dataset

Which dataset was used to train a given model? A new method makes it possible to see traces of the training corpus in a model’s output.
Tools For a Pandemic

Chinese tech giants have opened their AI platforms to scientists fighting coronavirus. Alibaba Cloud and Baidu are offering a powerful weapon to life-science researchers working to stop the spread of the illness officially known as Covid-19: free access to their computing horsepower and tools.
ImageNet Gets a Makeover

Computer scientists are struggling to purge bias from one of AI’s most important datasets. ImageNet’s 14 million photos are a go-to collection for training computer-vision systems, yet their descriptive labels have been rife with derogatory and stereotyped attitudes toward race, gender, and sex.
Dawn Song: Taking Responsibility for Data

Datasets are critical to AI and machine learning, and they are becoming a key driver of the economy. Collection of sensitive data is increasing rapidly, covering almost every aspect of people’s lives.
Bias Fighter

Sophisticated models trained on biased data can learn discriminatory patterns, which leads to skewed decisions. A new solution aims to prevent neural networks from making decisions based on common biases.
Biased Data Trains Oppressive AI

Will biases in training data unwittingly turn AI into a tool for persecution? Bias encoded in software used by nominally objective institutions, such as the justice and education systems, could become impossible to root out.
Public Access, Private Faces

One of the largest open datasets for training face recognition systems has its roots in a popular photo-sharing service. Companies that have used this data could find themselves liable for millions in damages.
AI Knows Who Labeled the Data

The latest language models are great at answering questions about a given text passage. However, these models are also powerful enough to recognize an individual writer's style, which can clue them in to the right answers. New research measures such annotator bias in several datasets.