datasets

X Marks the Dataset

Which dataset was used to train a given model? A new method makes it possible to see traces of the training corpus in a model’s output.
2 min read

Tools For a Pandemic

Chinese tech giants have opened their AI platforms to scientists fighting coronavirus. Alibaba Cloud and Baidu are offering a powerful weapon to life-science researchers working to stop the spread of the illness officially known as Covid-19: free access to their computing horsepower and tools.
1 min read

ImageNet Gets a Makeover

Computer scientists are struggling to purge bias from one of AI’s most important datasets. ImageNet’s 14 million photos are a go-to collection for training computer-vision systems, yet their descriptive labels have been rife with derogatory and stereotyped attitudes toward race, gender, and sex.
2 min read

Dawn Song: Taking Responsibility for Data

Datasets are critical to AI and machine learning, and they are becoming a key driver of the economy. Collection of sensitive data is increasing rapidly, covering almost every aspect of people’s lives.
2 min read

Bias Fighter

Sophisticated models trained on biased data can learn discriminatory patterns, which leads to skewed decisions. A new solution aims to prevent neural networks from making decisions based on common biases.
2 min read

Biased Data Trains Oppressive AI

Will biases in training data unwittingly turn AI into a tool for persecution? Bias encoded in software used by nominally objective institutions, such as the justice or education systems, will become impossible to root out.
2 min read

Public Access, Private Faces

One of the largest open datasets for training face recognition systems has its roots in a popular photo-sharing service. Companies that have used this data could find themselves liable for millions in legal recompense.
2 min read

AI Knows Who Labeled the Data

The latest language models are great at answering questions about a given text passage. However, these models are also powerful enough to recognize an individual writer’s style, which can clue them in to the right answers. New research measures such annotator bias in several datasets.
2 min read
