Tiny Images photos and datasets

Tiny Images, Outsized Biases: Why MIT withdrew the Tiny Images dataset

MIT withdrew a popular computer vision dataset after researchers found it rife with social bias. They discovered racist, misogynistic, and demeaning labels among the nearly 80 million pictures in Tiny Images, a collection of 32-by-32 pixel color photos.
Data and graphs related to a method that synthesizes extracted features of underrepresented classes

Augmentation for Features: A technique for boosting underrepresented data classes

In any training dataset, some classes may have relatively few examples. A new technique can improve a trained model’s performance on such underrepresented classes. Researchers introduced a method that synthesizes extracted features of underrepresented classes.
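The paper's method operates in feature space; the sketch below illustrates the general idea with a simple SMOTE-style interpolation between real minority-class feature vectors. The function name and details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def synthesize_features(minority_feats, n_new, rng=None):
    """Synthesize new feature vectors for an underrepresented class by
    interpolating between random pairs of real examples (a SMOTE-style
    sketch of feature-space augmentation)."""
    rng = np.random.default_rng(rng)
    n = len(minority_feats)
    idx_a = rng.integers(0, n, size=n_new)   # first endpoint of each pair
    idx_b = rng.integers(0, n, size=n_new)   # second endpoint of each pair
    lam = rng.random((n_new, 1))             # interpolation weights in [0, 1]
    return minority_feats[idx_a] + lam * (minority_feats[idx_b] - minority_feats[idx_a])

# Three real feature vectors yield as many synthetic ones as we ask for.
feats = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
new = synthesize_features(feats, n_new=5, rng=0)
print(new.shape)  # (5, 2)
```

Because each synthetic vector lies on a segment between two real examples, it stays inside the region the minority class already occupies in feature space.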
Self-driving car from the inside

Cars Idled, AV Makers Keep Rolling: How self-driving researchers stayed busy during the pandemic

The pandemic has forced self-driving car companies off the road. Now they’re moving forward by refining their mountains of training data. Self-driving cars typically collect real-world training data with two human operators onboard, but Covid-19 makes that unsafe at any speed.
Series of pictures of hotels and resorts located in African countries

Goodbye Tourists, Hello Labelers: How Samasource kept its data labelers safe from Covid-19

Covid-19 has cost many workers their livelihood, but it has provided a lucky few on the lowest rungs of Africa’s machine learning industry with luxury suites.
Good and bad examples of labeling images with pictures of birds

Selling Shovels to Data Miners: A survey of AI business-to-business services

When everyone is panning for machine learning gold, it pays to help them dig through the data. Machine learning entrepreneurs can make their mark (and their fortune) building services that help other companies develop, deploy, and monitor AI, venture capitalist Rob Toews argues in Forbes.
Graphs related to ImageNet error landscape

Rightsizing Neural Nets: An equation for predicting optimal data and model size

How much data do we want? More! How large should the model be? Bigger! How much more and how much bigger? New research estimates the impact of dataset and model sizes on neural network performance.
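Estimates of this kind are typically fit as additive power laws in dataset and model size. The sketch below shows the general functional form; the coefficients are made-up placeholders for illustration, not values from the research.

```python
def predicted_error(n_data, n_params, a=1.0, alpha=0.35, b=1.0, beta=0.25, c=0.05):
    """Predicted test error as additive power laws in dataset size and
    model size, plus an irreducible error floor c. All coefficients here
    are illustrative, not fitted values."""
    return a * n_data ** (-alpha) + b * n_params ** (-beta) + c

# More data and a bigger model both push predicted error toward the floor c.
small = predicted_error(1e5, 1e6)
large = predicted_error(1e7, 1e8)
print(small, large)
```

Fitting a form like this on small-scale runs lets practitioners estimate how much more data, or how many more parameters, a target error level would require before committing to an expensive training run.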
Info about radioactive data

X Marks the Dataset: Radioactive data helps trace a model's training corpus.

Which dataset was used to train a given model? A new method makes it possible to see traces of the training corpus in a model’s output.
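The core idea is to shift the features of marked training images along a secret direction, then test whether a trained model's weights are unusually aligned with that direction. This is a minimal illustrative sketch with simulated weight vectors, not the paper's detection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512

# Secret "radioactive" direction: a fixed random unit vector.
u = rng.standard_normal(dim)
u /= np.linalg.norm(u)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Simulated weight vector of a model trained on marked data:
# it picks up a component along u (the 5.0 factor is arbitrary here).
w_marked = rng.standard_normal(dim) + 5.0 * u
# Weight vector of a model trained on clean data: random relative to u.
w_clean = rng.standard_normal(dim)

print(cosine(w_marked, u), cosine(w_clean, u))
```

In high dimensions, a random vector's cosine similarity with u is near zero, so an alignment well above that baseline is statistical evidence that the marked data was in the training set.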
Covid-19 illustration

Tools For a Pandemic: Chinese tech giants provide resources for Covid research.

Chinese tech giants have opened their AI platforms to scientists fighting coronavirus. Alibaba Cloud and Baidu are offering a powerful weapon to life-science researchers working to stop the spread of the illness officially known as Covid-19: free access to their computing horsepower and tools.
ImageNet face recognition labels on a picture

ImageNet Gets a Makeover: The effort to remove bias from ImageNet

Computer scientists are struggling to purge bias from one of AI’s most important datasets. ImageNet’s 14 million photos are a go-to collection for training computer-vision systems, yet their descriptive labels have been rife with derogatory and stereotyped attitudes toward race, gender, and sex.
Dawn Song

Dawn Song — Taking Responsibility for Data: The importance of a responsible data economy

Datasets are critical to AI and machine learning, and they are becoming a key driver of the economy. Collection of sensitive data is increasing rapidly, covering almost every aspect of people’s lives.
Information related to Bias-Resilient Neural Network (BR-Net)

Bias Fighter: A neural network for countering bias variables in data

Sophisticated models trained on biased data can learn discriminatory patterns, which leads to skewed decisions. A new solution aims to prevent neural networks from making decisions based on common biases.
Illustration of two black cats labeled as cats, one white cat labeled as banana

Biased Data Trains Oppressive AI

Will biases in training data unwittingly turn AI into a tool for persecution? Bias encoded in software used by nominally objective institutions, such as the justice or education systems, could become impossible to root out.
Collage with photos of people's faces

Public Access, Private Faces

One of the largest open datasets for training face recognition systems has its roots in a popular photo-sharing service. Companies that have used this data could find themselves liable for millions in legal recompense.
Proportion of examples covered by number of annotators (sorted by number of annotations)

AI Knows Who Labeled the Data

The latest language models are great at answering questions about a given text passage. However, these models are also powerful enough to recognize an individual writer's style, which can clue them in to the right answers. New research measures such annotator bias in several datasets.