GPT-Neo related animation
datasets

Language Models Want to Be Free

A grassroots research collective aims to make a GPT-3 clone that’s available to everyone. EleutherAI, a loose-knit group of independent researchers, is developing GPT-Neo, an open-source, free-to-use version of OpenAI’s gargantuan language model.
1 min read
Data and graphs related to a new model capable of detecting tremors
datasets

Quake Watch

Detecting earthquakes is an important step toward warning surrounding communities that damaging seismic waves may be headed their way. A new model detects tremors and provides clues to their epicenters.
2 min read
United States Capitol
datasets

U.S. New Year’s Resolutions for AI

U.S. lawmakers authorized a slew of national programs that promote artificial intelligence research, development, and deployment, and support efforts to make sure the results are ethical and trustworthy.
2 min read
Tree farm dataset
datasets

Representing the Underrepresented

Some of deep learning’s bedrock datasets came under scrutiny as researchers combed them for built-in biases. They found that popular datasets pass biases against socially marginalized groups on to trained models because of the ways the datasets were compiled, labeled, and used.
2 min read
Animations depicting benchmarking, datasets and best practices
datasets

Prosperity of the Commons

A new consortium of companies, schools, and research labs is building open tools for next-generation machine learning. MLCommons aims to foster innovation in machine learning by developing new benchmarks, datasets, and best practices.
1 min read
Collage of self portraits
datasets

Unsupervised Prejudice

Social biases are well documented in decisions made by supervised models trained on ImageNet’s labels. But they also crept into the output of unsupervised models pretrained on the same dataset.
2 min read
Contrast between real and synthetic datasets
datasets

Battling Bias in Synthetic Data

Synthetic datasets can inherit flaws in the real-world data they’re based on. Startups are working on solutions. Generating synthetic datasets for training machine learning systems is a booming business.
2 min read
Examples of AI generated images
datasets

GANs for Smaller Data

Trained on a small dataset, generative adversarial networks (GANs) tend to generate either replicas of the training data or noisy output. A new method spurs them to produce satisfying variations.
2 min read
Military AI technology
datasets

The Geopolitics of Data

Some politicians and pundits believe that, in the AI era, the military with the most data wins. A new analysis disputes this notion. A report by Georgetown University’s Center for Security and Emerging Technology examines the relative strengths of the Chinese and U.S. militaries in AI.
2 min read
Tiny Images photos and datasets
datasets

Tiny Images, Outsized Biases

MIT withdrew a popular computer vision dataset after researchers found that it was rife with social bias. They found racist, misogynistic, and demeaning labels among the nearly 80 million pictures in Tiny Images, a collection of 32-by-32-pixel color photos.
2 min read
Data and graphs related to a method that synthesizes extracted features of underrepresented classes
datasets

Augmentation for Features

In any training dataset, some classes may have relatively few examples. A new technique can improve a trained model’s performance on such underrepresented classes. Researchers introduced a method that synthesizes extracted features of underrepresented classes.
2 min read
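The blurb doesn't say how the researchers synthesize features. To illustrate the general idea of augmenting in feature space rather than pixel space, here is a minimal sketch that swaps in a different, well-known technique: SMOTE-style interpolation between same-class neighbors, applied to extracted feature vectors. It assumes the features for one underrepresented class already sit in a NumPy array. The function name `synthesize_features` and the neighbor count `k` are hypothetical choices. This is not the researchers' method.

```python
import numpy as np

def synthesize_features(features, n_new, k=5, rng=None):
    """Create n_new synthetic feature vectors for one underrepresented class.

    Illustrative assumption: SMOTE-style interpolation in feature space,
    not the method described in the article.
    features: (n, d) array of extracted features for a single class.
    """
    rng = np.random.default_rng(rng)
    features = np.asarray(features, dtype=float)
    n = len(features)
    if n < 2:
        raise ValueError("need at least two examples to interpolate")
    k = min(k, n - 1)

    # Pairwise squared distances; exclude each point as its own neighbor.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]  # k nearest same-class points

    # Pick a random base point and one of its neighbors for each new sample.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place each new vector at a random point on the segment between the two.
    lam = rng.random((n_new, 1))
    return features[base] + lam * (features[nbr] - features[base])

# Usage: 10 real feature vectors (random stand-ins) -> 20 synthetic ones.
feats = np.random.default_rng(0).normal(size=(10, 4))
new = synthesize_features(feats, 20, rng=1)
print(new.shape)  # (20, 4)
```

The synthetic vectors could then be appended to the class's real features when training a classifier head. Because the sketch interpolates only between same-class neighbors, every new point stays inside the region the real examples already span.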
Self-driving car from the inside
datasets

Cars Idled, AV Makers Keep Rolling

The pandemic has forced self-driving car companies off the road. Now they’re moving forward by refining their mountains of training data. Self-driving cars typically collect real-world training data with two human operators onboard, but Covid-19 makes that unsafe at any speed.
1 min read
Series of pictures of hotels and resorts located in African countries
datasets

Goodbye Tourists, Hello Labelers

Covid-19 has cost many workers their livelihood, but it has provided a lucky few on the lowest rungs of Africa’s machine learning industry with luxury suites.
2 min read
Good and bad examples of labeling images with pictures of birds
datasets

Selling Shovels to Data Miners

When the world is panning for machine learning gold, it pays to help prospectors dig through the data. Machine learning entrepreneurs can make their mark (and their fortune) building services that help other companies develop, deploy, and monitor AI, venture capitalist Rob Toews argues in Forbes.
1 min read
Graph related to error landscape
datasets

Rightsizing Neural Nets

How much data do we want? More! How large should the model be? Bigger! How much more and how much bigger? New research estimates the impact of dataset and model sizes on neural network performance.
2 min read
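The blurb doesn't give the functional form the researchers fit. A common simplification in scaling studies is that test error falls roughly as a power law in dataset (or model) size. The sketch below assumes that simple form, error ≈ a * size**(-b), with no irreducible-error term, and fits it by linear regression in log-log space. The helper names `fit_power_law` and `predict_error` are hypothetical, and this is not the paper's model.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit error ~= a * size**(-b) by least squares on log(error) vs. log(size).

    Assumes a pure power law with no irreducible-error floor (a simplification).
    """
    log_s = np.log(np.asarray(sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # Degree-1 fit returns [slope, intercept]: log e = intercept + slope * log s
    slope, intercept = np.polyfit(log_s, log_e, 1)
    return float(np.exp(intercept)), float(-slope)

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new dataset or model size."""
    return a * np.asarray(size, dtype=float) ** (-b)

# Toy numbers generated from a known law (not measurements), to check the fit.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
errors = 2.0 * sizes ** -0.3
a, b = fit_power_law(sizes, errors)
print(round(a, 3), round(b, 3))  # 2.0 0.3
print(predict_error(a, b, 1e7))
```

Under this form, doubling the dataset multiplies error by 2**(-b), so a fitted exponent gives a rough estimate of how much more data a target error would require. Real curves usually flatten toward an error floor, which this two-parameter form ignores.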
