14 Posts

Dependency between compute budget and number of parameters

Right-Sizing Models for the Dataset: Finding the Best Data-To-Parameter Ratio for NLP Models

The route to improving transformer-based language models like GPT-3 and Gopher, which are trained on immense quantities of text scraped from the web, has been to increase their size. But research shows that, given a processing budget, bigger doesn’t necessarily mean better.
Information related to Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)

Update Any Language Model: New Method to Update Pretrained Language Models

The ability to update language models is essential to incorporate new information and correct undesirable behaviors. Previous methods are unwieldy and often fail as the amount of new data increases. New work offers a workaround.
Animated chart shows how AI can avoid mistaking an image's subject for its context.

Taming Spurious Correlations: New Technique Helps AI Avoid Classification Mistakes

When a neural network learns image labels, it may confuse a background item for the labeled object. New research avoids such mistakes.
Two randomly cropped pictures

Tradeoffs for Higher Accuracy: Data Augmentation Plus Weight Decay can Boost Some AI Models

Vision models can be improved by training them on several altered versions of the same image and also by encouraging their weights to be close to zero. Recent research showed that both can have adverse effects that may be difficult to detect.
Masked Auto-Encoder (MAE) explanation

Who Was That Masked Input? Pretraining Method Improves Computer Vision Performance

Researchers have shown that it’s possible to train a computer vision model effectively on around 66 percent of the pixels in each training image. New work used 25 percent, saving computation and boosting performance to boot.
Graphs related to a comparison and evaluation of 14 different optimizers

Optimizer Shootout: An evaluation of 14 deep learning optimizers

Everyone has a favorite optimization method, but it’s not always clear which one works best in a given situation. New research aims to establish a set of benchmarks. Researchers evaluated 14 popular optimizers using the Deep Optimization Benchmark Suite, which some of them introduced last year.
Series of images with graphs and data related to optimization algorithms

When Optimization is Suboptimal: How gradient descent can sometimes lead to model bias

Bias arises in machine learning when we fit an overly simple function to a more complex problem. A theoretical study shows that gradient descent itself may introduce such bias and render algorithms unable to fit data properly.
Data and graphs related to equations that optimize some training parameters.

Optimize Your Training Parameters: Research on finding a neural net's optimal batch size

Last week we reported on a formula to determine model width and dataset size for optimal performance. A new paper contributes equations that optimize some training parameters.
Good and bad examples of labeling images with pictures of birds

Selling Shovels to Data Miners: A survey of AI business-to-business services

When the world is panning for machine learning gold, it pays to help everyone dig through the data. Machine learning entrepreneurs can make their mark (and their fortune) building services that help other companies develop, deploy, and monitor AI, venture capitalist Rob Toews argues in Forbes.
Graphs related to ImageNet error landscape

Rightsizing Neural Nets: An equation for predicting optimal data and model size

How much data do we want? More! How large should the model be? Bigger! How much more and how much bigger? New research estimates the impact of dataset and model sizes on neural network performance.
Graph related to Noisy Student performance on ImageNet

Self-Training for Sharper Vision: The noisy student method for computer vision, explained

The previous state-of-the-art image classifier was trained on the ImageNet dataset plus 3.5 billion supplemental images from a different database. A new method achieved higher accuracy with one-tenth as many supplemental examples, and they were unlabeled to boot.
Bipedal robot crossing obstacles

Survival of the Overfittest

Neuroevolution, which combines neural networks with ideas drawn from Darwin, is gaining momentum. Its advocates claim that they can achieve faster, better results by generating a succession of new models, each slightly different from its predecessors, rather than relying on a purpose-built model.
Continuous Planner for One-Shot Imitation Learning

Working Through Uncertainty

How can we build robots that respond to novel situations? When prior experience is limited, letting a model describe its uncertainty can help it explore more avenues to success.
Calibration plot for ImageNet

Scaling Bayes

Neural networks are good at making predictions, but they’re not so good at estimating how certain they are. If the training dataset is small and many sets of model parameters fit the data well, for instance, the network may not account for this explicitly, leading to overly confident predictions.
