Tradeoffs for Higher Accuracy Data Augmentation Plus Weight Decay can Boost Some AI Models

Published

Jul 13, 2022

Reading time

2 min read

Vision models can be improved by training them on several altered versions of the same image and also by encouraging their weights to be close to zero. Recent research showed that both can have adverse effects that may be difficult to detect.

What’s new: Randall Balestriero, Leon Bottou, and Yann LeCun at Meta found that using augmented data and the form of regularization known as weight decay, though they typically boost performance overall, can degrade a model’s performance on some classes.

Key insight: Augmenting training images by cropping, coloring, and otherwise altering them varies patterns in their pixels, helping models learn to generalize beyond the specific examples in the dataset. For instance, if a model uses stripes to classify zebras, then randomly altering color values in training images of zebras can help it learn to recognize zebras despite color variations in input images at inference. However, altering colors in training images may also disrupt the model’s ability to learn from certain patterns. If a model uses color to classify basketballs, then changing the colors in training images of basketballs may render it unable to distinguish basketballs from other spherical objects. Weight decay, which helps models generalize by encouraging weights to be closer to zero during training, may raise similar issues. Both weight decay and pruning reduce the impact of the lowest weights. Previous work showed that pruning, which zeroes out weights that are near zero after training, adversely affects some classes more than others. Shifting low weights closer to zero may do the same.

How it works: The authors trained separate sets of roughly 20 ResNets on ImageNet images that had been altered by randomly cropping, blacking out a rectangle, and adjusting color by changing brightness, contrast, saturation, and hue. They tested the models on ImageNet.

The authors trained different sets of models on varying amounts of each alteration; for instance, cropping images by different percentages. They averaged each set’s accuracy on each class and graphed the results.
They ran similar experiments using weight decay instead of data augmentation: They trained different sets of models on varying amounts of weight decay and averaged their accuracy on each class.

Results: Data augmentations increased the models’ average accuracy on some classes and decreased it on others. For instance, models trained on a dataset from which four-fifths of each image had been cropped achieved 56 percent average accuracy on pickup trucks and 59 percent on academic gowns. Cropping the dataset by three-fifths boosted average accuracy to 75 percent on trucks but cut it to 46 percent on gowns. Weight decay also affected some classes more than others. For example, with very little weight decay, average accuracy was nearly the same (around 47 percent) on gongs and miniature poodles. But with a high weight decay factor, average accuracy reached 72 percent on gongs but plummeted to 22 percent on poodles.

Why it matters: This work raises caution around techniques that improve overall performance. Although a model’s performance is very high on average, its performance on a given class may be much lower.

We’re thinking: In last year’s Data Centric AI Competition, a top-ranked team team augmented data differently depending on its class. For instance, flipping Roman numeral I horizontally doesn’t affect the label, but flipping Roman numeral IV horizontally changes the label from 4 to 6. The team determined appropriate augmentations manually rather than using one-size-fits-all alterations. This work adds credence to the value of such approaches.

Subscribe to The Batch