Taming Spurious Correlations: New Technique Helps AI Avoid Classification Mistakes

Published Aug 24, 2022

When a neural network learns image labels, it may confuse a background item for the labeled object. For example, it may learn to associate the label “camel” with desert sand and then classify a cow on a beach as a camel. New research shows how to train networks to avoid such mistakes.

What’s new: A team at Stanford and Northeastern University led by Michael Zhang proposed Correct-N-Contrast (CNC), a training method that makes neural networks more robust to spurious correlations, in which features and labels are associated but not causally related.

Key insight: A neural network likely has learned a spurious correlation when it produces dissimilar representations of two images with the same label. When learning representations of two images of a cow, for example, the error may manifest as a representation of a grassy field in one image and a representation of a beach in the other. A contrastive loss function can help a neural network avoid such errors by encouraging it to learn similar representations for similar objects against different backgrounds.

How it works: The authors trained models to classify examples and identified examples the models got wrong, possibly owing to spurious correlations. Then they trained a second neural network to classify them correctly using a contrastive loss function.

The authors trained or fine-tuned a neural network to classify the examples in a dataset. They used a pretrained LeNet to classify handwritten digits, a ResNet-50 to classify celebrities’ hair color in CelebA and to distinguish water birds from land birds, and BERT to recognize toxic social media comments.
They trained or fine-tuned a second neural network using a weighted sum of two loss terms. One term encouraged the network to classify examples correctly. The second, contrastive term pulled together representations of examples that shared a label but had produced dissimilar network output, and pushed apart representations of examples that had different labels but had produced similar output.
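
The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pick_contrast_sets`, `contrastive_term`, `combined_loss`), the temperature `tau`, the weight `lam`, and the toy two-dimensional representations are all hypothetical, and the contrastive term is a generic InfoNCE-style loss over cosine similarity, which may differ in detail from the one used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_contrast_sets(anchor, labels, first_stage_preds):
    """Return (positives, negatives) as index lists for one anchor.

    Positives: same label as the anchor, different first-stage prediction.
    Negatives: different label, same first-stage prediction as the anchor.
    """
    positives, negatives = [], []
    for i, (y, p) in enumerate(zip(labels, first_stage_preds)):
        if i == anchor:
            continue
        if y == labels[anchor] and p != first_stage_preds[anchor]:
            positives.append(i)
        elif y != labels[anchor] and p == first_stage_preds[anchor]:
            negatives.append(i)
    return positives, negatives

def contrastive_term(reps, anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss, averaged over the anchor's positives (assumed form)."""
    neg_sum = sum(math.exp(cosine(reps[anchor], reps[n]) / tau) for n in negatives)
    losses = []
    for p in positives:
        pos = math.exp(cosine(reps[anchor], reps[p]) / tau)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses)

def cross_entropy(probs, label):
    """Standard classification loss for one example."""
    return -math.log(probs[label])

def combined_loss(ce, con, lam=0.5):
    """Weighted sum of the classification and contrastive terms."""
    return (1 - lam) * ce + lam * con

# Toy data (hypothetical). Label 0 = cow, 1 = camel.
# Examples: cow on grass, cow on beach, camel on sand, camel on grass.
labels = [0, 0, 1, 1]
first_stage_preds = [0, 1, 1, 0]  # the first model was fooled by backgrounds

positives, negatives = pick_contrast_sets(0, labels, first_stage_preds)

# Representations as [object feature, background feature].
background_driven = [[0.2, 1.0], [0.2, -1.0], [-0.2, -1.0], [-0.2, 1.0]]
object_driven = [[1.0, 0.2], [1.0, -0.2], [-1.0, -0.2], [-1.0, 0.2]]

for name, reps in [("background-driven", background_driven),
                   ("object-driven", object_driven)]:
    con = contrastive_term(reps, 0, positives, negatives)
    ce = cross_entropy([0.9, 0.1], labels[0])  # made-up class probabilities
    print(name, combined_loss(ce, con))
```

In this toy setup, the cow on the beach is the anchor's positive and the camel on grass is its negative, so representations dominated by the background feature incur a much larger contrastive term than representations dominated by the object feature.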

Results: The authors evaluated their models’ accuracies on groups of examples known to be difficult to classify. Their approach outperformed EIIL both on average and on individual tasks. (EIIL first trains a model to infer related groups of examples, then trains a second model to classify examples using the group IDs.) For instance, the ResNet-50 trained on CelebA with CNC achieved 88.8 percent accuracy, while training with EIIL achieved 81.7 percent accuracy. Across all tasks, CNC achieved 80.9 percent average accuracy while EIIL achieved 74.7 percent.
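
Evaluating on hard groups amounts to computing accuracy separately for each group of examples rather than over the whole test set. The sketch below shows one way to do that. The function names, the group labels, and the toy data are hypothetical, and reporting the lowest per-group accuracy is an assumption about how such results are commonly summarized, not a detail stated above.

```python
from collections import defaultdict

def group_accuracies(labels, preds, groups):
    """Return a dict mapping each group name to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}

def worst_group_accuracy(labels, preds, groups):
    """Return the lowest accuracy across all groups."""
    return min(group_accuracies(labels, preds, groups).values())

# Toy data (hypothetical): the model misclassifies one cow on a beach.
labels = ["cow", "cow", "cow", "cow", "camel", "camel"]
preds = ["cow", "cow", "cow", "camel", "camel", "camel"]
groups = ["cow/grass", "cow/grass", "cow/beach", "cow/beach",
          "camel/sand", "camel/sand"]

overall = sum(y == p for y, p in zip(labels, preds)) / len(labels)
print("overall:", overall)
print("per group:", group_accuracies(labels, preds, groups))
print("worst group:", worst_group_accuracy(labels, preds, groups))
```

Here the overall accuracy looks respectable, but the per-group breakdown exposes the weak spot: the group in which the object appears against an unusual background.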

Yes, but: Group DRO, which relies on additional information during training, such as a description of the background of an image or the gender of a depicted person, achieved 81.8 percent average accuracy, slightly higher than CNC’s 80.9 percent.

Why it matters: Previous approaches to managing spurious correlations tend to expand training datasets to capture more variability in data. This work actively guides models away from representing features that reduce classification accuracy.

We’re thinking: A self-driving car must detect a cow (or a person or another vehicle) whether it stands on a meadow, a beach, or pavement.
