More Learning From Less Data

[Figure: Training objective for UDA, where M is a model that predicts a distribution of y given x.]

Neural networks surpass human abilities in many tasks, with a caveat: Typical supervised learning requires lots of labeled data. New research is shrinking that requirement.

What’s new: Emerging semi-supervised learning methods combine a small amount of labeled data with plenty of unlabeled data. Meanwhile, data augmentation multiplies labeled examples by distorting them to create variants. A team from Google Brain and Carnegie Mellon University combines the two approaches. Their method, Unsupervised Data Augmentation (UDA), turns in excellent performance on both image recognition and text classification tasks.

Key insight: UDA encourages models to make similar predictions for original inputs and their altered versions, a technique known as smoothness enforcing. Unlike prior smoothness-enforcing algorithms, UDA applies state-of-the-art data augmentation to unlabeled as well as labeled examples.

How it works: UDA trains a neural network by calculating two loss functions: one derived from labeled data, the other from unlabeled data. It optimizes their sum through gradient descent on batches of labeled and unlabeled data.

  • The labeled loss focuses on improving the model’s accuracy, just like fully supervised training.
  • For a given set of unlabeled inputs, the model predicts a set of labels.
  • At this point, any state-of-the-art data augmentation algorithm can be used to transform the input. The model re-predicts labels for the newly transformed data.
  • The unlabeled loss measures the difference in distribution between predictions for raw and transformed input.
  • The final loss is the sum of labeled and unlabeled losses.
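The steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a toy linear softmax classifier with weights `W` and a caller-supplied `augment` function standing in for whatever state-of-the-art augmentation is used, and it measures the unlabeled loss as the KL divergence between predictions on raw and augmented inputs.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uda_loss(W, x_labeled, y_labeled, x_unlabeled, augment):
    # Labeled loss: ordinary cross-entropy, as in fully supervised training.
    p_labeled = softmax(x_labeled @ W)
    supervised = -np.mean(
        np.log(p_labeled[np.arange(len(y_labeled)), y_labeled])
    )

    # Predict labels for raw unlabeled inputs (treated as a fixed target).
    p_raw = softmax(x_unlabeled @ W)
    # Re-predict for augmented versions of the same inputs.
    p_aug = softmax(augment(x_unlabeled) @ W)

    # Unlabeled loss: KL divergence between the two prediction distributions.
    consistency = np.mean(
        np.sum(p_raw * (np.log(p_raw) - np.log(p_aug)), axis=-1)
    )

    # Final loss is the sum of labeled and unlabeled losses.
    return supervised + consistency

# Toy usage with random data and small Gaussian noise as the "augmentation."
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x_l, y_l = rng.normal(size=(8, 4)), rng.integers(0, 3, size=8)
x_u = rng.normal(size=(16, 4))
loss = uda_loss(W, x_l, y_l, x_u,
                lambda x: x + 0.1 * rng.normal(size=x.shape))
```

In practice the consistency term is computed with the target distribution detached from the gradient, so the model is pulled toward its own predictions on clean inputs rather than the reverse.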

Why it matters: Neural networks often fail to generalize after supervised learning with limited labeled data. Troves of unlabeled data may exist, but there may not be enough time or resources for humans to label it. UDA takes advantage of this bounty, promising impressive results even with few labeled examples.

Results: UDA achieves state-of-the-art performance in natural language tasks, specifically with the BERT model. It beats the best fully supervised model on the IMDb dataset despite training on only 20 labeled examples. Similarly, UDA exceeds the previous record on a restricted ImageNet in which labels were removed from 90 percent of the images.

Takeaway: Deep learning pioneer Yann LeCun considers semi-supervised learning an essential technique for AI to gain common sense. As the state-of-the-art semi-supervised training algorithm, UDA may be one of the small steps toward AI’s next great leap.

