Small Data the Simple Way
A training technique that can outperform few-shot learning

Published May 20, 2020

Few-shot learning seeks to build models that adapt to novel tasks based on small numbers of training examples. This sort of learning typically involves complicated techniques, but researchers achieved state-of-the-art results using a simpler approach.

What’s new: An MIT-Google collaboration led by Yonglong Tian and Yue Wang discovered that simple classifiers with access to an embedding that represents similar tasks can outperform the best few-shot techniques.

Few-shot learning: A typical few-shot learning algorithm might receive, for example, 100 different supervised learning tasks with a small training set per task. One task could be recognizing dogs based on, say, 600 images of dogs. Another might be recognizing buses based on a similar number of examples. By drawing on commonalities among the 100 tasks, the algorithm aims to do well on a 101st task using a similarly limited training set.
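As a rough illustration of how one such task might be assembled, the Python sketch below draws a small training set and a held-out test set from a labeled pool. The function name `sample_task`, the toy dataset, and the sizes (`n_classes`, `n_train`, `n_test`) are assumptions made for illustration, not details from the research.

```python
import random
from collections import defaultdict

def sample_task(dataset, n_classes=5, n_train=1, n_test=2, rng=None):
    """Sample one few-shot task from a pool of (example, class_name) pairs.

    Returns a small training set and a disjoint test set over the same
    classes. All sizes are hypothetical defaults, not from the article.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for example, label in dataset:
        by_class[label].append(example)

    # Pick the classes for this task, then a few examples of each.
    classes = rng.sample(sorted(by_class), n_classes)
    train, test = [], []
    for label in classes:
        picked = rng.sample(by_class[label], n_train + n_test)
        train += [(x, label) for x in picked[:n_train]]
        test += [(x, label) for x in picked[n_train:]]
    return train, test

# Toy stand-in for an image dataset: 10 classes with 20 "images" each.
toy = [(f"img_{c}_{i}", f"class_{c}") for c in range(10) for i in range(20)]
support, query = sample_task(toy, n_classes=5, n_train=1, n_test=2,
                             rng=random.Random(0))
print(len(support), "training examples;", len(query), "test examples")
```

A few-shot learner would see only `support` when adapting to the task and would be scored on `query`.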

Key insight: Previous methods for extracting commonalities from a set of training tasks were complex. The authors found that simply training a shared feature extractor on a number of tasks, with few training examples of each, allowed a rudimentary algorithm to learn to perform well on novel tasks, also with few training examples.
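The idea can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: `extract_features` is a hypothetical, hand-written stand-in for a trained network's final hidden layer, the "images" are toy lists of pixel intensities, and the classifier is a two-class logistic regression written from scratch.

```python
import math

def extract_features(image):
    """Hypothetical stand-in for a frozen, pretrained feature extractor.

    In practice this would be a trained network with its output layer
    removed; here it just computes two summary statistics.
    """
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(features, labels, lr=0.5, steps=2000):
    """Binary logistic regression trained by batch gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# A novel task with three training examples per class (toy data).
dark = [[0.10, 0.20, 0.15, 0.10], [0.05, 0.10, 0.20, 0.15], [0.20, 0.25, 0.10, 0.15]]
bright = [[0.80, 0.90, 0.85, 0.90], [0.95, 0.85, 0.80, 0.90], [0.75, 0.80, 0.90, 0.85]]
train_x = [extract_features(img) for img in dark + bright]
train_y = [0] * len(dark) + [1] * len(bright)

# Only the small classifier is trained; the extractor stays fixed.
w, b = fit_logistic_regression(train_x, train_y)

test_images = [[0.15, 0.10, 0.20, 0.10], [0.90, 0.80, 0.85, 0.95]]
predictions = [predict(w, b, extract_features(img)) for img in test_images]
print(predictions)
```

The design point is that nothing in the extractor changes for the new task. All task-specific learning happens in the handful of logistic regression weights.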

How it works: The researchers used conventional supervised learning to train a single network to classify images from 100 different classes, using 600 images of each class. In effect, the simple classifiers for the individual tasks shared the same architecture and parameters up to the final hidden layer.

After training, the network’s output layer was removed and the final hidden layer was used as a feature extractor.
A logistic regression model used features from the extractor to learn from a small number of examples of a novel class.
The researchers improved the system’s accuracy via knowledge distillation; that is, using an existing model to train a new one. The first feature extractor’s output served as a training target for a second, which learned to recreate it. They performed this operation repeatedly, with each new model serving as the teacher for the next.
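The distillation step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it measures how far a student's output is from a teacher's using the KL divergence between their softmax distributions, a common choice for distillation, and the callbacks `train_from_labels` and `train_from_teacher` are hypothetical placeholders for real training routines.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Penalty that shrinks as the student reproduces the teacher's output."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)

def distill_generations(train_from_labels, train_from_teacher, generations=2):
    """Repeat distillation: each new model learns from the previous one.

    Both callbacks are hypothetical: the first trains a model from labels,
    the second trains a fresh model to match a given teacher.
    """
    teacher = train_from_labels()
    for _ in range(generations):
        teacher = train_from_teacher(teacher)
    return teacher

teacher_out = [2.0, 0.5, -1.0]
print(distillation_loss([0.0, 0.0, 0.0], teacher_out))  # untrained student: positive loss
print(distillation_loss(teacher_out, teacher_out))      # perfect match: zero loss
```

Minimizing this loss over a training set pushes the student toward the teacher's output, and `distill_generations` shows the repeated hand-off from one model to the next.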

Results: The researchers tested their method against state-of-the-art few-shot models on four datasets derived from ImageNet or CIFAR-100. Their method beat the earlier models by around 3 percentage points of accuracy, averaging around 79 percent.

Why it matters: This work aligns few-shot learning more closely than earlier methods with supervised learning and multi-task learning. The use of common techniques throughout machine learning could spur more rapid progress than specialized approaches.

We’re thinking: Many potential applications of deep learning hinge on models that can learn from small data. We’re glad to have a simple approach to the problem.
