The Limits of Pretraining
More pretraining doesn't guarantee a better fine-tuned AI.

Published

Feb 09, 2022

The higher the accuracy of a pretrained model, the better its performance after fine-tuning, right? Not necessarily.

What’s new: Samira Abnar and colleagues at Google Research conducted a meta-analysis of image-recognition experiments and performed some of their own. They analyzed the relationship between model performance after pretraining and after fine-tuning in a variety of tasks.

Key insight: To find out whether higher pretrained accuracy always leads to higher fine-tuned accuracy, it would be necessary to run thousands of experiments while varying hyperparameter values systematically for each task. A simpler way is to extrapolate the relationship from the results of existing experiments.

How it works: The authors re-examined 4,800 experiments performed on diverse architectures: Vision Transformers, MLP-Mixers, and ResNets. The models had been pretrained to classify labeled images in JFT or ImageNet 21K. They were tested on 25 tasks, including classifying objects, classifying the orientation of objects, and diagnosing diabetic retinopathy, after fine-tuning via few-shot learning or transfer learning. In few-shot learning, the last layer was replaced and trained on 25 examples. In transfer learning, the whole network was fine-tuned on 1,000 examples.

For each model and fine-tuned task, the authors plotted pretrained accuracy on the horizontal axis and fine-tuned accuracy on the vertical axis. The resulting swaths of clustered dots generally rose nonlinearly until they reached a plateau.
The authors fit a curve to the best results in each task. Then they extended that curve to extrapolate what fine-tuned accuracy would be if pretrained accuracy were 100 percent.
In their own experiments, they varied the size of the pretraining set (JFT), number of parameters in the model (Vision Transformer), and number of epochs in pretraining. Then they repeated the steps above.
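
The curve-fitting and extrapolation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes a power-law form in error space, e_ft = k * e_pre^alpha + e_ir, where e_pre and e_ft are pretrained and fine-tuned error (1 minus accuracy) and e_ir is the error that remains at perfect pretrained accuracy. The function names, the binning rule used to pick the "best results," and the example numbers are all hypothetical.

```python
import math


def upper_envelope(points, n_bins=10):
    """Keep the best fine-tuned accuracy within each bin of pretrained accuracy.

    points: iterable of (pretrained_accuracy, fine_tuned_accuracy) in [0, 1).
    The equal-width binning is an assumption made for illustration.
    """
    best = {}
    for pre, fine in points:
        b = min(int(pre * n_bins), n_bins - 1)
        if b not in best or fine > best[b][1]:
            best[b] = (pre, fine)
    return [best[b] for b in sorted(best)]


def fit_power_law(points, grid=200):
    """Fit e_ft = k * e_pre**alpha + e_ir (assumed form).

    For each candidate e_ir, log(e_ft - e_ir) is linear in log(e_pre),
    so k and alpha come from ordinary least squares in log space.
    The candidate with the lowest squared error in the original space wins.
    Needs at least two distinct pretrained accuracies, all below 1.
    """
    errs = [(1.0 - pre, 1.0 - fine) for pre, fine in points]
    xs = [math.log(e_pre) for e_pre, _ in errs]
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    min_ft = min(e_ft for _, e_ft in errs)
    best = None
    for i in range(grid):
        e_ir = min_ft * i / grid  # always strictly below the smallest observed error
        ys = [math.log(e_ft - e_ir) for _, e_ft in errs]
        my = sum(ys) / n
        alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        log_k = my - alpha * mx
        sse = sum((math.exp(log_k + alpha * x) + e_ir - e_ft) ** 2
                  for x, (_, e_ft) in zip(xs, errs))
        if best is None or sse < best[0]:
            best = (sse, math.exp(log_k), alpha, e_ir)
    _, k, alpha, e_ir = best
    return k, alpha, e_ir


def extrapolated_ceiling(points, n_bins=10):
    """Predicted fine-tuned accuracy at 100 percent pretrained accuracy.

    With alpha > 0, the power-law term vanishes at e_pre = 0, leaving 1 - e_ir.
    """
    _, _, e_ir = fit_power_law(upper_envelope(points, n_bins))
    return 1.0 - e_ir


# Made-up (pretrained accuracy, fine-tuned accuracy) pairs for one task.
runs = [(0.55, 0.60), (0.58, 0.52), (0.65, 0.68), (0.75, 0.74),
        (0.85, 0.77), (0.88, 0.70), (0.92, 0.78)]
print("Extrapolated fine-tuned accuracy ceiling:", extrapolated_ceiling(runs))
```

A ceiling well below 1.0 would correspond to the plateau described above: past some point, further gains in pretrained accuracy buy little on that task.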

Results: Higher pretrained accuracy generally yielded higher fine-tuned accuracy, but with diminishing returns. In some cases, higher pretrained accuracy yielded worse fine-tuned accuracy. Moreover, pretrained models of equal accuracy didn’t necessarily perform equally well on different fine-tuned tasks. The authors’ own experiments matched the curves they derived from earlier work, leading them to conclude that dataset size, number of parameters in a model, and length of training don’t significantly influence the relationship between pretrained and fine-tuned accuracy.

Why it matters: More pretraining doesn’t necessarily result in a better fine-tuned model.

We’re thinking: One limiting factor in the value of pretraining accuracy may be the relevance of the pretrained task to the fine-tuned task. No matter how well a model classifies ImageNet, it may not easily learn how to diagnose medical images. A rigorous framework for managing the tradeoff between pretraining and fine-tuning would be useful.
