Same Patient, Different Views Contrastive pretraining improves medical imaging AI.

Published

Mar 31, 2021

Reading time

2 min read

When you lack labeled training data, pretraining a model on unlabeled data can compensate. New research pretrained a model three times to boost performance on a medical imaging task.

What’s new: Shekoofeh Azizi and colleagues at Google developed Multiple-Instance Contrastive Learning (MICLe), a training step that uses different perspectives of the same patient to enhance unsupervised pretraining.

Key insight: Presented with similar images, a model trained via contrastive learning produces representations that are nearby in vector space. Training via contrastive learning on images of the same patient taken from various angles can produce similar representations of an illness regardless of the camera’s viewpoint.

How it works: The authors started with a ResNet-50 (4x) pretrained on ImageNet. They added contrastive pretraining steps and fine-tuning to diagnose 26 skin conditions from acne to melanoma. The training data was a private set of 454,295 images that included multiple shots of the same patients.

To refine the general representations learned from ImageNet for medical images, the authors pretrained the model according to SimCLR, an earlier contrastive learning technique. The model regarded augmented versions of the same parent image as similar and augmented versions of different images as dissimilar.
To sharpen the representations for changes in viewpoint, lighting, and other variables, they further pretrained the model on multiple shots of 12,306 patients. In this step — called MICLe — the model regarded randomly cropped images of the same patient as similar and randomly cropped images of different patients as dissimilar.
To focus the representations for classifying skin conditions, they fine-tuned the model on the images used in the previous step.

Results: The authors compared the performance of identical ResNet-50s pretrained and fine-tuned with and without MICLe. The authors’ method boosted the model’s accuracy by 1.18 percent to 68.81 percent, versus 67.63 percent without it.

Why it matters: A model intended to diagnose skin conditions no matter where they appear on the body may not have enough data to gain that skill through typical supervised learning methods. This work shows that the same learning can be accomplished using relatively little data through judicious unsupervised pretraining and contrastive losses.

We’re thinking: The combination of SimCLR and MICLe is a study in contrasts.

Subscribe to The Batch