Social biases are well documented in decisions made by supervised models trained on ImageNet’s labels. But they also crept into the output of unsupervised models pretrained on the same dataset.

What’s new: Two image classification models learned social biases from ImageNet photos, according to a study by researchers at Carnegie Mellon and George Washington University.

How it works: The authors measured the extent to which Google’s SimCLRv2 and OpenAI’s iGPT associated types of people with certain attributes.

  • Using images from CIFAR-100 and Google Images, they assigned each picture either a category (such as man, woman, white, black, or gay) or an attribute (such as pleasant, unpleasant, career, or family).
  • Then they fed the images to each model to generate features.
  • They compared the features generated in response to different types of people (say, men or women) with features generated in response to images representing opposing pairs of attributes (say, pleasant and unpleasant). In this way, they could determine the degree to which a model associated men versus women with those attributes (a rough sketch of this comparison appears below).
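
The comparison in the last step works like the association tests used for word embeddings, adapted to image features. Purely as an illustration, and not the authors’ actual code, the sketch below computes an effect size that measures how much more strongly one category’s features lean toward one attribute set than another, relative to a second category. The function names, the NumPy-based cosine-similarity implementation, and the synthetic demo data are assumptions of ours.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # How much closer feature w sits to attribute set A than to attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    # Normalized difference (akin to Cohen's d) between how category X and
    # category Y associate with attributes A versus B. X, Y, A, B are lists of
    # 1-D feature vectors extracted from images by a pretrained model.
    diffs_x = [association(x, A, B) for x in X]
    diffs_y = [association(y, A, B) for y in Y]
    pooled_std = np.std(diffs_x + diffs_y, ddof=1)
    return (np.mean(diffs_x) - np.mean(diffs_y)) / pooled_std

# Synthetic demo with random vectors standing in for model features
# (e.g., X = images of men, Y = images of women, A = "career", B = "family").
rng = np.random.default_rng(0)
X = [rng.normal(size=128) for _ in range(20)]
Y = [rng.normal(size=128) for _ in range(20)]
A = [rng.normal(size=128) for _ in range(20)]
B = [rng.normal(size=128) for _ in range(20)]
print(effect_size(X, Y, A, B))  # Near zero for random features.
```

An effect size near zero means the features treat the two categories alike with respect to those attributes; a large positive or negative value signals a stereotyped association.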

Results: Features generated by both models showed social biases such as associating white people with tools and black people with weapons. While SimCLRv2 tended to associate stereotyped attributes with certain categories more strongly, iGPT showed such biases across a broader range of categories. For instance, features generated by iGPT associated thin people with pleasantness and overweight people with unpleasantness, and men with science and women with liberal arts.

Behind the news: ImageNet contains more than 14 million images annotated by human workers, who passed along their prejudices to the dataset. ImageNet creator Fei-Fei Li is spearheading an effort to purge the dataset of labels that associate genders, races, or other identities with stereotypes and slurs.

Why it matters: When unsupervised models pick up on biases in a dataset, the issue runs deeper than problematic labels. The authors believe their models learned social stereotypes because ImageNet predominantly depicts people in stereotypical roles: men in offices, women in kitchens, and non-white people largely absent from scenes with positive associations such as weddings. Machine learning engineers need to be aware that a dataset’s curation alone can encode common social prejudices.

We’re thinking: Datasets are built by humans, so it may be impossible to eliminate social biases from them completely. But minimizing them will pay dividends in applications that don’t discriminate unfairly against certain social groups.
