Classification isn’t always cut and dried. While most doctors are men and most nurses are women, that doesn't mean all men who wear scrubs are doctors or all women who wear scrubs are nurses. A new method attempts to account for biases that may be held by certain subsets of labelers.
What's new: Mitchell L. Gordon and colleagues at Stanford introduced a method to control bias in machine learning model outputs. Their jury learning approach models a user-selected subset of the annotators who labeled the training data.
Key insight: A typical classifier mimics how an average labeler would annotate a given example. Such output inevitably reflects biases typically associated with an annotator’s age, gender, religion, and so on, and if the distribution of such demographic characteristics among labelers is skewed, the model’s output will be skewed as well. How to correct for such biases? Instead of predicting the average label, a classifier can predict the label likely to be applied by each individual in a pool of labelers whose demographic characteristics are known. Users can choose labelers who have the characteristics they desire, and the model can emulate them and assign a label accordingly. This would enable users to correct for biases (or select for them).
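The contrast between an average labeler and a chosen jury can be sketched in a few lines of Python. This is a simplified illustration, not the authors' method: it uses made-up observed scores for a single comment instead of a trained model's per-annotator predictions, and the groups `A` and `B` are hypothetical. It shows that a pooled mean is dominated by whichever group supplied the most labels, while fixing the jury's composition lets the user decide how much each group counts.

```python
from statistics import mean

# Hypothetical toxicity scores (0-4) that five annotators gave one comment,
# each tagged with one made-up demographic group. Group A dominates the pool.
annotations = [
    {"annotator": "a1", "group": "A", "score": 1},
    {"annotator": "a2", "group": "A", "score": 1},
    {"annotator": "a3", "group": "A", "score": 0},
    {"annotator": "a4", "group": "A", "score": 1},
    {"annotator": "a5", "group": "B", "score": 4},
]


def average_label(rows):
    """What a conventional classifier learns to imitate: the pooled mean."""
    return mean(r["score"] for r in rows)


def jury_label(rows, composition):
    """Expected score of a jury whose group shares are chosen by the user.

    `composition` maps group -> fraction of jury seats (fractions sum to 1).
    Simplification: uses observed scores in place of model predictions.
    """
    total = 0.0
    for group, share in composition.items():
        group_scores = [r["score"] for r in rows if r["group"] == group]
        total += share * mean(group_scores)
    return total


print(average_label(annotations))                     # 1.4
print(jury_label(annotations, {"A": 0.5, "B": 0.5}))  # 2.375
```

Because group B supplied only one of five labels, the pooled mean of 1.4 mostly reflects group A; giving each group half the seats raises the expected score to 2.375.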
How it works: The authors used jury learning to train a classifier to mimic the ways different annotators label the toxicity of social media comments. The dataset comprised comments from Twitter, Reddit, and 4Chan.

  • From a group of 17,280 annotators, five scored each comment from 0 (not toxic) to 4 (extremely toxic). In addition, each annotator specified their age, gender, race, education level, political affiliation, whether they’re a parent, and whether religion was an important part of their lives.
  • BERTweet, a natural language model pre-trained on tweets in English, learned to produce representations of each comment. The system also learned embeddings for each annotator and demographic characteristic.
  • The authors concatenated the comment representation with the annotator and demographic embeddings and fed the result into a Deep & Cross Network, which learned to reproduce the annotators’ classifications.
  • At inference, the authors set a desired demographic mix for the virtual jury. The model selected at random 12 annotators who fit that mix. Given a comment, the model predicted how each member would classify it and chose the label via majority vote.
  • The authors repeated this process several times to render classifications by many randomly selected juries of the same demographic composition. The median of the juries’ verdicts provided the final label.
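The inference steps above can be sketched in Python as follows. This is a minimal illustration under loudly stated assumptions: `predict_score` is a hypothetical stand-in for the trained BERTweet plus Deep & Cross Network model, the annotator pool and its `gender` and `offset` fields are made up, and because five possible scores rarely produce a strict majority, the "majority vote" is implemented as a plurality vote with ties going to the lower score.

```python
import random
from collections import Counter
from statistics import median

# Hypothetical annotator pool: one demographic attribute plus a made-up
# personal sensitivity offset. The real pool recorded seven attributes.
_pool_rng = random.Random(42)
POOL = [
    {
        "id": i,
        "gender": "woman" if i % 2 else "man",
        "offset": _pool_rng.choice([-1, 0, 1]),
    }
    for i in range(200)
]


def predict_score(comment, annotator):
    """Hypothetical stand-in for the trained model: a 0-4 toxicity score
    this annotator would likely assign to the comment."""
    base = 2 if "idiot" in comment.lower() else 0
    bump = 1 if annotator["gender"] == "woman" else 0
    return max(0, min(4, base + bump + annotator["offset"]))


def sample_jury(pool, composition, rng):
    """Draw jurors at random so each group fills its requested number of seats."""
    jury = []
    for group, seats in composition.items():
        eligible = [a for a in pool if a["gender"] == group]
        jury.extend(rng.sample(eligible, seats))
    return jury


def plurality(scores):
    """Most frequent score; ties go to the lower (less toxic) score."""
    counts = Counter(scores)
    top = max(counts.values())
    return min(s for s, c in counts.items() if c == top)


def jury_verdict(comment, pool, composition, n_juries=51, seed=0):
    """Median of the plurality verdicts of many randomly drawn juries."""
    rng = random.Random(seed)
    verdicts = []
    for _ in range(n_juries):
        jury = sample_jury(pool, composition, rng)
        verdicts.append(plurality([predict_score(comment, a) for a in jury]))
    return median(verdicts)  # an odd n_juries keeps this an integer score


# A 12-seat jury with 8 women and 4 men.
verdict = jury_verdict("You idiot", POOL, {"woman": 8, "man": 4})
print(verdict)
```

Drawing many juries and taking the median smooths out the luck of any single draw, so the final label reflects the requested composition rather than one particular set of jurors.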

Results: The authors evaluated their model’s ability to predict labels assigned by individual annotators. It achieved 0.61 mean absolute error (MAE), while BERTweet fine-tuned on the dataset achieved 0.90 MAE (lower is better). The authors’ model achieved fairly consistent error rates when estimating how annotators of different races would label examples: Asian (0.62), Black (0.65), Hispanic (0.57), White (0.60). In contrast, BERTweet’s error was higher for every group and markedly higher for Black annotators: Asian (0.83), Black (1.12), Hispanic (0.87), White (0.87). The authors’ model, which focused on estimating labels assigned by individuals, also outperformed a similar model trained to predict decisions by demographic groups, which achieved 0.81 MAE.
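Mean absolute error is the average gap between a predicted score and the score an annotator actually gave. The Python sketch below computes it overall and per group, then measures consistency as the spread between the best- and worst-served groups, using the per-group figures reported above. The function names are illustrative, and `spread` is just one simple way to summarize consistency, not a metric the authors report.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all examples (lower is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_by_group(rows):
    """rows: (group, true_score, predicted_score) triples -> {group: MAE}."""
    buckets = defaultdict(lambda: ([], []))
    for group, true_score, predicted in rows:
        buckets[group][0].append(true_score)
        buckets[group][1].append(predicted)
    return {g: mean_absolute_error(t, p) for g, (t, p) in buckets.items()}


def spread(per_group):
    """Gap between the worst- and best-served groups."""
    return max(per_group.values()) - min(per_group.values())


# Per-group MAE values reported for each model.
jury_model = {"Asian": 0.62, "Black": 0.65, "Hispanic": 0.57, "White": 0.60}
bertweet = {"Asian": 0.83, "Black": 1.12, "Hispanic": 0.87, "White": 0.87}

print(round(spread(jury_model), 2))  # 0.08
print(round(spread(bertweet), 2))    # 0.29
```

By this measure, the fine-tuned BERTweet's errors vary across racial groups more than three times as much as the jury learning model's.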
Why it matters: Users of AI systems may assume that data labels are objectively true. In fact, they’re often messy approximations, and they can be influenced by the circumstances and experiences of individual annotators. The jury method gives users a way to account for this inherent subjectivity.
We're thinking: Selecting a good demographic mix of labelers can reduce some biases and ensure that diverse viewpoints are represented in the resulting labels — but it doesn’t reduce biases that are pervasive across demographic groups. That problem requires a different approach.

