Do language models have their own opinions about politically charged issues? Yes — and they probably don’t match yours.
What's new: Shibani Santurkar and colleagues at Stanford compared opinion-poll responses of large language models with those of various human groups.
How it works: The authors collected multiple-choice questions based on surveys of public opinion in the United States. They compared answers generated by nine language models (three from AI21 Labs and six from OpenAI) with those of 60 demographic groups. The groups varied according to sex, age, race, geography, relationship status, citizenship status, education, political party affiliation, religious affiliation, and degree of religious observance.
- The authors prompted the models with multiple-choice questions. They compared each model’s probability distribution over the answer choices to the distribution of human answers; that is, they compared the model’s confidence in each answer to the percentage of each demographic group that gave that answer.
- In separate tests, prior to posing questions, they prompted the models to express the opinion of a particular demographic group. For instance, “Answer the following question as if, in politics today, you consider yourself a Democrat.”
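The comparison above treats a model’s confidence in each answer choice as a distribution, just like poll percentages. A minimal sketch of that step, assuming hypothetical per-option log-probabilities (the values below are invented for illustration, not taken from the study):

```python
import math

def answer_distribution(option_logprobs):
    """Normalize per-option log-probabilities into a probability
    distribution over answer choices (a softmax)."""
    max_lp = max(option_logprobs.values())  # subtract max for stability
    exps = {opt: math.exp(lp - max_lp) for opt, lp in option_logprobs.items()}
    total = sum(exps.values())
    return {opt: e / total for opt, e in exps.items()}

# Invented log-probs a model might assign to options A-D of one question
logprobs = {"A": -1.2, "B": -0.4, "C": -2.5, "D": -3.0}
model_dist = answer_distribution(logprobs)
# Each value plays the same role as the percentage of a demographic
# group that chose that answer, so the two can be compared directly.
```

The resulting distribution can then be compared, option by option, against the fraction of each human group that selected each answer.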
Results: The authors compared the distributions of model and human answers according to a formula based on the Wasserstein distance, also known as earth mover’s distance. In their formula, a score of 1 indicates a perfect match.
- Generally, the opinions expressed by the language models diverged widely from those expressed by the overall population. For instance, relative to the overall population, across all opinions, OpenAI’s davinci scored 0.791, a wider divergence than even the demographic group least like the overall population, which scored 0.865. The average demographic group scored 0.949.
- Models that were fine-tuned using reinforcement learning from human feedback (RLHF), a technique that has dramatically improved the utility of language models, expressed opinions closer to those of liberal, educated, and wealthy people and further from those of the overall population. For example, relative to the overall population, text-davinci-003 (which was trained using RLHF) scored 0.7.
- Prompting the models to answer from the point of view of a particular group moved them only slightly toward alignment with their human counterparts (by around 0.05 in most cases). For example, text-davinci-003, relative to Democrats, scored 0.718; prompted to answer like a Democrat, it scored 0.767. Relative to Republicans, it scored 0.679; prompted to answer like a Republican, it scored 0.748.
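The scoring in the results above can be sketched in a few lines. For distributions over ordered answer choices, the one-dimensional earth mover’s distance is the sum of absolute differences between cumulative distributions; normalizing by the maximum possible distance and subtracting from 1 yields a score where 1 is a perfect match. The normalization shown here is our assumption about the general shape of such a formula, not the paper’s exact definition:

```python
def wasserstein_1d(p, q):
    """Earth mover's distance between two distributions over the same
    ordered answer choices, via cumulative-distribution differences."""
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist

def alignment(model_dist, human_dist):
    """Score in [0, 1], where 1 means the distributions match exactly.
    The divisor (n - 1) is the largest possible distance between two
    distributions over n ordered choices."""
    n = len(model_dist)
    return 1.0 - wasserstein_1d(model_dist, human_dist) / (n - 1)

# Identical distributions score a perfect match
print(alignment([0.2, 0.5, 0.3], [0.2, 0.5, 0.3]))  # → 1.0
# All mass shifted to the opposite extreme scores zero
print(alignment([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # → 0.0
```

Under a metric like this, davinci’s 0.791 relative to the overall population means its answer mass sat measurably further from the population’s than any single demographic group’s did.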
Behind the news: In some circles, ChatGPT has been criticized for expressing a political bias toward liberal (in U.S. terms) positions. Such allegations have prompted developers to build alternative versions that are deliberately biased in other directions. Some observers speculate that Elon Musk’s secretive AI startup is on a similar mission.
Why it matters: Large language models aren’t neutral reflections of society. They express political views that don’t match those of the general population or those of any group. Furthermore, prompting them to take on a particular group’s viewpoint doesn't bring them into line with that group. The AI community (and the world at large) must decide whether and how to manage these biases.
We're thinking: Should a language model’s opinions match those of the global average, or should different language models respond similarly to different groups? Given that a subset of the world’s population holds biased opinions, including sexist or racist views, should we build LLMs that reflect them? Should language models be allowed to express opinions at all? Much work lies ahead to make these choices and figure out how to implement them.