Large language models occasionally generate information that’s false. What if they produce output that’s downright dangerous?
The fear: Text generators don’t know true from false or right from wrong. Ask an innocent question about food or health, and you might get an innocent — but fatal — answer.
Horror stories: Large language models may already have claimed their first victims.
- AI-generated guides to edible plants, sold on Amazon, encourage readers to gather poisonous mushrooms. Online sleuths have found dangerous misidentifications in their pages.
- A New Zealand supermarket chain offered a chatbot that generates recipes from lists of ingredients. When a user asked what to do with water, ammonia, and bleach, it offered a recipe for lethal chloramine gas. The chain subsequently appended a disclaimer to the bot’s recipes: “You must use your own judgment before relying on or making any recipe produced by Savey Meal-bot.”
- Users reported that a chatbot provided by the National Eating Disorder Association dispensed advice likely to exacerbate eating disorders. For instance, it told one user with anorexia to continue to lose weight. The organization withdrew the bot.
How scared should you be: AI models are becoming safer as researchers develop techniques that align models with human preferences, such as reinforcement learning from human feedback, constitutional AI, and data-centric AI.
- Anthropic is among a number of AI companies that focus on building safe models. Its Claude family of large language models was trained to follow a constitution that stresses human rights and harm reduction.
- Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI committed to prioritizing AI safety research and sharing information with independent researchers.
Facing the fear: Large language models are widely available, but they’re still experimental. Researchers — like users — are learning how to control them. Builders of systems geared toward the general public — like mental health and recipe chatbots — have a special responsibility to consider sensitive, dangerous, or nefarious uses.