Bias in AI is a serious problem. For example, if a judge who’s deciding how to sentence a defendant relies on an AI system that routinely estimates a higher risk that offenders of a particular race will reoffend, that’s a terrible thing. As we work to reduce bias in AI models, though, it’s also worth exploring a different issue: inconsistency. Specifically, let’s consider how inconsistent human decisions are, and how AI can reduce that inconsistency.
If a human judge, given two defendants who committed the same crime under identical circumstances, sentences one to three years in prison and the other to 30 days, we would consider this inconsistency blatantly unfair. Yet, as Daniel Kahneman and his co-authors document in their book, Noise: A Flaw in Human Judgment, human decision-making is extremely inconsistent (or noisy).
One study found that judges systematically sentenced defendants more harshly if the local football team had suffered an upset loss (which presumably made the judge cranky). Judges are only human, and if they’re swayed by football outcomes, imagine how many other irrelevant factors may influence their decisions!
Many human decisions rest on complex criteria, and humans don’t always define their criteria before weighing them. For example:
- In medicine, I’ve seen individual doctors make highly inconsistent diagnoses given the same input. Working on a project with a doctor whom I’ll call Alice, we measured the “inter-Alice agreement score,” which was loosely a measure of how much her diagnoses differed between morning and afternoon. (For the record, Alice is a brilliant doctor and wonderful collaborator. This score measured the inherent ambiguity of the task more than it measured her competence.)
- In manufacturing, I’ve seen skilled inspectors make very different decisions about whether or not parts with similar flaws were defective.
- In online retailing, I’ve seen human annotators make inconsistent decisions about how to tag or categorize products. (Should a fun gadget go under electronics or entertainment?)
In contrast, given the same input, a trained neural network will produce the same output every time. Given similar inputs, a trained model will also typically output similar results. Automated software tends to be highly consistent. This is one of automation’s huge advantages: Algorithms make decisions much more consistently than humans. To my mind, they offer a way to give patients more consistent and fair treatment options, make manufacturing more efficient, make retail product catalogs less confusing to shoppers, and so on.
In conversations about whether and how to build an AI system, it’s important to address how to ensure that the system doesn’t have significant bias as well as how to benchmark its bias against human bias. If you’re trying to get an AI project approved, you may find it useful raise the issue of consistency as well. Measuring the consistency of your algorithm relative to humans who make the same decision can add weight to arguments in favor of investing in an automated system.