The UK government abandoned a plan to use machine learning to assess students for higher education.
What’s new: The UK Department for Education discarded grades generated by an algorithm designed to predict performance on the annual Advanced Level qualification exams, which had been canceled due to the pandemic.
- Also known as A Levels, these tests serve as entrance exams for colleges and universities.
- The algorithm’s predictions were generally lower than teachers’ assessments of their students’ likely performance, sparking days of demonstrations in London.
- The government ultimately agreed to accept whichever grade was higher.
What went wrong: Education officials initially asked teachers to award their students an expected grade based on past performance on practice exams. This resulted in a higher-than-normal share of good grades. The department developed its model in an effort to bring grades into line with their usual distribution.
- The algorithm predicted A-level grades based primarily on two inputs: students’ past academic ranking within their school and the school’s historical performance relative to others.
- Forty percent of students across England, Northern Ireland, and Wales received predicted scores lower than those their teachers had estimated, compared to 2.2 percent whose scores improved.
- Most students whose predicted grade was lower than the teacher’s assessment attended schools that serve primarily poor, non-white communities.
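To make the two inputs concrete, here is a minimal sketch of rank-based grade assignment. It is not the department's actual model. It assumes, purely for illustration, that each school's past grade shares are applied directly to this year's teacher-supplied ranking. The function name `grades_from_rank`, the `GRADES` list, and the example shares are all hypothetical.

```python
from typing import Dict, List

# Grade labels from best to worst (illustrative ordering).
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]


def grades_from_rank(
    ranked_students: List[str],
    historical_share: Dict[str, float],
) -> Dict[str, str]:
    """Fit a within-school ranking to the school's historical grade shares.

    ranked_students: student IDs ordered from strongest to weakest.
    historical_share: fraction of the school's past students who earned
        each grade (hypothetical input; should sum to about 1.0).
    """
    n = len(ranked_students)

    # Build cumulative cutoffs, e.g. A <= 0.25, B <= 0.75, C <= 1.0.
    cutoffs = []
    cumulative = 0.0
    for grade in GRADES:
        cumulative += historical_share.get(grade, 0.0)
        cutoffs.append((cumulative, grade))

    result = {}
    for position, student in enumerate(ranked_students):
        # Midpoint percentile of this student within the school's ranking.
        percentile = (position + 0.5) / n
        assigned = GRADES[-1]  # fall back to the lowest grade
        for cutoff, grade in cutoffs:
            if percentile <= cutoff + 1e-9:
                assigned = grade
                break
        result[student] = assigned
    return result


# A school whose past students never earned above a C.
weak_history = {"C": 0.5, "D": 0.5}
predicted = grades_from_rank(["top_student", "second_student"], weak_history)
```

Under these assumptions, the top-ranked student at a school with no history of grades above C cannot be assigned anything higher than C, whatever the teacher's own estimate. That is one way a scheme of this kind can penalize strong students at historically low-performing schools.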
Behind the news: The pandemic also prompted the International Baccalaureate Organization, a nonprofit foundation that provides two-year high school diplomas, to develop its own grading model. The organization said its model produced a distribution similar to that produced by teachers last year. Nonetheless, over 15,000 parents, students, and teachers are petitioning the foundation to reevaluate its model, which they say predicts unfairly low grades.
Why it matters: In many countries, high school exit exams determine whether students can pursue higher education. Flawed grading models can have lifelong consequences.
We’re thinking: If an AI algorithm predicted grades that were, on average, more accurate, less biased, and less noisy than those estimated by human teachers, it would be worth considering deploying it. If an AI algorithm unfairly lowered many individuals’ grades, it would seem like a terrible idea to use it. The truth is, we live in a world where AI systems can fit both descriptions simultaneously, leading to strong arguments for and against using them. Whether such systems yield a net benefit is an ethical question that requires vigorous debate. We don’t see an easy answer.