First, Make No Harmful Models Many AI systems for Covid-19 used biased data.

Published

Apr 22, 2020

Reading time

1 min read

Researchers have rushed out a battery of AI-powered tools to combat the coronavirus, but an assessment of dozens of models is a wake-up call for machine learning engineers.

What’s new: Many models built to spot Covid-19 infection, predict the likelihood of hospitalization, or forecast outcomes are built on flawed science, according to a survey published in the British Medical Journal.

What they found: A group of clinicians, scientists, and engineers led by Laure Winants, an epidemiologist at Maastricht University in the Netherlands, found that biased data compromised all of the 31 models analyzed.

Nearly a dozen models used patient data that did not represent populations of people infected by the virus.
Most models trained to detect Covid-19 infection in CT scans were trained on poorly annotated data. Many of the researchers who built them neglected to benchmark their work against established machine learning methods.
Many models designed to predict patient outcomes were trained only on data from patients who had died or recovered. These models didn’t learn from patients who remained symptomatic by the end of the study period, yielding prognoses that were either overly optimistic or overly dire.

Results: In a commentary that accompanied the survey, BMJ’s editors declared the models so “uniformly poor” that “none can be recommended for clinical use.”

The path forward: The authors recommend that machine learning researchers adopt the 22-point TRIPOD checklist as a standard for developing predictive medical AI. Developed by an international consortium of physicians and data scientists, the checklist is designed to help engineers report their work clearly and reduces risk of developing models with biased data.

Why it matters: Patients and health care systems alike need more accurate and faster diagnoses and prognoses. The AI community is used to publishing preliminary results to accelerate progress, but the health care community tends to wait for rigorous peer review to avoid causing harm.

We’re thinking: Given how fast the Covid-19 situation is evolving, sharing results early and often is a good thing. But the AI community also needs new mechanisms to make sure preliminary models don’t cause harm.

Subscribe to The Batch