A deep learning system detected breast cancer in mammograms as well as experienced radiologists, according to a landmark study.
What’s new: Researchers at Lund University in Sweden conducted a randomized, controlled, clinical trial to determine whether an AI system could save radiologists’ time without endangering patients — purportedly the first study of AI’s ability to diagnose breast cancer from mammograms whose design met the so-called gold standard for medical tests. Their human-plus-machine evaluation procedure enabled radiologists to spend substantially less time per patient while exceeding a baseline for safety.
How it works: The authors randomly divided 80,000 Swedish women into a control group and an experimental group.
- The control group had its mammograms evaluated manually by two radiologists (the standard practice in much of Europe).
- The second, experimental group had its mammograms evaluated by Transpara, a convolutional neural network trained to recognize breast tumors. Transpara scored mammograms for cancer risk on a scale from 1 (low risk) to 10 (high risk). It added marks to mammograms that scored 8 to 10 highlighting potential cancer locations.
- Human radiologists evaluated the experimental group’s mammograms, scores, and marks. One radiologist reviewed each mammogram, unless Transpara had assigned a score of 10, in which case two radiologists reviewed it. Thus at least one radiologist examined each patient in the study.
- Finally, the radiologists chose whether or not to recall each patient for further examination. This enabled them to detect false positives.
Results: The AI-assisted diagnosis achieved a cancer detection rate of 6.1 per 1,000 patients screened, comparable to the control method and above an established lower limit for safety. The radiologists recalled 2.0 percent of the control group and 2.2 percent of the experimental group, and both the control and experimental groups showed the same false-positive rate of 1.5 percent. (The difference in recall rates coupled with the matching false-positive rate suggests that the AI method detected 20 percent more cancer cases than the manual method, though authors didn’t emphasize that finding.) Moreover, since approximately 37,000 patients were only examined by one radiologist, the results indicate that AI saved 44.3 percent of the examination workload without increasing the number of misdiagnosed patients.
Yes, but: The authors’ method requires more study before it can enter clinical practice; for instance, tracking patients of varied genetic backgrounds. The authors are continuing the trial and plan to publish a further analysis after 100,000 patients have been enrolled for two years.
Behind the news: Radiologists have used AI to help diagnose breast cancer since the 1980s (though that method is questionable.) A 2020 study by Google Health claimed that AI outperformed radiologists, but critics found flaws in the methodology.
Why it matters: Breast cancer causes more than 600,000 deaths annually worldwide. This work suggests that AI can enable doctors to evaluate more cases faster, helping to alleviate a shortage of radiologists. Moreover, treatment is more effective the earlier the cancer is diagnosed, and the authors’ method caught more early than late ones.
We’re thinking: Medical AI systems that perform well in the lab often fail in the clinic. For instance, a neural network may outperform humans at cancer diagnosis in a specific setting but, having been trained and tested on the same data distribution, isn’t robust to changes in input (say, images from different hospitals or patients from different populations). Meanwhile, medical AI systems have been subjected to very few randomized, controlled trials, which is considered the gold standard for medical testing. Such trials have their limitations, but they’re a powerful tool for bridging the gap between lab and clinic.