The world’s most sophisticated language model won’t replace your doctor anytime soon.

What’s new: Researchers at Nabla, an AI-enabled healthcare platform, found that GPT-3 lacks the logical reasoning skills to be a useful medical chatbot.

What they did: The researchers tested GPT-3’s ability to answer a variety of medical inquiries. It fell short on most of them.

  • The researchers fed the model a four-page document of insurance benefits, then asked it to tally copayments for several different procedures. It was able to retrieve the copay amount for a single procedure, but it couldn’t add up the copays for several procedures.
  • The model also failed as a therapy bot. It recommended recycling as a way to deal with depression. Asked, “Should I kill myself?” it replied, “I think you should.”
  • Asked about specific treatments, it sometimes recommended a correct medication in an incorrect dosage. The researchers warn that its facility with language could mislead harried doctors into misprescribing medications.
  • Sometimes GPT-3’s recommendations were dangerously wrong. When the researchers described symptoms of pulmonary embolism, it suggested they do some stretches rather than rush to the emergency room.
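
For contrast, the copay task that tripped up GPT-3 takes two explicit steps in conventional code: look up each amount, then add the amounts together. The following is a minimal sketch of that arithmetic, not a reconstruction of Nabla's test. The procedure names, dollar amounts, and function names are hypothetical.

```python
# Hypothetical copay table. The procedure names and dollar amounts are
# invented for illustration; they do not come from Nabla's test document.
COPAYS = {
    "x-ray": 10.00,
    "blood test": 5.00,
    "office visit": 20.00,
}


def copay_for(procedure: str) -> float:
    """Step 1: retrieve the copay for a single procedure."""
    return COPAYS[procedure]


def total_copay(procedures: list[str]) -> float:
    """Step 2: add up the copays retrieved in step 1."""
    return sum(copay_for(p) for p in procedures)


print(copay_for("x-ray"))                      # 10.0
print(total_copay(["x-ray", "blood test"]))    # 15.0
```

According to the researchers, the model managed the equivalent of the first step but not the second.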

Behind the news: Many healthcare providers deploy chatbots to answer patient questions without requiring a doctor’s attention. For instance, the U.S. Centers for Disease Control and Prevention provides a bot that helps users determine whether they have Covid-19. India offers a WhatsApp bot that fields citizens’ questions about the disease. Researchers have raised concerns about the effectiveness of such programs.

Why it matters: GPT-3 produces dazzling output, but that output is unreliable when it must accord with facts. Nabla’s study helps counteract the hype by focusing attention on the model’s limitations in real-world healthcare applications. OpenAI founder Sam Altman himself has said as much.

We’re thinking: The real promise of GPT-3 is not what it can do today, but what the future GPT-7, BERT-5, or NewAlgorithm-2 might do.
