A United States federal judge threw ChatGPT’s legal research out of court.
What’s new: An attorney who used ChatGPT to generate a legal brief faces disciplinary action after opposing lawyers discovered that the brief referred to fictional cases and quotations invented by the chatbot, The New York Times reported.
Citation situation: The lawyer, Steven A. Schwartz, was assisting in a personal injury lawsuit on the plaintiff’s side in a federal court in New York City. When the defendant moved to have the case dismissed, Schwartz countered with a brief based on results from a ChatGPT query.
- Schwartz asked the model to find similar cases in which rulings had favored his client’s argument. It cited six cases and offered quotations from the rulings. He asked the model to verify that the cases were real, and it responded with variations of “The case does indeed exist and can be found in legal research databases such as Westlaw and LexisNexis.”
- He and his co-attorneys filed the resulting brief to the court. The defendant’s lawyers, upon reviewing the document, notified the judge that they were unable to find further information about any of the cases.
- When the judge sought clarification, Schwartz filed a statement admitting to the error and expressing regret. He had never used ChatGPT before, he said, and did not know it was unreliable.
- Schwartz and his firm’s lead lawyer on the case face an in-person disciplinary hearing on June 8.
Ripple effects: In the wake of this case, a federal judge in Texas decreed that lawyers in cases before him may use generative AI to write their briefs only if they file paperwork stating that they manually verified the output for accuracy.
Why it matters: Within the AI community, it may be common knowledge that large language models sometimes confidently state falsehoods as though they were true. Among the general public, though, this fact may not be so well understood. Schwartz’s mishap is a painfully public demonstration of what can happen when people trust such models to supply facts.
We’re thinking: People outside the AI community might reasonably assume that the technology is qualified to assist in legal research. After all, in April, GPT-4, the large language model behind the most powerful version of ChatGPT, reportedly ranked in the 90th percentile on a U.S. bar exam. (A recent reappraisal revised GPT-4’s score downward to between the 48th and 68th percentiles.) This goes to show that AI performance on such tests doesn’t necessarily map well to human performance, since any junior law student would know not to invent cases. There’s important work to be done in applying LLMs to legal work. Meanwhile, we urge researchers who are testing LLMs’ ability to meet real-world qualifications to resist hype when reporting their results.