Watson set a high bar for language understanding in 2011, when it famously whipped human competitors in the televised trivia game show Jeopardy! IBM’s special-purpose AI required around $1 billion and a squadron of engineers. New research suggests that today’s best language models can accomplish similar tasks right off the shelf.

What’s new: Researchers at Facebook AI Research and University College London pitted top-shelf language models against task-specific networks in a Jeopardy!-like challenge they call Language Model Analysis (LAMA). Their LAMA data set provides a large corpus of sentences, each missing a key fact.

Key Insight: The latest language models are pretrained to address a variety of downstream tasks. In learning language representations, they retain knowledge that can be used to complete statements lacking key words.

How it works: LAMA builds its incomplete sentences based on facts drawn from Google-RE (facts from Wikipedia), T-REx (facts aligned with Wikipedia text), ConceptNet (a semantic network), and SQuAD (questions and answers).

  • LAMA requires models to fill in a missing subject or object. For example, “The theory of relativity was developed by ___.”
  • The researchers evaluated off-the-shelf versions of BERT, ELMo, and Transformer-XL without further training (a minimal probing sketch appears below).
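
Probing happens through each model's own cloze mechanism. As a minimal sketch (not the authors' evaluation code), the snippet below uses the Hugging Face transformers library's fill-mask pipeline with an off-the-shelf BERT checkpoint to complete the example above; the checkpoint name and top-k setting are illustrative assumptions rather than the paper's exact setup.

```python
# Minimal sketch: probe an off-the-shelf masked language model with a
# LAMA-style cloze statement, via the Hugging Face `transformers`
# fill-mask pipeline. The checkpoint is an illustrative choice.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-large-cased")

# Cloze statement built from a fact triple (subject, relation, object),
# with the object replaced by BERT's mask token.
statement = "The theory of relativity was developed by [MASK]."

# Print the model's top five candidates for the blank, with scores.
for prediction in fill_mask(statement, top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```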

Results: BERT-Large filled in the blanks most accurately overall, and it was best at completing statements based on Google-RE and ConceptNet. It proved only half as accurate as task-specific models on LAMA’s SQuAD portion, which contains more complicated sentences. Similarly, BERT’s performance suffered when T-REx facts contained multiple subjects or blanks.
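
For a sense of how fill-in-the-blank accuracy can be scored, the rough sketch below checks whether the model's top prediction matches a gold answer (precision at 1) over a pair of illustrative statements. The probes, the metric, and the case-insensitive match are assumptions for illustration, not LAMA's actual scoring harness.

```python
# Rough sketch of a precision@1 score: count how often the model's top
# prediction equals the gold answer. The facts below are illustrative
# examples, not entries from the LAMA data set.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-large-cased")

probes = [
    ("The theory of relativity was developed by [MASK].", "Einstein"),
    ("Paris is the capital of [MASK].", "France"),
]

hits = 0
for statement, answer in probes:
    top = fill_mask(statement, top_k=1)[0]["token_str"].strip()
    hits += int(top.lower() == answer.lower())

print(f"precision@1: {hits / len(probes):.2f}")
```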

Why it matters: The Allen Institute for AI last week reported using BERT to score better than 90 percent on the multiple-choice questions in the New York Regents eighth-grade science exam. That system included additional task-specific models and retrieved external information to complete tasks. This research suggests that BERT as-is would score well on the Regents test.

Takeaway: Large, pretrained language models can glean and recall nearly as much information — from some data sets, at least — as specially designed question answering models. This knowledge can allow them to accomplish various language tasks, including fill-in-the-blank, without special preparation.
