Big Bot Makes Small Talk

A research summary of Facebook's Generative BST chatbot

Published: May 27, 2020

Facebook recently rolled out its entry in the World’s Biggest Chatbot sweepstakes. In keeping with the company’s social-networking dominance, the bot is designed to excel at chitchat on any subject.

What’s new: Led by Stephen Roller, Facebook researchers built Generative BST, a transformer-based model comprising up to 9.4 billion parameters. They trained the bot on their own BlendedSkillTalk dataset of 5,000 conversations among 2,500 people who were instructed to be knowledgeable, empathetic, and generous with personal details.

Key insight: The keys to small talk are personality, knowledge, empathy, and balancing response length (too short shows lack of interest, too long betrays poor listening). BlendedSkillTalk is designed to teach the first three traits. Finding the right response length is a matter of generation strategy.

How it works: Many chatbots rely on retrieval: they score a set of candidate responses and return the highest-scoring one. Generative language models, in contrast, create responses one token at a time and often produce dull or repetitive output. Generative BST combines the two approaches in a method called retrieve and refine.
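To make the retrieval idea concrete, here is a minimal sketch of scoring candidate responses against the conversation so far and returning the top one. It is an illustration only, not Facebook's model: the bag-of-words `embed` function stands in for a learned encoder, and the names `embed`, `cosine`, `retrieve`, and the candidate sentences are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (token -> count); a stand-in for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(context, candidates):
    """Score every candidate against the context and return the best one."""
    context_vec = embed(context)
    return max(candidates, key=lambda c: cosine(context_vec, embed(c)))


# Hypothetical candidate pool; a real system would draw on a large set of human replies.
candidates = [
    "I love hiking in the mountains.",
    "My dog is named Biscuit.",
    "I have never tried sushi.",
]
best = retrieve("Do you like hiking or the mountains?", candidates)
print(best)
```

A retriever of this kind can only return responses that already exist in its pool, which is the limitation that pairing it with a generator is meant to address.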

The retriever network reads human dialogue turn by turn and learns to choose actual responses from responses sampled at random. The generator learns to re-create actual responses based on earlier turns.
The retriever predicts a minimum length for each response, which keeps replies conversationally appropriate and discourages repetitive output.
The generator uses beam search to generate a variety of related responses. It creates a set of initial tokens and then adds tokens one at a time based on the context generated so far.
At inference, Generative BST selects the most likely candidate based on the conversation to that point.

Results: Human judges compared conversations by Generative BST and Google’s Meena (see “Toward Open-Domain Chatbots” above) using Acute-Eval, a pairwise chatbot evaluation method also developed by Facebook. Sixty-five percent of judges found Generative BST more human-like than Meena, while 75 percent found it more engaging. The researchers also experimented with various techniques to build variants with different skills. For instance, 70 percent of judges found the version called BST Unlikelihood, which used a different generation approach, more human-like than Meena, but only 64 percent found it more engaging.

Yes, but: The judges’ positive assessment of Generative BST’s human-like qualities relative to other chatbots doesn’t imply that any of them can carry on coherent conversations. You can read some nonsensical turns with Generative BST here.

Why it matters: Generative BST held the record for chatbot parameter count for only a short time before Microsoft announced its 17-billion-parameter Turing-NLG. But its malleable generator remains unique. Other researchers may be able to use this framework to create chatbots with particular qualities and behaviors.

We’re thinking: Facebook’s bot takes Big Tech rivalry to a new level. The Googlers behind Meena reported a conversation (illustrated above) in which their system, considering education for barnyard animals, punned, “Horses go to Hayvard.” The Facebook authors tried out the joke on Generative BST. The bot merely deadpanned: “I don’t get it.”
