Collaborative Text Generator: A language model that collaborates with human writers

Published Apr 05, 2023

Text from current language models can be useful as a rough draft, but that leaves the polishing to human writers. A new model learned both to generate editorial directions and to respond to them.

What’s new: Timo Schick and colleagues at Meta proposed Plan, Edit, Explain, and Repeat (PEER), a text generator designed to collaborate with human writers.

Key insight: Data that demonstrates the motivations, execution, and results of editing is hard to come by. Wikipedia, in which every article includes a history of edits as well as comments on them, comes close, but an editor trained solely on Wikipedia would be limited to encyclopedia-style text. However, a model trained on Wikipedia to undo revisions can synthesize a supplemental dataset of unrevised and revised examples. Applying the undo function to varied text can generate synthetic “unedited” drafts for training the editor.
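
To make the idea concrete, here is a minimal sketch of how an undo function can turn finished texts into synthetic editing examples. Everything in it is an assumption for illustration: the `synthesize_edit_pairs` helper, the dictionary layout, and especially `toy_undo`, which stands in for a trained undo model by simply dropping the last sentence. It also ignores the cited documents that the real system conditions on.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: an undo model maps a finished text to
# (revision plan, earlier draft). A trained model would go here.
UndoFn = Callable[[str], Tuple[str, str]]


def synthesize_edit_pairs(final_texts: List[str], undo: UndoFn) -> List[Dict[str, str]]:
    """Pair each finished text with a synthetic unrevised draft."""
    examples = []
    for final in final_texts:
        plan, draft = undo(final)
        if draft == final:
            # The undo step changed nothing, so there is no edit to learn from.
            continue
        examples.append({"unrevised": draft, "plan": plan, "revised": final})
    return examples


def toy_undo(text: str) -> Tuple[str, str]:
    """Toy stand-in for an undo model: remove the final sentence."""
    sentences = [s for s in text.split(". ") if s]
    if len(sentences) < 2:
        return ("", text)  # nothing to undo
    draft = ". ".join(sentences[:-1]) + "."
    return ("add a closing sentence", draft)


pairs = synthesize_edit_pairs(
    ["Preheat the oven. Bake for 20 minutes.", "Hello."],
    toy_undo,
)
```

The resulting pairs have the same shape as a real edit history (draft, plan, final text), so an editor can train on them even though no human ever wrote the draft.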

How it works: PEER comprises four T5 large language models: PEER-Edit (which executed revisions), PEER-Undo (which undid revisions), PEER-Explain (which explained revisions), and PEER-Document (which generated synthetic primary-source documents as a basis for revisions). The authors trained them on 6.9 million examples from Wikipedia. Each example includes the text before and after a revision, a revision plan (a directive to revise the text, such as “add information about the scandal”), an explanation (a reason for the revision, which may duplicate the revision plan), and cited documents (primary sources on which the text is based).

Given an unrevised text and three cited documents, PEER-Edit learned to generate a revision plan and the revised text.
PEER-Undo took the revised text and the same cited documents, and learned to generate the revision plan and unrevised text.
PEER-Explain took the unrevised text, revised text, and cited documents and learned to generate an explanation.
PEER-Document took the unrevised text, revised text, and revision plan and learned to generate one of the documents.
The authors used the trained models to generate synthetic datasets based on articles in Wikinews (crowdsourced news articles) and StackExchange (questions and answers on topics including cooking, gardening, and politics). Using PEER-Undo, they generated synthetic unrevised texts to be paired with the published articles. PEER-Explain and PEER-Document generated the plans and documents.
They further trained PEER-Edit on the generated datasets as well as Wikipedia.
At inference, PEER-Edit takes in unrevised text and generates a plan and a revised text. To collaborate with humans, it can either revise a text based on a user’s plan or generate a plan for a user to execute. Users can perform these tasks in any combination, any number of times.
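
The division of labor among the four models can be sketched as four (input, target) pairs built from a single edit record. This is an illustration under stated assumptions, not the authors' data format: the `EditRecord` class, the `training_pairs` function, the `key: value` serialization, and the choice of which cited document PEER-Document should reproduce (`doc_index`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EditRecord:
    unrevised: str
    revised: str
    plan: str
    explanation: str
    documents: List[str]


def _join(**fields: str) -> str:
    # Hypothetical serialization: one "key: value" line per field.
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


def training_pairs(rec: EditRecord, doc_index: int = 0) -> Dict[str, Tuple[str, str]]:
    """Build one (input, target) pair per model from a single edit record."""
    docs = " | ".join(rec.documents)
    return {
        # Unrevised text + documents -> plan + revised text
        "PEER-Edit": (
            _join(text=rec.unrevised, documents=docs),
            _join(plan=rec.plan, text=rec.revised),
        ),
        # Revised text + documents -> plan + unrevised text
        "PEER-Undo": (
            _join(text=rec.revised, documents=docs),
            _join(plan=rec.plan, text=rec.unrevised),
        ),
        # Both versions + documents -> explanation
        "PEER-Explain": (
            _join(before=rec.unrevised, after=rec.revised, documents=docs),
            rec.explanation,
        ),
        # Both versions + plan -> one of the cited documents
        "PEER-Document": (
            _join(before=rec.unrevised, after=rec.revised, plan=rec.plan),
            rec.documents[doc_index],
        ),
    }


record = EditRecord(
    unrevised="The bridge opened.",
    revised="The bridge opened in 1932.",
    plan="add the opening year",
    explanation="Added the year the bridge opened.",
    documents=["City archive: the bridge opened in 1932."],
)
pairs = training_pairs(record)
```

Note that PEER-Edit and PEER-Undo see mirror-image inputs and targets, which is what lets the undo model manufacture training data for the editor.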

Results: The authors evaluated PEER-Edit using SARI, a measure of similarity between two revised versions of a text relative to the unrevised original (higher is better). Comparing generated revisions to ground-truth revisions of Wikinews, the Wikipedia-trained PEER-Edit (175 billion parameters) achieved 49.3 SARI, and the same architecture trained on the synthetic Wikinews dataset achieved 51.6 SARI. Both were more similar to the human revisions than was the unrevised text, which achieved 32.8 SARI. They also evaluated PEER-Edit on six tasks such as grammar correction and removal of biased words. Averaged across these tasks, a 175-billion-parameter model achieved 44.3 SARI and a 3-billion-parameter version achieved 43.6 SARI. Prompted to perform the same tasks, InstructGPT (1.3 billion parameters) achieved 39.4 SARI, and Tk-Instruct (3 billion parameters, fine-tuned to correct grammar and simplify text) achieved 23.5 SARI.
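
Why does unrevised text earn a nonzero SARI score? A simplified sketch helps. SARI averages three components: how well the output's added words match the reference's additions, how well its kept words match, and how precisely its deletions match. The version below is an assumption-laden toy, not the official metric: it uses single words only, one reference, and lowercase whitespace tokens, while the standard metric works over n-grams and can use multiple references. The function names and the handling of empty sets are choices made for this sketch.

```python
from typing import Set


def _precision(predicted: Set[str], gold: Set[str]) -> float:
    # Sketch convention: with no predictions, score 1 only if
    # there was also nothing to predict.
    if not predicted:
        return 1.0 if not gold else 0.0
    return len(predicted & gold) / len(predicted)


def _f1(predicted: Set[str], gold: Set[str]) -> float:
    p = _precision(predicted, gold)
    r = _precision(gold, predicted)  # recall is precision with the roles swapped
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def simplified_sari(source: str, output: str, reference: str) -> float:
    """Unigram, single-reference approximation of SARI on a 0-100 scale."""
    src, out, ref = (set(t.lower().split()) for t in (source, output, reference))
    add = _f1(out - src, ref - src)            # words the output added
    keep = _f1(out & src, ref & src)           # words the output kept
    delete = _precision(src - out, src - ref)  # words the output deleted
    return 100.0 * (add + keep + delete) / 3.0


source = "PEER was proposed by researchers"
reference = "PEER was proposed by researchers at Meta"

perfect = simplified_sari(source, reference, reference)
unchanged = simplified_sari(source, source, reference)
```

In this toy example, leaving the text unchanged still scores well on the keep and delete components and loses only on additions, which mirrors why the unrevised baseline lands well above zero in the results.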

Yes, but: Text generators can produce factually false statements. While PEER-Edit sometimes corrected misinformation, it also fabricated falsehoods, which it backed up by fabricating citations.

Why it matters: Training text generators to provide explanations for their decisions and citations for the facts they use may lead to more interpretable models.

We’re thinking: The raw output of generative models is fun and exciting, but imagine their potential as collaborators with creative people!