Un-Redacting Mueller

Published

Apr 24, 2019

Last week’s release of the redacted Mueller Report prompted calls to fill in the blanks using the latest models for language generation. A fun test case for state-of-the-art natural language processing—or irresponsible deepfakery tailor-made for an era of disinformation and paranoia?

A really, really bad idea: University of Washington computational linguistics professor Emily M. Bender unleashed a tweet storm explaining why machine learning engineers should resist the temptation. Using AI to “unredact” the report would:

encourage unrealistic notions of what AI can achieve
create confusion about the report’s actual contents
unduly influence discussion about the unredacted document, should it become available
create controversy around any names inserted by the AI

What to do instead: For people interested in applying language generation in ways relevant to politics, Bender suggests working on rumor detection or “tools that might help users think twice before retweeting.”

Takeaway: A language model knows only what’s in the data it was trained on. It can’t possibly know what the report’s redactors hid from view, and it can’t reason about it. Given the state of today’s machine learning tech, a well-informed human would make far better guesses about what’s missing.
