Privacy advocates want deep learning systems to forget what they’ve learned.
What’s new: Researchers are seeking ways to remove the influence of particular training examples, such as an individual’s personal information, from a trained model without affecting its performance, Wired reported.
How it works: Some researchers have experimented with preparing data prior to training for potential removal later, while others have worked to remove the effect of selected examples retroactively.
- Researchers from the University of Toronto and University of Wisconsin-Madison developed a training method called SISA (Sharded, Isolated, Sliced, and Aggregated) in which different versions of a model are trained on non-overlapping subsets of the same dataset. During inference, the models' predictions are combined via majority vote. This makes it possible to remove selected training examples and retrain only the model whose subset contained them.
- A team at Harvard, Stanford, and the University of Pennsylvania later showed that SISA can fail to remove the influence of data if deletion requests aren't randomly distributed. The team mitigated this problem by adding noise to the training algorithm, drawing on ideas from differential privacy.
- Researchers from Google, Cornell, and the University of Waterloo showed how to remove the impact of a training example on a model's weights directly, provided the model's loss function meets certain mathematical conditions.
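The SISA approach described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class and function names (`SISAEnsemble`, `train_majority`) are hypothetical, the shard assignment is a simple round-robin, and the per-shard "model" is a trivial majority-class classifier standing in for a real learner.

```python
from collections import Counter

class SISAEnsemble:
    """Sketch of SISA-style training: disjoint shards, one model per shard,
    majority vote at inference, per-shard retraining on deletion."""

    def __init__(self, num_shards, train_fn):
        self.num_shards = num_shards
        self.train_fn = train_fn  # callable: list of (x, y) pairs -> model
        self.shards = [[] for _ in range(num_shards)]
        self.models = [None] * num_shards

    def fit(self, dataset):
        # Assign each example to exactly one shard (round-robin here),
        # then train an independent model on each shard.
        for i, example in enumerate(dataset):
            self.shards[i % self.num_shards].append(example)
        self.models = [self.train_fn(shard) for shard in self.shards]

    def predict(self, x):
        # Aggregate the per-shard models' predictions by majority vote.
        votes = Counter(model(x) for model in self.models)
        return votes.most_common(1)[0][0]

    def forget(self, example):
        # "Unlearning": drop the example and retrain only its shard's model.
        # The other shards never saw it, so their models are untouched.
        for k, shard in enumerate(self.shards):
            if example in shard:
                shard.remove(example)
                self.models[k] = self.train_fn(shard)
                return

def train_majority(shard):
    # Stand-in learner: always predicts the shard's most common label.
    label = Counter(y for _, y in shard).most_common(1)[0][0] if shard else None
    return lambda x: label
```

The key design point is that deleting one example triggers retraining of only one shard's model, a fraction of the cost of retraining on the full dataset.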
Behind the news: Evolving data privacy laws could wreak havoc on machine learning models.
- The European Union’s General Data Protection Regulation includes a “right to be forgotten” that, some observers argue, could force companies to retroactively remove the influence of specific data from trained models.
- California’s Privacy Rights Act gives residents the right to know how their data is being used and to request that it be deleted, even if it has been sold to a third party.
Why it matters: Enabling models to unlearn selectively and incrementally would be less costly than retraining repeatedly from scratch. It also could give users more control over how their data is used and who profits from it.
We’re thinking: Wait … what was this article about?