The Telltale Artifact A technique for detecting GAN-generated deepfakes

Published

Sep 30, 2020

Reading time

2 min read

Deepfakes have gone mainstream, allowing celebrities to star in commercials without setting foot in a film studio. A new method helps determine whether such endorsements — and other images produced by generative adversarial networks — are authentic.

What’s new: Lucy Chai led MIT CSAIL researchers in an analysis of where image generators fool and where they fail. They developed a technique to detect portions of an image that betray fakery.

Key insight: Large-scale features of generated images are highly varied, but generated textures contain consistent artifacts. Convolutional neural networks (CNNs) are especially sensitive to textures, which makes them well suited to recognizing such artifact-laden areas. A CNN tailored for analyzing small pieces of images can learn to recognize parts dominated by generated textures.
How it works: The authors built classifiers that survey images one patch at a time. They ran the classifiers on output from StyleGAN, Glow, and a generator model based on Gaussian mixture models (GMMs). They averaged the patchwise classifications to analyze each GAN’s vulnerability to detection.

The authors created a dataset of images generated by a Progressive GAN trained on the CelebA-HQ dataset of celebrity portraits.
They modified Resnet and Xception architectures to classify patches of user-determined size and trained them on the generated images. They removed the deeper layers, which analyze larger image areas, to concentrate the models on fine details.
They used the classifications to produce heatmaps of image areas recognized as generated (blue) or not (red). Predominantly blue images were deemed to have been generated.
By averaging the heatmaps over many images produced by the same GAN, the authors were able to identify the areas where that model is especially prone to leaving artifacts. For instance, StyleGAN and Glow generated high concentrations of artifacts in facial details, while GMM tended to generate them in backgrounds.

Results: The authors’ best classifier achieved 100 percent average precision on StyleGAN output and 91.38 percent on GMM. These scores outperformed non-truncated MesoInception4, Resnet-18, Xception, and CNN models, which achieved average precision between 99.75 and 73.33 percent. On Glow, the authors’ best classifier achieved 95 percent average precision, whereas the best full model scored 97.32 percent.

Why it matters: The better GANs become, the more useful they can be for both good and ill. In shedding light on areas where particular GANs produce more artifacts, this work illuminates pathways for researchers to improve them. But it also provides a map for malefactors to make their activities harder to detect. In fact, when the researchers trained a GAN to fool their classifiers, accuracy fell to less than 65 percent.

We’re thinking: Building a discriminator that recognizes a particular generator’s output is easier than building a good generator. In fact, GAN researchers routinely degrade discriminators to give the generator a fighting chance to fool it. But social media platforms, among others, would like to catch all generated images, regardless of the generator that produced them. Looking for common artifacts offers a promising approach — until a variety of generators learn how to avoid producing them.

Subscribe to The Batch