Better Than GAN?

Published Jun 12, 2019

Generative adversarial networks (GANs) can synthesize images to help train computer vision systems. But GANs are compute-hungry and don’t always produce realistic output. Now there’s a more efficient and true-to-life alternative.

What’s new: DeepMind introduces an upgrade to the Vector Quantized-Variational AutoEncoder it unveiled last year. VQ-VAE-2 generates images faster than GANs, with finer detail and class labels. An image recognizer trained exclusively on pictures from VQ-VAE-2 classified ImageNet data 15 percent more accurately than the same model trained on GAN-generated images.

How it works: VQ-VAE-2 is a variational autoencoder with modifications:

A typical variational autoencoder represents input images as condensed continuous vectors and uses these representations to reconstruct the images. These compact representations can be randomly sampled during inference to create new images that look realistic.
The first-generation VQ-VAE works the same way, but it maps each encoder output to the closest entry in a “codebook” of discrete vectors rather than to a point in a continuous space, a process known as quantization. The codebook is trained jointly with the rest of the model.
Quantization limits the encoder's power, but overall it creates a more stable task for the system to train on.
VQ-VAE-2 splits the representation into two interdependent levels: a top level that captures global image features and a bottom level, conditioned on the top, that captures local detail.
The decoder CNN uses both the top and bottom representations to construct output images that combine local and global features.
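
To make the quantization step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not DeepMind's implementation: the 2-D codebook, the encoder outputs, and the function names are all made up for the example, and codebook training and the two-level hierarchy are left out. Each continuous encoder output is simply replaced by its nearest codebook entry.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return (index, entry) for the codebook entry closest to `vector`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical toy codebook: 4 entries of dimension 2.
# In a real model these entries are learned jointly with the encoder and decoder.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Made-up continuous encoder outputs, one vector per image location.
encoder_outputs = [[0.1, -0.2], [0.9, 0.2], [0.4, 0.8], [0.7, 0.9]]

# Quantization: each output becomes a discrete code (an index into the codebook).
codes = [quantize(v, codebook)[0] for v in encoder_outputs]

# The decoder would see the codebook entries, not the original continuous vectors.
quantized = [codebook[i] for i in codes]

print(codes)      # [0, 1, 2, 3]
print(quantized)  # [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

The image is now described by a short list of integers, which is what lets a second model learn which codes tend to appear together and sample new ones.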

Why it matters: Although GANs have been improving at a rapid clip, VQ-VAE-2 generates better images using substantially less computation. It also produces more diverse output, making it better suited for data augmentation.

We’re thinking: Advanced generative models could drive advances in fields beyond computer vision. Generative models aren't yet widely used for data augmentation, but if such algorithms can help in small-data settings, they would be a boon to machine learning. Meanwhile, we’ll enjoy the pretty pictures.
