Images produced by generative adversarial networks can be nearly indistinguishable from the real thing. If a picture is worth a thousand words, a GAN-made 3D representation could be worth a million.
What’s new: Gadelha et al. at the University of Massachusetts Amherst developed a neural network, Projective Generative Adversarial Network (PrGAN), that generates 3D models of novel objects from any number of 2D silhouettes.
Key insights: PrGAN’s discriminator relies on 2D images rather than 3D shapes.
- Generating 3D models solely from 2D images requires fewer parameters than earlier approaches like MIT’s 2016 3D-GAN, which is trained directly on hand-built 3D models to produce novel 3D shapes. That means a smaller memory footprint and less training time. And since PrGAN is trained on images rather than 3D models, training data is more readily available.
- Many algorithms that produce 2D views of 3D objects are incompatible with backpropagation, making it impossible to train such a system on 2D images. The math behind PrGAN’s projection module allows backprop to be used for training.
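To make the backprop point concrete, here is a minimal NumPy sketch comparing a smooth projection with a thresholded one. It assumes a silhouette pixel is computed as 1 - exp(-sum of occupancy along the line of sight), which is our reading of the kind of formula the paper uses, not a confirmed reproduction of it. The function names, grid size, and threshold are hypothetical, and viewpoint rotation is left out.

```python
import numpy as np

def project_soft(voxels, axis=2):
    """Smooth silhouette: 1 - exp(-occupancy summed along the line of sight)."""
    return 1.0 - np.exp(-voxels.sum(axis=axis))

def project_soft_grad(voxels, axis=2):
    """Analytic derivative of each pixel w.r.t. every voxel on its ray.

    d/dv [1 - exp(-sum)] = exp(-sum), identical for all voxels on the ray.
    """
    ray_grad = np.exp(-voxels.sum(axis=axis, keepdims=True))
    return np.broadcast_to(ray_grad, voxels.shape)

def project_hard(voxels, axis=2, threshold=0.5):
    """Thresholded silhouette: piecewise constant, so its gradient is
    zero almost everywhere and gives backprop nothing to work with."""
    return (voxels.max(axis=axis) > threshold).astype(float)

rng = np.random.default_rng(0)
voxels = rng.uniform(0.0, 1.0, size=(4, 4, 4))  # hypothetical occupancy grid

silhouette = project_soft(voxels)
grad = project_soft_grad(voxels)
hard = project_hard(voxels)

# Finite-difference check: nudge one voxel and compare with the analytic slope.
eps = 1e-6
bumped = voxels.copy()
bumped[1, 2, 3] += eps
numeric = (project_soft(bumped)[1, 2] - silhouette[1, 2]) / eps
```

The smooth version returns a nonzero gradient for every voxel, so an error signal computed on the 2D image can flow back to the 3D shape. The thresholded version outputs only 0 or 1 and carries no such signal.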
How it works: All GANs have a generator that produces new output and a discriminator that learns to classify that output as real or fake. PrGAN has a generator that creates 3D shapes. Its discriminator compares 2D images of those shapes with real-world pictures. A projection component creates the 2D views.
- The generator is a CNN trained to construct 3D shapes from random inputs. The resulting 3D representation is voxelized, or built from small cubes, rather than a polygonal mesh.
- The generator passes a 3D shape to the projection module, which computes a 2D view from a random viewpoint. The real-life 2D silhouettes and generated 2D views are used in training the discriminator. If the discriminator correctly classifies generated 2D views as fake, the generator is adjusted to produce more realistic results.
- PrGAN’s projection algorithm and shape generator can be trained with depth maps, color schemes, and segmented images rather than binary silhouettes. The additional information allows more detailed, diverse shapes.
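The pipeline described above can be sketched as a single forward pass. This is a toy illustration, not the authors' implementation: a linear layer stands in for the CNN generator, logistic regression stands in for the discriminator, viewpoints are limited to quarter turns, the projection formula is an assumed smooth form, and the "real" silhouette is a made-up square. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8        # voxel grid resolution (hypothetical)
Z_DIM = 16   # latent vector size (hypothetical)

# Toy stand-ins for the learned networks (assumptions, not the paper's models).
W_gen = rng.normal(0.0, 0.1, size=(Z_DIM, N * N * N))
w_disc = rng.normal(0.0, 0.1, size=N * N)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate(z):
    """Map a random vector to an N x N x N grid of occupancies in (0, 1)."""
    return sigmoid(z @ W_gen).reshape(N, N, N)

def project(voxels, quarter_turns):
    """Rotate the grid about axis 1, then flatten it along axis 2
    with an assumed smooth projection, 1 - exp(-sum)."""
    rotated = np.rot90(voxels, k=quarter_turns, axes=(0, 2))
    return 1.0 - np.exp(-rotated.sum(axis=2))

def discriminate(image):
    """Return the estimated probability that a 2D image is real."""
    return sigmoid(image.ravel() @ w_disc)

# Generator: random input -> voxelized 3D shape.
z = rng.normal(size=Z_DIM)
shape_3d = generate(z)

# Projection module: 3D shape -> 2D view from a random viewpoint.
view = project(shape_3d, quarter_turns=int(rng.integers(0, 4)))

# Made-up "real" silhouette: a filled square.
real = np.zeros((N, N))
real[2:6, 2:6] = 1.0

# Standard GAN losses on the 2D images.
p_fake = discriminate(view)
p_real = discriminate(real)
d_loss = -(np.log(p_real) + np.log(1.0 - p_fake))  # discriminator objective
g_loss = -np.log(p_fake)                           # generator objective
```

In a real training loop, the gradient of g_loss would pass through the discriminator and the projection back into the generator's weights, which is why the projection step has to be differentiable.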
Why it matters: The graphic design industry is looking toward deep learning to enhance productivity. PrGAN simplifies creation of 3D models from 2D images. This could be a great help to designers and game makers looking to generate 3D representations quickly.
We’re thinking: Deep learning consumes huge amounts of data. PrGAN takes advantage of plentiful 2D images to produce 3D representations, which are far less common. This sort of data-set bootstrapping could be an alternative in situations where training data is scarce.