Style and Substance An improved GAN technique for style transfer

Published

Sep 30, 2020

Reading time

2 min read

GANs are adept at mapping the artistic style of one picture onto the subject of another, known as style transfer. However, applied to the fanciful illustrations in children’s books, some GANs prove better at preserving style, others better at preserving subject matter. A new model is designed to excel at both.

What’s new: Developed by researchers at Hacettepe University and Middle East Technical University, both in Turkey, Ganilla aims to wed photographic content and artistic style for illustrations in children’s books. It converts photos into virtual artwork in the styles of 10 published children’s book illustrators, including favorites like Patricia Polacco and Kevin Henkes, while staying true to scenes in photos.

How it works: Ganilla is almost identical to CycleGAN except for a specially crafted generator.

The researchers divided their generator into a downsampling stage and an upsampling stage.
The downsampling stage is a modified Resnet-18 with additional skip connections to pass low-level features, such as textures and edges, from one layer to the next.
The upsampling stage consists of layers of transposed convolutions that increase the size of the feature map and skip connections from the downsampling stage. The skip connections in this stage help preserve subject matter without overwriting style information.
The authors trained the model on unpaired images from two datasets. The first contained nearly 5,500 images of landscape scenery, the second hundreds of works by each of 10 illustrators.

Results: There’s no way to measure objectively how well a model generates landscapes in specific artistic styles, so the authors used quantitative and qualitative approaches to compare Ganilla’s output with that of a CycleGAN, DualGAN, and CartoonGAN trained on the same data.

They trained a pair of CNNs to assess the GANs’ proficiency at transferring style (trained on small portions of images from each artist) and content (trained on full-size photos). The style classifier scored CycleGAN highest, while the content classifier gave DualGAN the edge. Ganilla ranked highest when style and content scores were averaged.
The researchers asked 48 people to (a) rate whether each GAN-made illustration looked like the illustrator’s work, (b) describe what they thought the picture showed, and (c) rank generated images in terms of overall appeal. They scored Ganilla’s output highest for mimicking the human illustrators and depicting the source content. However, they rated DualGAN’s output slightly more appealing.

Yes, but: Based on examples in the paper, the training illustrations tended to be heavy on stylized human and animal characters, while the photos contain very few characters. We’re curious to see what Ganilla would do with more photos of people and animals.

Why it matters: GANs are powerful creative tools, and — like printmaking and photography before them — they’re spawning their own adversarial dynamic in the arts. Artists working in traditional media have raised concerns about GANs being trained to make derivatives of their work. Now, digital artists are accusing traditional artists of creative theft for making paint-on-canvas reproductions of their AI-abetted digital compositions.

We’re thinking: When it comes to art, we favor GANs as a creative partner.

Subscribe to The Batch