At the recent NVIDIA GPU Technology Conference (GTC) 2019, Synced reported on a ‘magical brush’ app that could transform simple line drawings and sketches into realistic landscapes. The app, GauGAN, lets users control not only the semantic content but also the style of the generated image. NVIDIA has now open-sourced the model behind the stunning images.
Changing semantic content
Changing image styles
NVIDIA’s simple tool allows anyone to build their own “magical brush.” The re-implementation guide on GitHub includes detailed installation steps covering dataset preparation, training, and inference.
The paper’s authors recommend COCO-Stuff, Cityscapes or ADE20K as the training dataset, and a few sample images from COCO-Stuff are included in the code repo for users to experiment with. A pre-trained model is also available for quick deployment and testing.
Those who want to reproduce the results from scratch will probably need NVIDIA sponsorship, as the model was trained on an NVIDIA DGX-1 machine with 8 V100 GPUs.
Users can control both semantics and style when synthesizing an image
The algorithm behind GauGAN is spatially-adaptive normalization (SPADE), introduced in the paper Semantic Image Synthesis with Spatially-Adaptive Normalization as an improved normalization layer for image synthesis.
Common normalization methods such as Batch Normalization learn their affine parameters (a scale and a bias) independently of the input and apply them after the normalization step, so semantic information from the input tends to be “washed away.” SPADE instead learns the affine parameters directly from the semantic segmentation map, so the scale and bias vary from pixel to pixel. The input semantic information is therefore preserved and acts on the output of every normalization layer.
Difference between Batch Norm and SPADE
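The contrast between the two layers can be illustrated with a minimal NumPy sketch. It is not NVIDIA's implementation: the function names are illustrative, and the hypothetical weights `w_gamma` and `w_beta` stand in for the learned layers that map the segmentation map to a scale and a bias. For brevity they are assumed here to be simple per-pixel (1x1) projections, whereas the paper uses convolutional layers.

```python
import numpy as np

def normalize(x, eps=1e-5):
    # x has shape (N, C, H, W). Parameter-free normalization using
    # per-channel statistics over the batch and spatial dimensions.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, gamma, beta):
    # gamma and beta have shape (C,): one scale and one bias per channel,
    # identical at every pixel and independent of the input layout.
    return normalize(x) * gamma[None, :, None, None] + beta[None, :, None, None]

def spade(x, segmap, w_gamma, w_beta):
    # segmap has shape (N, L, H, W): a one-hot semantic label map.
    # w_gamma and w_beta have shape (C, L): hypothetical 1x1 projections
    # (an assumption made here to keep the sketch short).
    gamma = np.einsum("cl,nlhw->nchw", w_gamma, segmap)
    beta = np.einsum("cl,nlhw->nchw", w_beta, segmap)
    # Scale and bias now differ from pixel to pixel, following the labels.
    return normalize(x) * gamma + beta

# A flat activation map: every pixel carries the same value.
x = np.ones((1, 2, 4, 4))

# Two semantic regions: label 0 on the left half, label 1 on the right half.
labels = np.zeros((1, 4, 4), dtype=int)
labels[:, :, 2:] = 1
segmap = np.eye(2)[labels].transpose(0, 3, 1, 2)  # shape (1, 2, 4, 4)

w_gamma = np.ones((2, 2))
w_beta = np.array([[0.0, 5.0],
                   [1.0, -1.0]])

out_bn = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
out_spade = spade(x, segmap, w_gamma, w_beta)
```

With the flat input used here, normalization reduces every activation to zero, so `out_bn` is spatially uniform and carries no trace of the layout. `out_spade` still differs between the left and right halves, because its bias is read from the label map at each pixel.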
The paper Semantic Image Synthesis with Spatially-Adaptive Normalization has been accepted by CVPR 2019 for oral presentation.