I recently read the CycleGAN paper (link), which I found very interesting because CycleGAN models have the incredible ability to accurately change images into something they’re not (e.g. changing a picture of a horse into a picture of a zebra). Very cool. Let’s dive into how it works.
Some of CycleGAN’s applications (left to right): changing a Monet painting to a real-world picture, changing zebras to horses, changing a picture of a location in the summer to a picture of the same location in the winter. All of the applications can be reversed as well. Image credits to Zhu et al., the authors of the CycleGAN paper.
GANs. Generative Adversarial Networks are the base idea behind CycleGAN. A GAN has a special loss function: the adversarial loss. The GAN is made up of two machine learning models. First, there's the generator, which (as the name implies) generates images. Then there's the discriminator, whose job is to tell real images apart from the fake images the generator gives it. The adversarial loss is what the overall GAN optimizes; think of it as the generator and discriminator fighting over which one is the better model. When the loss is optimized, the generator is (in theory) creating images that the discriminator can't distinguish from real ones. The fundamental principle is that a GAN can generate fake images of anything, given enough existing examples of that thing. I previously used a simple GAN to generate images of shoes, which you can find here.
Representation of a GAN that generates an image of a shoe.
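To make the "fighting" concrete, here's a tiny numerical sketch of the adversarial loss using binary cross-entropy (the usual choice for plain GANs). The discriminator scores and function names here are made up for illustration; real models would produce these scores from actual images.

```python
import numpy as np

def bce(predictions, targets):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    eps = 1e-12  # avoid log(0)
    p = np.clip(predictions, eps, 1 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

# Hypothetical discriminator outputs: probability that each image is real.
d_real = np.array([0.9, 0.8, 0.95])  # scores on real images
d_fake = np.array([0.1, 0.3, 0.2])   # scores on generated (fake) images

# The discriminator wants real -> 1 and fake -> 0.
d_loss = bce(d_real, np.ones(3)) + bce(d_fake, np.zeros(3))

# The generator wants the discriminator to call its fakes real (fake -> 1).
g_loss = bce(d_fake, np.ones(3))

print(round(d_loss, 3), round(g_loss, 3))  # → 0.355 1.705
```

Here the discriminator is winning (low `d_loss`, high `g_loss`); training the generator pushes `d_fake` up toward 1, dragging `g_loss` down until the two models reach a stalemate.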
CycleGAN. What CycleGAN does differently from a standard GAN is that it doesn't generate images from random noise. It takes a given image and produces a different version of it; this image-to-image translation is what lets CycleGAN change a horse into a zebra. Image-to-image translation itself isn't unique to CycleGAN, though; what sets CycleGAN apart is that it was one of the first models to allow unpaired image-to-image training. That means you don't need a picture of a horse and a picture of what that exact horse would look like as a zebra in the dataset. Instead, you can have a bunch of horses and a bunch of zebras separately. This is useful in situations where you aren't able to get paired data (e.g. if you wanted to generate a zebra version of a horse, it's not reasonable to paint each horse like a zebra just to get paired data). Unpaired image-to-image translation is very powerful and has lots of applications.
The CycleGAN paper’s example of paired vs. unpaired data. Paired data is from changing a pencil sketch of an object to its real life counterpart. Unpaired data consists of images vs. paintings. Image credits to Zhu et al., the authors of the CycleGAN paper.
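The trick that makes unpaired training possible is the paper's cycle-consistency loss: with a generator G that turns horses into zebras and a second generator F that turns zebras back into horses, translating an image forward and then back should recover the original, i.e. F(G(x)) ≈ x. Here's a toy sketch of that loss; the stand-in "generators" below just invert pixel values (real ones are convolutional networks), so this only illustrates the shape of the computation.

```python
import numpy as np

# Toy stand-in "generators" that happen to undo each other exactly.
# In the real model, G and F are learned convolutional networks.
def G(x):  # domain X -> Y (e.g. horse -> zebra)
    return 1.0 - x

def F(y):  # domain Y -> X (e.g. zebra -> horse)
    return 1.0 - y

def cycle_consistency_loss(x, y):
    """L1 cycle loss from the paper: mean |F(G(x)) - x| + mean |G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x)) - x))   # x survives a round trip
    backward = np.mean(np.abs(G(F(y)) - y))  # y survives a round trip
    return float(forward + backward)

np.random.seed(0)
x = np.random.rand(8, 8)  # a fake "horse" image
y = np.random.rand(8, 8)  # a fake "zebra" image

# These toy generators invert each other, so the loss is (near) zero;
# training pushes the real generators toward exactly this behavior.
print(cycle_consistency_loss(x, y))
```

Without this term, nothing would stop G from mapping every horse to the same plausible-looking zebra; requiring the round trip to reconstruct the input is what ties each output to its specific input even though the training images are unpaired.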
Applications. Apart from changing a horse into a zebra, there are many more ways to apply CycleGAN. It's a very versatile model, so versatile that it can also be used to change an apple into an orange! Yeah, not the strongest example, so here are some better ones.
- Generating a realistic rendering of what a building would look like based on its blueprints.
- Creating images of how a location would look in each season.
- Turning paintings into realistic photos.
- Rendering a realistic representation of how a suspect’s face would look based on a police sketch.
- Enhancing aspects of photos to make them look more professional.
Convenience. It's not enough to have a bunch of applications; a model also needs to be easy to use. CycleGAN is extremely usable because it doesn't need paired data. It's often pretty difficult to get a large amount of accurate paired data, so the ability to train on unpaired data with high accuracy means that people without access to sophisticated (and expensive) paired datasets can still do image-to-image translation. Great news for machine learning enthusiasts like me! You can use CycleGAN through the official repository on GitHub.
I’ve listed some other resources below that may be of interest.