The goal of this project is to use generative adversarial networks (GANs) to translate photos into painting-style images. This kind of image translation is commonly known as an image filter, and it is used in many popular image-processing and social-media apps such as Instagram and Prisma.
We first planned to implement translation of portrait face photos. The resolution of the CelebA dataset is too low (64*64), so we used another dataset, Labeled Faces in the Wild (LFW) from UMass Amherst, which provides images at 256*256.
We retrieved paintings by various artists from Wikiart.org, mainly works by Monet and Van Gogh.
Later on, we found that the gap between the portrait dataset and the paintings is so substantial that it is difficult to get good results on face photos. We therefore used another dataset consisting of photos of flowers and landscapes, which produced better results.
We tried several different GAN architectures and compared their performance.
DCGAN is a basic GAN consisting of one generator and one discriminator. Due to its simplicity and unidirectional mapping, the network produced relatively random results.
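To make the one-generator, one-discriminator structure concrete, here is a minimal DCGAN sketch in PyTorch. The layer widths and the 64*64 output size are illustrative choices, not the exact configuration we used: the generator upsamples a noise vector with transposed convolutions, and the discriminator downsamples an image to a single real/fake score.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector (z_dim x 1 x 1) to a 64x64 RGB image."""
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0), nn.BatchNorm2d(ch * 8), nn.ReLU(True),   # 4x4
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.ReLU(True),  # 16x16
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.BatchNorm2d(ch), nn.ReLU(True),          # 32x32
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),                                       # 64x64
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps a 64x64 RGB image to a single real/fake logit."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),                                  # 32x32
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, True),     # 16x16
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, True), # 8x8
            nn.Conv2d(ch * 4, 1, 8, 1, 0),                                                       # 1x1 score
        )

    def forward(self, x):
        return self.net(x).view(-1)

z = torch.randn(2, 100, 1, 1)
fake = Generator()(z)
score = Discriminator()(fake)
print(fake.shape, score.shape)  # torch.Size([2, 3, 64, 64]) torch.Size([2])
```

Because the generator only ever maps noise to the target style, there is no mechanism tying an output to a specific input photo, which is one reason this structure gave us essentially random translations.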
In CycleGAN, two generator–discriminator pairs are trained bidirectionally. Two additional losses are used: a cycle-consistency loss, computed as the L1 norm between the "cycled" (translated and translated back) result and the original image, and an identity loss that encourages the mapping to preserve the color composition between input and output. We tried both ResNet and U-Net architectures for the generators. For the discriminators, a PatchGAN is used to focus on local image details.
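The two extra loss terms can be sketched as follows. This is a simplified PyTorch illustration, not our training code: `G_AB` and `G_BA` stand for the two generators (any image-to-image network works here), and the loss weights `lambda_cyc` and `lambda_id` are hypothetical values, not the ones we tuned.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_and_identity_losses(G_AB, G_BA, real_A, real_B,
                              lambda_cyc=10.0, lambda_id=5.0):
    """Cycle loss: translating to the other domain and back should
    reproduce the input. Identity loss: feeding a generator an image
    already in its target domain should change it as little as possible,
    which encourages preserving color composition."""
    fake_B = G_AB(real_A)                               # A -> B
    fake_A = G_BA(real_B)                               # B -> A
    cyc = l1(G_BA(fake_B), real_A) + l1(G_AB(fake_A), real_B)
    idt = l1(G_AB(real_B), real_B) + l1(G_BA(real_A), real_A)
    return lambda_cyc * cyc + lambda_id * idt

# Sanity check with identity "generators": both terms collapse to zero.
ident = nn.Identity()
x = torch.rand(1, 3, 8, 8)
loss = cycle_and_identity_losses(ident, ident, x, x.clone())
print(float(loss))  # 0.0
```

The cycle term is what makes unpaired training possible: without paired photo/painting examples, requiring that a round trip reproduce the input keeps the generators from discarding the photo's content.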
In ACGAN, an auxiliary classifier assists the discriminator in classifying inputs into different categories. This allows finer-grained categorization and tuning for different types of inputs. We adapted a ResNet-inspired generator with two concatenated convolution layers.
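A minimal sketch of the auxiliary-classifier idea: the discriminator's shared convolutional trunk feeds two heads, one producing the usual real/fake logit and one producing class logits over the input categories. The layer sizes and `n_classes` value below are illustrative, not our exact configuration.

```python
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    """Discriminator with an auxiliary classification head (ACGAN-style)."""
    def __init__(self, n_classes=3, ch=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),           # 32x32
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, True),      # 16x16
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 8x8
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(ch * 4, 1)          # real/fake logit
        self.cls_head = nn.Linear(ch * 4, n_classes)  # auxiliary class logits

    def forward(self, x):
        h = self.trunk(x)
        return self.adv_head(h), self.cls_head(h)

D = ACDiscriminator(n_classes=3)
adv, cls = D(torch.rand(4, 3, 64, 64))
print(adv.shape, cls.shape)  # torch.Size([4, 1]) torch.Size([4, 3])
```

During training, the classification head is supervised with a cross-entropy loss on the category labels in addition to the adversarial loss, which is what pushes the generator toward category-consistent outputs.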
The majority of our testing and tuning was conducted on CycleGAN.
In CycleGAN, the generator and discriminator losses did not change much over time; however, the generated images still evolved slowly from epoch to epoch. In some cases the network produced good results, yet on other data points the generated images showed patterned noise.
Below is a selection of good results.
As for the face datasets, it was more difficult to obtain good results under any setting; transferring textures from objects to faces is harder.
We are still testing the structure of ACGAN, as so far the network only outputs randomized, distorted results for a fixed input or a fixed noise vector.
By trying different network structures and tuning various parameters, we successfully created some good painting-style translations. However, the model still has many limitations: it fails to transfer style to datasets that are vastly different from the original training data. For instance, the Monet style transfers well to a simple landscape photo, but it is much more difficult to apply it to a face shot with a complicated (or overly simple) background. There are therefore still limitations to our current approach, and we are working to address them.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. https://junyanz.github.io/CycleGAN/