Image-to-Image Filtering

CS 585 Final Project
Shuzhe Luan, U71316165
Long Guo, U29294789
05/03/2020


Problem Definition

In this project, I tried to turn photos into paintings, and I tried to train the computer to draw paintings from scratch.


Method and Implementation

To do that, I tried four GAN models - DCGAN, Pix2Pix, CycleGAN, and ACGAN.

I trained the DCGAN and ACGAN models to produce paintings by learning from the works of several artists, and I used the Pix2Pix and CycleGAN models for style transfer.


Dataset

CycleGAN : I used the dataset from the paper "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". The inputs are landscape photos and paintings in three styles - Van Gogh, Monet, and ukiyo-e. The data has been preprocessed, and the input size is (256, 256, 3).

DCGAN, ACGAN, Pix2Pix : There are two parts of data - paintings and photos.
Paintings : I downloaded nearly 4,000 paintings from WikiArt using its API. The paintings were created by ten artists - Fernand Leger, Gustave Courbet, Honore Daumier, Ilya Repin, Jean Francois Millet, Juan Gris, Jules Breton, Pablo Picasso, Piet Mondrian, and Thomas Eakins.
Photos : Nearly 3,000 human-face images of size (250, 250, 3).


Experiments


DCGAN

DCGAN is the first model I tried. I used a simple generator, and a pretrained VGG19 model with two FC layers as the discriminator.

Dataset : The input data are Picasso's paintings, and I used random noise of shape (1, 100) to generate a picture.
Loss :
Generator - BinaryCrossentropy(1, Discriminator(fake_pic))
Discriminator - BinaryCrossentropy(0, Discriminator(fake_pic)) + BinaryCrossentropy(1, Discriminator(real_pic))
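
As a minimal sketch, assuming TensorFlow/Keras and a discriminator that outputs a single logit (names here are illustrative, not my exact code), the two losses above can be written as:

    import tensorflow as tf

    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

    def generator_loss(fake_logits):
        # The generator wants the discriminator to label its fakes as real (1).
        return bce(tf.ones_like(fake_logits), fake_logits)

    def discriminator_loss(real_logits, fake_logits):
        # The discriminator labels real paintings 1 and generated ones 0.
        real_loss = bce(tf.ones_like(real_logits), real_logits)
        fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)
        return real_loss + fake_loss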

Generator & Discriminator

The generator accepts a noise vector of shape (1, 100), uses an FC layer to expand it to (1, 8*8*256), reshapes that to (8, 8, 256), and uses 8 transposed convolution layers to grow the output to (512, 512, 3).
The discriminator uses a VGG19 model to extract features from the input and two FC layers to judge whether the input is a real painting or a fake one.
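
In Keras-style code, the two networks look roughly like this. The filter counts, kernel sizes, and the frozen VGG19 backbone are assumptions - the description above only fixes the shapes:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_generator():
        model = tf.keras.Sequential([
            layers.Input(shape=(100,)),
            layers.Dense(8 * 8 * 256, use_bias=False),  # (1, 100) -> (1, 8*8*256)
            layers.Reshape((8, 8, 256)),
        ])
        # Eight transposed convolutions; the six stride-2 ones double the
        # spatial size from 8 to 512.
        for filters, stride in [(256, 2), (128, 2), (64, 2), (64, 2),
                                (32, 2), (16, 2), (16, 1)]:
            model.add(layers.Conv2DTranspose(filters, 4, strides=stride,
                                             padding='same', use_bias=False))
            model.add(layers.BatchNormalization())
            model.add(layers.LeakyReLU())
        # Final transposed convolution maps to 3 channels in [-1, 1].
        model.add(layers.Conv2DTranspose(3, 4, strides=1, padding='same',
                                         activation='tanh'))
        return model

    def build_discriminator():
        # Pretrained VGG19 as a feature extractor, plus two FC layers.
        vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet',
                                          input_shape=(512, 512, 3))
        vgg.trainable = False  # assumption: the backbone is kept frozen
        return tf.keras.Sequential([
            vgg,
            layers.Flatten(),
            layers.Dense(256, activation='relu'),  # width is an assumption
            layers.Dense(1),                       # real/fake logit
        ])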

Training process

As you can see, the results are quite random, so I gave up on this model quickly.


Loss history - Generator, Discriminator


Pix2Pix

Dataset : Paintings from 10 artists and 3,000 human-face pictures.
Loss :
Generator - BinaryCrossentropy(1, Discriminator(fake_pic)) + MAE(fake_pic, true_pic)
Discriminator - BinaryCrossentropy(0, Discriminator(fake_pic)) + BinaryCrossentropy(1, Discriminator(real_pic))
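
The only new piece relative to DCGAN is the L1 term. A sketch, where the weight on the L1 term is an assumption (100 is the value from the original Pix2Pix paper):

    import tensorflow as tf

    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    LAMBDA = 100  # L1 weight from the Pix2Pix paper; an assumption here

    def pix2pix_generator_loss(fake_logits, fake_pic, true_pic):
        adv = bce(tf.ones_like(fake_logits), fake_logits)  # fool the discriminator
        l1 = tf.reduce_mean(tf.abs(true_pic - fake_pic))   # MAE against the paired target
        return adv + LAMBDA * l1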

I tried to use the Pix2Pix model to perform style transfer. The generator is a U-Net model, and I used the same discriminator as in DCGAN.
The result was, predictably, bad: the inputs of a Pix2Pix model have to be strictly paired, and paintings and photos are not. Therefore, I won't show the results here.


CycleGAN

I came across the paper "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" while googling about GANs.
I used the idea of CycleGAN, but I changed the generator and the discriminator because I wanted to reduce the training time.

Dataset : The dataset used in the paper, plus some pictures I fetched from the internet.
Loss :
Generator - CycleLoss + IdentityLoss (I'll skip the details)
Discriminator - BinaryCrossentropy(0, Discriminator(fake_pic)) + BinaryCrossentropy(1, Discriminator(real_pic))
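
For reference, a sketch of the two generator-side terms. The weights are the CycleGAN paper's defaults, not necessarily the ones I used:

    import tensorflow as tf

    def cycle_loss(real, cycled, lam=10.0):
        # Translating to the other domain and back should reconstruct
        # the original image: ||G2(G1(x)) - x||_1.
        return lam * tf.reduce_mean(tf.abs(real - cycled))

    def identity_loss(real, same, lam=5.0):
        # Feeding a domain-2 image into the 1->2 generator should change
        # it as little as possible.
        return lam * tf.reduce_mean(tf.abs(real - same))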

Generator & Discriminator

There are two generators - one transfers inputs from domain 1 to domain 2, and the other from domain 2 to domain 1. Each generator is basically a U-Net model: 8 downsample layers and 8 upsample layers, with skip connections zipping the two halves together.
There are 2 discriminators as well. Each discriminator uses a VGG19 model to extract features from the input and two FC layers to judge whether the input is a real painting or a fake one.
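
A minimal sketch of the U-Net wiring; the filter counts follow the common Pix2Pix/CycleGAN setup and are assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    def downsample(filters):
        return tf.keras.Sequential([
            layers.Conv2D(filters, 4, strides=2, padding='same', use_bias=False),
            layers.BatchNormalization(),
            layers.LeakyReLU()])

    def upsample(filters):
        return tf.keras.Sequential([
            layers.Conv2DTranspose(filters, 4, strides=2, padding='same', use_bias=False),
            layers.BatchNormalization(),
            layers.ReLU()])

    def build_unet_generator():
        inputs = layers.Input(shape=(256, 256, 3))
        down_stack = [downsample(f) for f in [64, 128, 256, 512, 512, 512, 512, 512]]
        up_stack = [upsample(f) for f in [512, 512, 512, 512, 256, 128, 64]]

        x, skips = inputs, []
        for down in down_stack:  # 8 downsample layers: 256 -> 1
            x = down(x)
            skips.append(x)

        # "Zip" the two halves together: concatenate each upsampled feature
        # map with the mirrored encoder output.
        for up, skip in zip(up_stack, reversed(skips[:-1])):
            x = up(x)
            x = layers.Concatenate()([x, skip])

        # Final upsample back to (256, 256, 3).
        outputs = layers.Conv2DTranspose(3, 4, strides=2, padding='same',
                                         activation='tanh')(x)
        return tf.keras.Model(inputs, outputs)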

Training process
Inputs = Van Gogh, Ukiyoe

The results are relatively satisfying: the model can extract the features in the paintings. However, there is always some annoying noise in the results, and it always has the same pattern. I think this may be caused by the structure of the generators.


ACGAN

CycleGAN works really well for transferring style, but I wanted to train my graphics card to draw something from scratch. Thus, I implemented an ACGAN network.
ACGAN is basically DCGAN with an extra categorical loss: one discriminator judges whether the inputs are true paintings, and another discriminator does the classification.
I didn't use the traditional ACGAN model, in which one generator can generate different types of output; in my model, I wanted to first train a generator to produce one specific kind of output.
I trained a classifier beforehand to do the classification during the later training. Thus, I didn't need to train two discriminators at the same time, which saved me a lot of time.

Dataset : 3,000 paintings from 10 artists, fetched from WikiArt using its API.
Loss :
Generator - BinaryCrossentropy(1, Discriminator(fake_pic)) + CategoricalCrossentropy(target_class, Classifier(fake_pic))
Discriminator_true_false - BinaryCrossentropy(0, Discriminator(fake_pic)) + BinaryCrossentropy(1, Discriminator(real_pic))
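
A sketch of the generator's combined loss; the equal weighting of the two terms is an assumption:

    import tensorflow as tf

    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    cce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    def acgan_generator_loss(fake_logits, class_logits, target_class):
        # Fool the true/false discriminator...
        adv = bce(tf.ones_like(fake_logits), fake_logits)
        # ...while the pre-trained classifier attributes the fake
        # to the target artist.
        cat = cce(target_class, class_logits)
        return adv + cat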

Pre-trained Classifier
This model is composed of one average pooling layer, a pretrained VGG19 model, and two FC layers.
At first, the results were not very good. I think this may be because some artists' painting styles are quite similar, so I merged those artists' works into one big class. The validation accuracy is about 80%, which I think is enough.
These are the t-SNE plots
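
A sketch of the classifier; the input resolution, FC width, and frozen backbone are assumptions, and num_classes is the number of merged artist classes:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_classifier(num_classes):
        vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet',
                                          input_shape=(256, 256, 3))
        vgg.trainable = False  # assumption: the backbone is kept frozen
        return tf.keras.Sequential([
            layers.Input(shape=(512, 512, 3)),
            layers.AveragePooling2D(pool_size=2),  # halve the resolution first
            vgg,
            layers.Flatten(),
            layers.Dense(256, activation='relu'),  # width is an assumption
            layers.Dense(num_classes),             # one logit per artist class
        ])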

Generator, Pre-trained Classifier & Discriminator

Residual Block
I designed a residual block when I implemented the generator. The idea came from ResNet and the Inception model.
In each block, I use parallel filters with different kernel sizes to capture different features, and I use the ResNet skip connection to prevent vanishing gradients (I am still testing this). I implemented only one big layer instead of two per block, because I didn't want to blow up my GPU's memory.
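
A sketch of the block; the kernel sizes and filter count are assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    def residual_block(x, filters=64):
        # Inception-style: parallel convolutions with different kernel
        # sizes pick up features at different scales.
        branches = [
            layers.Conv2D(filters, k, padding='same', activation='relu')(x)
            for k in (1, 3, 5)
        ]
        merged = layers.Concatenate()(branches)
        # Project back to the input depth so the shapes match for the add.
        merged = layers.Conv2D(x.shape[-1], 1, padding='same')(merged)
        # ResNet-style skip connection keeps gradients flowing.
        return layers.ReLU()(layers.Add()([x, merged]))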



The generator is composed of 3 downsample layers, several residual blocks, and 3 upsample layers. It accepts an input of shape (512, 512, 3) and a noise vector (1, 100), and returns an output of shape (512, 512, 3).
There are 2 discriminators as well. The classifier discriminator is the pre-trained model above; the true/false discriminator uses a VGG19 model to extract features from the input and two FC layers to judge whether the input is a real painting or a fake one.

Training process
I trained the model in two ways: 1) fixed input, limited random noise; 2) random input, fixed noise.
(Random input, fixed noise), (Fixed input, 150 random noises, Generator = U-Net), (Fixed input, 150 random noises, Generator = my generator)

As you can see, the output is kind of random, and I think the model has been 'confused'. I read the article "GANGogh: Creating Art with GANs" and found its results much more satisfying. I'm still tweaking my network.
I think there are four possible reasons: 1) more epochs are required; 2) the dataset is not big enough; 3) more classes are needed to strengthen the categorical loss; 4) the generator structure needs adjustment.
I tried to 'smooth' the output - that is, to replace small pixel regions with large color blocks - before calculating the cost, but I failed to change the op's tensor values in place. I'm still working on it.
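
One differentiable way to get that effect, as a sketch rather than what my code currently does: TensorFlow tensors are immutable, so the smoothing has to be a new op in the graph (average-pool down, resize back up) instead of an in-place edit.

    import tensorflow as tf

    def smooth(images, block=8):
        # Replace each block x block patch with its mean color, differentiably.
        pooled = tf.nn.avg_pool2d(images, ksize=block, strides=block, padding='SAME')
        return tf.image.resize(pooled, tf.shape(images)[1:3], method='nearest')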


Results

DCGAN


The results are random and far from anything that could be called a painting.

CycleGAN

Landscape and Flower

Human face


I used landscape photos for training, and the results are quite good: the model can produce beautiful paintings, albeit with some annoying noise. However, when I trained the model on photos of human faces, the results were not as satisfying.

ACGAN


The results are kind of random, but I find them beautiful in some way. Of course, everybody can say for sure that these were not produced by Pablo Picasso :)

Discussion


Conclusions

By trying different network structures and tuning different parameters, we successfully created some good translations of painting styles. However, the model still has many limitations, as it fails to transfer a style to datasets that are vastly different from the original dataset. For instance, the Monet style transfers well onto a simple landscape photo, but it is much harder to adapt it to a face shot with a complicated (or overly simple) background. Therefore, there are still limitations to our current approach, and we are working to solve them.


Credits and Bibliography

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks - https://arxiv.org/pdf/1703.10593.pdf
GANGogh: Creating Art with GANs - https://towardsdatascience.com/gangogh-creating-art-with-gans-8d087d8f74a1
Conditional Image Synthesis With Auxiliary Classifier GANs - https://arxiv.org/abs/1610.09585
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks - https://arxiv.org/abs/1511.06434