The generator loss is a combination of the adversarial loss for the generator and the feature matching loss, which is why its value looks high. I think this is fine: with feature matching we are trying to match the statistics of the dataset fed in batches, and some residual loss is bound to remain.
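A minimal sketch of the feature matching term (TF 1.x style). The tensor names are illustrative, not the exact ones from my code; the idea is just to compare batch statistics of an intermediate discriminator activation on real and generated images.

```python
import tensorflow as tf

def feature_matching_loss(real_features, fake_features):
    """L2 distance between the batch means of an intermediate
    discriminator activation on real vs. generated images."""
    real_mean = tf.reduce_mean(real_features, axis=0)  # batch statistics on real data
    fake_mean = tf.reduce_mean(fake_features, axis=0)  # batch statistics on generated data
    return tf.reduce_mean(tf.square(real_mean - fake_mean))

# generator loss = adversarial term + feature matching term, e.g.
# g_loss = g_adversarial_loss + feature_matching_loss(real_feat, fake_feat)
```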
Let’s take a look at how the discriminator’s predictions on real and generated images vary over training - it seems the discriminator is confused, and this is essential for training. The gradients for the first layer of both the discriminator and the generator are also shown below to give an idea of what good gradient flow looks like when the model is set up well.
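For reference, a rough sketch of how such gradient plots can be produced with TensorBoard histogram summaries (TF 1.x API; the loss, variable list, and learning rate here are placeholders for the actual model's):

```python
import tensorflow as tf

def train_op_with_gradient_summaries(loss, var_list, learning_rate=2e-4):
    """Build a training op and log a histogram summary per gradient."""
    optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.5)
    grads_and_vars = optimizer.compute_gradients(loss, var_list=var_list)
    for grad, var in grads_and_vars:
        if grad is not None:
            tf.summary.histogram(var.op.name + "/gradient", grad)
    return optimizer.apply_gradients(grads_and_vars)
```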
Ahh, the above would have been a beautiful result if I had gotten it on the first try, but what follows is what I actually got: a discriminator loss that reached values so small that there was no gradient flowing to the discriminator, and generator gradients tiny relative to the scale of its weights. The plots below show exactly what I mean. As always, looking at the gradients tells you that something is wrong with the scale of the generator’s weights and that learning is just too slow.
Key takeaways to train a successful GAN:
- Initialization - Can’t stress this enough. I was not able to get any result with Xavier initialization or with random normal initialization with a standard deviation of 0.2. Looking at the scale of the gradients, it made sense to start with much smaller weights so that the generator can start moving in the right direction from the beginning. I ended up using random normal initialization with a standard deviation of 0.02 (see the sketch after this list).
- Batch normalization - These are supposed to help gradient flow, but an improper or overly slow decay on the batch norm moving averages will again hinder gradient flow. A decay of 0.9 seemed like the sweet spot.
- Not sure how important this is, but I found it helpful to have an equal number of layers in the generator and the discriminator, and to keep the number of weights at a ratio (approximately 0.5) that doesn’t let one model lag behind the other.
- As for the dataset, having inputs that share traits to some extent seemed important - what I mean is that trying to generate varied outputs like cats and buses without conditional generation makes training more complicated.
- For experimental purposes, start with a small dataset and check that the gradients and initializations are in the right place for the model to train.
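To make the initialization and batch norm points concrete, here is a rough sketch of the settings that worked (tf.contrib.layers, TF 1.x; the layer shape and function name are illustrative, not the exact architecture):

```python
import tensorflow as tf

# Random normal init with stddev 0.02 (neither Xavier nor stddev 0.2 worked for me).
weight_init = tf.random_normal_initializer(stddev=0.02)

def conv_bn_relu(inputs, num_outputs, is_training, scope):
    h = tf.contrib.layers.conv2d(
        inputs, num_outputs, kernel_size=5, stride=2,
        activation_fn=None, weights_initializer=weight_init, scope=scope)
    # decay=0.9 so the moving averages adapt quickly; a slower decay hindered training
    h = tf.contrib.layers.batch_norm(
        h, decay=0.9, is_training=is_training,
        updates_collections=None, scope=scope + "_bn")
    return tf.nn.relu(h)
```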
Results
CelebA
Flowers
Sampling random points in latent space
Now let’s try taking a random walk along just one latent space dimension.
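A sketch of how such a walk can be generated (NumPy only; the latent size of 100, the dimension index, and the [-1, 1] range are assumptions for illustration):

```python
import numpy as np

z_dim = 100        # assumed latent vector size
dim = 25           # the single dimension being walked
num_steps = 10

# Fix a random base point in latent space, then vary only one coordinate.
z_base = np.random.uniform(-1.0, 1.0, size=(1, z_dim)).astype(np.float32)
walk = []
for value in np.linspace(-1.0, 1.0, num=num_steps):
    z = z_base.copy()
    z[0, dim] = value
    walk.append(z)
z_batch = np.concatenate(walk, axis=0)

# Feed z_batch to the trained generator to render the walk, e.g.
# images = sess.run(generated_images, feed_dict={z_placeholder: z_batch})
```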
Dimension 25
Dimension 50
Dimension 75
From what I was able to observe, samples drawn while taking random walks in latent space generate different images depending on whether we are on the negative or positive side of the dimension. This can be seen in the results above as well.
Edit: One piece of feedback I got on the post about the above statement was that while walking the latent space can get you from one image to another, there can be significant semantic relationships between those images (e.g. the same subject under a change of lighting, a rotation, etc.). Some dimensions can encode information about the geometry of the “scene”, and others may encode information about how that “scene” is rendered. Of course, the features are anonymous, so some finessing and reverse-engineering is needed to figure that out. (credits to reddit u/Ameren)
I agree completely with the comment and do believe the relationships captured by the latent space get complicated and less interpretable - I was just trying to point out an interpretation for the dimensions shown above.
Code for DCGAN in tensorflow can be found at TensorflowProjects/Unsupervised_learning
As always let me know if you have comments or ideas.
References and Useful links
- Newmu/dcgan_code
- carpedm20/DCGAN-tensorflow
- Soumith’s eyescream blog post
- OpenAI’s Generative models
- Image completion post by Brandon Amos