Introduction to GANs in Computer Vision

In the previous article, we discussed adversarial examples, attacks, and defenses. If you haven't read it yet, click here first to get a better understanding of the topic before we go a bit deeper.

Furthermore, you can click here to watch the full live-session recording on GANs.


So, in this article we will learn about generative models. As the name suggests, generative models are used to generate data. There are different types of generative models, such as variational autoencoders (VAEs), GANs, and many more. In this article we will primarily focus on generative adversarial networks (GANs), as they are the most dominant among these.

Table of contents-

  1. Comparison between generative models and discriminative models
  2. GANs: concept, types, mathematics
  3. Different applications of GANs
  4. Deep Convolutional GANs (DCGANs): implementation

Generative Models vs Discriminative Models-

A generative model describes how a dataset is generated, in terms of a probabilistic model; by sampling from this model, we are able to generate new data. A discriminative model, on the other hand, is used for classification; discriminative models are also called classifiers.


In the figure above, you can see that the discriminator takes the features X as input and predicts the class Y; in probability terms, this is the conditional probability P(Y|X). The generator, on the other hand, takes some random noise and a class Y as input and outputs an image X; in probability terms, this is P(X|Y).

In short, the discriminator is good at classifying, while the generator's job is to generate images.
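To make this distinction concrete, below is a minimal numpy sketch (not from the article; the toy 1-D Gaussian data and every name in it are illustrative assumptions). The discriminative side estimates P(Y|X) directly, while the generative side fits class-conditional models P(X|Y) and can sample new data from them.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dataset: class 0 ~ N(-2, 1), class 1 ~ N(+2, 1)
x0 = rng.normal(-2.0, 1.0, 500)
x1 = rng.normal(+2.0, 1.0, 500)

# Generative view: fit P(X|Y) per class, then SAMPLE new data from it
mu0, sd0 = x0.mean(), x0.std()
mu1, sd1 = x1.mean(), x1.std()
new_class1_samples = rng.normal(mu1, sd1, 10)  # "generated" data for Y=1

# Discriminative view: estimate P(Y=1|X) directly
# (here via Bayes' rule with equal priors, for brevity)
def p_y1_given_x(x):
    l0 = np.exp(-0.5 * ((x - mu0) / sd0) ** 2) / sd0  # likelihood under class 0
    l1 = np.exp(-0.5 * ((x - mu1) / sd1) ** 2) / sd1  # likelihood under class 1
    return l1 / (l0 + l1)

print(new_class1_samples)   # new data produced by the generative model
print(p_y1_given_x(0.5))    # probability that x = 0.5 belongs to class 1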

Let's look at some images generated using generative models:

Image a.

Image b.

Image a. shows generated images of dogs and Image b. shows generated images of humans, both produced using GANs.

Let's start discussing GANs.

Generative Adversarial Networks (GANs)

A generative adversarial network is a deep-learning-based generative model for unsupervised learning, proposed by Ian Goodfellow and his colleagues in 2014.

It consists of two networks: the generator and the discriminator.

  • Generator: the model used to generate new plausible examples from the problem domain.
  • Discriminator: the model used to classify examples as real (from the domain) or fake (generated).

The two networks compete against each other: the generator tries to fool the discriminator by generating data similar to the training set, while the discriminator tries not to be fooled, identifying the fake samples produced by the generator.
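As a concrete sketch, here is what the two competing networks might look like in PyTorch for flattened 28x28 grayscale images. This is a minimal illustration, not a prescribed architecture: the latent dimension of 100, the layer sizes, and the use of PyTorch itself are all assumptions.

import torch
import torch.nn as nn

latent_dim = 100  # size of the random-noise input z (an assumption)

# Generator: maps random noise z to a fake 28x28 (flattened) image
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),    # pixel values in [-1, 1]
)

# Discriminator: maps an image to the probability that it is real
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),       # 1 = real, 0 = fake
)

z = torch.randn(16, latent_dim)       # a batch of 16 noise vectors
fake_images = generator(z)            # shape: (16, 784)
p_real = discriminator(fake_images)   # D's belief that each image is real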

So far we have seen an overview of GANs; now let's move on to what we actually want from them.

Let's look at the probability distributions:


The figure above shows images sampled from the real data distribution, plotted on 2D coordinates so that you can see the shape of the distribution.


The images above are sampled from a GAN that has not been trained much and cannot yet generate realistic data; you can see its distribution on the same 2D coordinates. So our target is to match the distribution of the generated data to that of the real data.

Now, the question is: how can we train the generator to produce images that look like they come from the true data distribution?

Initially our generator is untrained and generates random noise, as you can see in the image above.

So, in order to train the generator we need to use the discriminator. Let's see how.

Steps to train the Generator:

  1. We first give random noise to the generator, and it generates some (irrelevant) image.
  2. Next, we create a discriminator and take a database of real images; every image in this database is labeled 1, while the generated images are labeled 0.
  3. The discriminator's job is to classify images: if an image comes from the real data it should be classified as 1, and if it comes from the generator it should be classified as 0.
  4. Now, suppose for a moment that the discriminator is good at separating real from generated images (in practice both networks are trained simultaneously; we will get back to this shortly). The generator generates some random images and sends them to the discriminator, which classifies them as 0. We then backpropagate the gradients to the generator, which tries to modify its pixels (minimizing the generator loss) so that its images get classified as 1.
  5. This procedure continues until the discriminator outputs 0.5. At that point the discriminator cannot distinguish whether an image comes from the real data or from the generator, which is exactly what we want (see the equation below for where 0.5 comes from).
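The 0.5 in step 5 comes from Goodfellow et al.'s original analysis: for a fixed generator, the optimal discriminator is

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},$$

so once the generated distribution matches the real one ($p_g = p_{\text{data}}$), the best the discriminator can do is output $D^*(x) = 1/2$ everywhere.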

So, this is the basic intuition behind the training. Now a few insights:

  1. I mentioned that both networks need to be trained simultaneously. The reason is that the generator essentially learns from the discriminator, so if the discriminator is very good in the initial phase, it will always classify the generator's images as 0, and the generator loss may fail to converge.
  2. The discriminator plays the most important role: if the discriminator is bad, we will not get the desired result either.
  3. It is also important that G improves as D improves, and vice versa.
  4. From the points above it is clear that the discriminator can be neither too good nor too bad in the initial phase, and that G and D should improve together. So, in a single iteration, we typically train G one time and D k times; this method helps us train GANs well (see the sketch below).
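Putting the steps and insights together, here is a minimal, hypothetical PyTorch training loop. It reuses the generator, discriminator, and latent_dim sketched earlier; the optimizers, the learning rate, k = 5, and the dataloader of flattened real images are placeholder assumptions, not settings from the article.

import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
k = 5  # discriminator updates per generator update (an assumption)

for real_images in dataloader:  # real_images: (batch, 784), assumed given
    batch = real_images.size(0)
    ones = torch.ones(batch, 1)    # label 1 = real
    zeros = torch.zeros(batch, 1)  # label 0 = fake

    # Train D k times: push real images toward 1 and fakes toward 0
    for _ in range(k):
        fake = generator(torch.randn(batch, latent_dim)).detach()  # no grads into G
        loss_d = bce(discriminator(real_images), ones) + bce(discriminator(fake), zeros)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # Train G once: try to make D label its fakes as 1 (non-saturating loss)
    loss_g = bce(discriminator(generator(torch.randn(batch, latent_dim))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()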

Now let's look at the cost functions of the generator and the discriminator.

  • The cost of the discriminator: the loss function is cross-entropy. The first term says that D should correctly label real data as 1, while the second term is slightly modified: it is a sum over the generated images, which D should label as 0 (every image in the real database is labeled 1, and every image coming from the generator is labeled 0).
  • The cost of the generator: what does the generator want? Exactly the opposite of the discriminator. So we need to minimize the opposite of what D is trying to minimize, and since real images play no role for the generator, the first term drops out (both costs are written out below).
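Written out with D the discriminator, G the generator, and z the input noise, the two costs described above are:

$$\mathcal{L}_D = -\,\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]$$

$$\mathcal{L}_G = \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]$$

The first term of the discriminator loss rewards labeling real data as 1, the second rewards labeling generated data as 0, and the generator loss keeps only the generated-data term, with the opposite sign of what D wants.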

Note: the generator loss above is called the saturating cost, and a GAN trained with this loss is called an MM GAN (minimax GAN).

Now, the problem with the saturating loss is that we cannot train efficiently in the early stages because the gradients vanish, so we convert the loss above into a non-saturating loss, shown below, which resolves this issue.

If we use the non-saturating cost for the generator, the model is called an NS GAN.
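Instead of minimizing log(1 − D(G(z))), the non-saturating cost has the generator maximize log D(G(z)), i.e. minimize

$$\mathcal{L}_G^{\text{non-sat}} = -\,\mathbb{E}_{z \sim p_z}\!\left[\log D(G(z))\right],$$

which gives the generator strong gradients early in training, when D(G(z)) is still close to 0.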


So much for the mathematics. If you did not follow everything, feel free to watch the live session to get better intuition, as this article is a supplement to the live session [live session link].

Interesting applications of GANs

1: Generate Human Faces

Tero Karras, et al., in their 2017 paper titled “Progressive Growing of GANs for Improved Quality, Stability, and Variation”, demonstrated the generation of plausible, realistic photographs of human faces. The results produced were remarkable. The face generator was trained on celebrity examples, meaning that there are elements of existing celebrities in the generated faces, making them seem familiar, but not quite.

Examples of photorealistic GAN-generated faces. Taken from “Progressive Growing of GANs for Improved Quality, Stability, and Variation”, 2017.

2: Image-to-Image Translation

Phillip Isola, et al., in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks”, demonstrated GANs, specifically their pix2pix approach, on many image-to-image translation tasks.

Examples include translation tasks such as:

  • Translation of semantic images to photographs of cityscapes and buildings.
  • Translation of satellite photographs to Google Maps.
  • Translation of photos from day to night.
  • Translation of black and white photographs to color.
  • Translation of sketches to color photographs.

3: Super Resolution

Christian Ledig, et al., in their 2016 paper titled “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”, demonstrated the use of GANs, specifically their SRGAN model, to generate output images with higher, sometimes much higher, pixel resolution.


4: Face Generation

Karras et al. (2018): A Style-Based Generator Architecture for Generative Adversarial Networks

5: Pix2Pix

Visit this website: Image-to-Image Demo – Affine Layer (by Christopher Hesse). It takes edges drawn by a human as input and converts them into a cat, shoes, or whatever else; you can try it yourself.

There are many more applications of GANs, such as text-to-image translation (text2image), semantic-image-to-photo translation, generating new human poses, photos to emojis, photo inpainting, video prediction, and a lot more.

Many big companies are also using GANs.


So this is all for the introduction to GANs; for the implementation part on DCGANs, you can check out my live session.

Thanks for reading the article. Happy learning!


Article Credit:-

Name: Ansh Nahar

Qualification: Major – B. Tech(ECE), IIITDM Jabalpur

Research area: Deep Learning & Computer Vision
LinkedIn
