TowardsMachineLearning

Introduction to GANs in Computer Vision

In the previous article, we discussed adversarial examples, attacks, and defences. If you haven't read it yet, feel free to check it using the links given below, as it will help you build better intuition.[Article 1st link][article 2nd link][Live session link]

In this article, we will learn about generative models. As the name suggests, generative models are used to generate data. There are different types of generative models, such as variational autoencoders (VAEs), GANs, and many more. In this article we will focus primarily on generative adversarial networks (GANs), as they are currently the most dominant approach.

Table of contents:
  1. Comparison between generative models and discriminative models
  2. GANs: concept, types, and mathematics
  3. Different applications of GANs
  4. Deep Convolutional GANs (DCGANs): implementation
Generative Models vs Discriminative Models

A generative model describes how a dataset is generated, in terms of a probabilistic model; by sampling from this model, we are able to generate new data. A discriminative model, on the other hand, is used for classification; discriminative models are also called classifiers.

In the figure above, you can see that the discriminator takes the features X as input and predicts the class Y; in terms of probability, it models the conditional probability of Y given X, written P(Y|X). The generator instead takes some random noise and a class Y as input and produces an image X; in terms of probability, it models P(X|Y). In short, the discriminator is good at classifying, while the generator's job is to generate images.

Let's look at some images produced by generative models. Fig a. shows generated images of dogs, and Fig b. shows generated images of human faces, both produced using GANs.

Generative Adversarial Networks (GANs)

A generative adversarial network is a deep-learning-based generative model for unsupervised learning, proposed by Ian Goodfellow and his colleagues in 2014. It consists of two networks, a generator and a discriminator.

  • Generator: a model that generates new plausible examples from the problem domain.
  • Discriminator: a model that classifies examples as real (from the domain) or fake (generated).

Both networks compete against each other: the generator tries to fool the discriminator by generating data similar to the training set, while the discriminator tries not to be fooled, by identifying the fake samples the generator produces.

So far we have seen an overview of GANs; now let's look at what we actually want from them. Consider the probability distributions below.

The first figure shows images sampled from the real data distribution, together with that distribution plotted on 2D coordinates. The next images are sampled from a GAN that has not been trained much and is not yet able to generate realistic data; again, you can see the corresponding distribution on the 2D coordinates. Our target is to match the distribution of the generated data to the distribution of the real data.

Now, the question is: how can we train G to generate images that look like the true data distribution? Initially our generator is untrained and produces random noise, as you can see in the image below. To train the generator, we need the discriminator.

Steps to train the generator:
  1. We first feed random noise to the generator, which produces some (irrelevant) image.
  2. Next, we create a discriminator and take a database of real images; every image from this database is labeled 1, while the generated images are labeled 0.
  3. The discriminator's job is to classify images: if an image comes from the real data, it should be classified as 1, and if it comes from the generator, it should be classified as 0.
  4. Now, suppose for a moment that the discriminator is good at separating real and generated images (in practice both networks are trained simultaneously; we will get back to this shortly). The generator generates some images and sends them to the discriminator, which classifies them as 0. We then backpropagate the gradients to the generator, and the generator adjusts its weights (minimizing the generator loss) so that its images get classified as 1.
  5. The procedure continues until the discriminator outputs 0.5. An output of 0.5 means the discriminator can no longer distinguish whether an image comes from the real data or the generated data, which is exactly what we want.
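The five steps above can be sketched end to end on a toy problem. The snippet below is a minimal illustration, not a practical implementation: the "images" are just numbers drawn from a Gaussian, the generator and discriminator are tiny two-parameter models with hand-derived gradients, and all of the names and settings (sample_real, the learning rate, the batch size, the two discriminator updates per iteration) are choices made for this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# "Real images" are stand-in numbers drawn from N(4, 1).
def sample_real(n):
    return rng.normal(4.0, 1.0, n)

# Generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0              # generator parameters (starts off producing noise around 0)
w, c = 0.1, 0.0              # discriminator parameters
lr, k, n_batch = 0.05, 2, 64
history = []

for step in range(3000):
    # Train D for k steps: push D(real) toward 1 and D(fake) toward 0 (steps 2-3).
    for _ in range(k):
        x_real = sample_real(n_batch)
        z = rng.normal(0.0, 1.0, n_batch)
        x_fake = a * z + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Hand-derived gradients of the cross-entropy loss w.r.t. w and c.
        w -= lr * np.mean(-(1.0 - d_real) * x_real + d_fake * x_fake)
        c -= lr * np.mean(-(1.0 - d_real) + d_fake)
    # Train G once: push D(fake) toward 1 by backpropagating through D (step 4).
    z = rng.normal(0.0, 1.0, n_batch)
    d_fake = sigmoid(w * (a * z + b) + c)
    a -= lr * np.mean(-(1.0 - d_fake) * w * z)
    b -= lr * np.mean(-(1.0 - d_fake) * w)
    history.append(b)

b_avg = float(np.mean(history[-500:]))
print(f"generator offset b, averaged over the last 500 steps: {b_avg:.2f} (real mean is 4.0)")
```

On this toy problem the generator's offset b should drift toward the real mean of 4, at which point the discriminator can no longer tell the two distributions apart, mirroring step 5.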
So, this is the basic intuition behind the training. Now, a few insights:
  1. I mentioned that both networks need to be trained simultaneously. The reason is that the generator essentially learns from the discriminator; if the discriminator is very good in the initial phase, it will always classify the generator's images as 0, and the generator loss may fail to converge.
  2. The discriminator plays the most important role: if the discriminator is bad, we will again not get the desired result.
  3. It is also important that G improves as D improves, and vice versa.
  4. From the points above, it is clear that the discriminator should be neither too good nor too bad in the initial phase, and that G and D should improve together. So what we do in practice is, within a single iteration, train G once and D k times; this schedule helps us train GANs well.
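Insight 1 can be made concrete with a few lines of arithmetic. When the discriminator confidently rejects a fake sample, D(G(z)) is close to 0, and the original minimax ("saturating") generator loss log(1 - D(G(z))) provides almost no gradient, whereas the non-saturating alternative -log D(G(z)) (which comes up again with the cost functions below) still does. The sketch below simply evaluates the two gradient magnitudes with respect to the discriminator's logit; the specific logit values are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Suppose D is already very confident that some generated samples are fake:
# its logits s for those samples are strongly negative, so D(G(z)) is near 0.
s = np.linspace(-8.0, 0.0, 5)        # discriminator logits for fake samples
d = sigmoid(s)                        # D(G(z)), close to 0 for very negative s

# Gradient magnitude (w.r.t. the logit s) of the two generator losses:
#   saturating:      minimize  log(1 - D)  ->  |d/ds| = D
#   non-saturating:  minimize -log D       ->  |d/ds| = 1 - D
grad_saturating = d
grad_non_saturating = 1.0 - d

for si, gs, gn in zip(s, grad_saturating, grad_non_saturating):
    print(f"logit {si:5.1f}:  saturating grad {gs:.4f}   non-saturating grad {gn:.4f}")
```

When D(G(z)) is near 0, the saturating gradient is near 0 too, so an overconfident early discriminator leaves the generator with nothing to learn from, which is exactly why the two networks have to improve together.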
Now let's look at the cost functions of the generator and the discriminator.
  • The cost of the discriminator: the loss function is cross entropy. The first part says that D should correctly label real data as 1; the second part is slightly modified, as it is a summation over the generated images, which D needs to label as 0 (every image in the real database is labeled 1, and every image coming from the generator is labeled 0).
  • The cost of the generator: what do we want from the generator? Exactly the opposite of the discriminator, so we minimize the opposite of what D is trying to minimize; and since real images play no role for the generator, the first term is dropped.
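Written out, the two costs above combine into the standard GAN value function from the 2014 Goodfellow et al. paper; the following restates it in LaTeX, along with the two forms of the generator loss discussed next:

```latex
% Minimax game: D maximizes V, G minimizes it.
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% Saturating generator loss (the minimax / MM GAN form):
J_G = \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

% Non-saturating generator loss (the NS GAN form):
J_G = -\,\mathbb{E}_{z \sim p_z(z)}\big[\log D(G(z))\big]
```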
Note: this generator loss is called the saturating cost, and a GAN trained with it is called an MM GAN (minimax GAN). The problem with the saturating loss is that we cannot train efficiently in the initial stages, because the gradients vanish; so the loss is converted to a non-saturating loss, which resolves this issue, as can be seen in the figure below. A GAN that uses the non-saturating generator cost is called an NS GAN.

So this is all for the mathematics; if you did not follow all of it, feel free to watch the live session to get better intuition, as this article is a supplement to the live session [live session link].

Interesting applications of GANs

1: Generating human faces
Tero Karras, et al., in their 2017 paper titled "Progressive Growing of GANs for Improved Quality, Stability, and Variation", demonstrate the generation of plausible, realistic photographs of human faces. The results produced were remarkable. The face generator was trained on celebrity examples, meaning that there are elements of existing celebrities in the generated faces, making them seem familiar, but not quite.
Examples of photorealistic GAN-generated faces, taken from "Progressive Growing of GANs for Improved Quality, Stability, and Variation", 2017.

2: Image-to-image translation
Phillip Isola, et al., in their 2016 paper titled "Image-to-Image Translation with Conditional Adversarial Networks", demonstrate GANs, specifically their pix2pix approach, on many image-to-image translation tasks. Examples include translation tasks such as:
  • Translation of semantic images to photographs of cityscapes and buildings.
  • Translation of satellite photographs to Google Maps.
  • Translation of photos from day to night.
  • Translation of black and white photographs to color.
  • Translation of sketches to color photographs.
3: Super resolution
Christian Ledig, et al., in their 2016 paper titled "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", demonstrate the use of GANs, specifically their SRGAN model, to generate output images with higher, sometimes much higher, pixel resolution.

4: Face generation
Karras et al. (2018): "A Style-Based Generator Architecture for Generative Adversarial Networks".

5: pix2pix
  • Visit the Image-to-Image Demo at Affine Layer (by Christopher Hesse): it takes edges drawn by a human as input and converts them into a cat, shoes, or whatever else; you can try it yourself.
There are many more applications of GANs, such as text-to-image translation (text2image), semantic-image-to-photo translation, generating new human poses, photos to emojis, photo inpainting, video prediction, and a lot more. Many big companies are also using GANs; a few of them are shown below.

So this is all for the introduction to GANs; for the implementation part of DCGANs, you can check out my live session. Thanks for reading the article, and happy learning!

REFERENCES:
1 – https://cs230.stanford.edu/
2 – Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy (2015): Explaining and Harnessing Adversarial Examples
3 – Xiaoyong Yuan, Pan He, Qile Zhu & Xiaolin Li: Adversarial Examples: Attacks and Defenses for Deep Learning
4 – Defending against Adversarial Attack towards Deep Neural Networks via Collaborative Multi-Task Training
5 – https://www.tensorflow.org/tutorials/generative/dcgan
6 – https://developers.google.com/machine-learning/gan/generative

Article Credit:-

Name: Ansh Nahar
Designation: Major – B. Tech(ECE), IIITDM Jabalpur
Research area: Deep Learning & Computer Vision

Leave a Comment