Introduction to GANs: Adversarial Attacks and Defenses for Deep Learning
The popularity of deep learning keeps growing: we use neural networks in almost every field to increase productivity and capability, and their potential is easy to justify just by looking at their applications. So far we have mostly used neural networks such as ANNs and CNNs for classification, but it does not stop there: neural networks can also generate new images, audio, text, etc. that mimic the real data distribution. This is where Generative Adversarial Networks, GANs for short, come into play.
So, in this article series we are mainly going to talk about GANs and their applications. GANs are one of the more advanced topics in deep learning, they remain an active research area, and believe me, they have a lot of extremely interesting industrial applications, which we will see at the end.
In order to simplify the learning, I have divided the content into two parts: the first part focuses on the word “ADVERSARIAL”, and in the second part we will look at GANs themselves.
[Article1 link], [article2 link]
Both of these articles supplement the live session [link].
Table of contents:
1: Adversarial Examples
2: Types of Adversarial Attacks
3: Malicious Applications of Adversarial Examples
4: FGSM and Other Adversarial Attacks
5: Defense Against Adversarial Attacks
Several machine learning models, including state-of-the-art neural networks, are vulnerable to adversarial examples. Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake; they’re like optical illusions for machines.
For example, we can modify a cat image so that it is classified as an iguana:
The assumption is that we have a very good classifier, trained on a lot of data, that is used for classifying objects. In the figure above, the image on the left is classified as a cat with 92% confidence, and after adding a certain kind of noise to it, the same cat image is classified as an iguana with 94% confidence.
Although to human eyes the image on the right is still clearly a cat, the classifier labels it as an iguana; you can check the similarities between a cat and an iguana in the next image.
Let’s see some more adversarial examples.
In the figure above, a panda is classified as a gibbon after noise is added to it.
Sometimes the modification is so small that even the human eye cannot tell whether noise has been added to an image or not, as in the image below.
Types of Adversarial Attacks
Attacks are mainly classified into two types:
1: Black-box
2: White-box
And each of these is further classified into two types, based on the attacker’s goal for the network:
1: Targeted attacks
2: Non-targeted attacks
- Non-targeted attacks:
In a non-targeted attack, our goal is to find an image that tricks the neural network. For example, we need to find an image that will be classified as, say, ‘iguana’ by the classifier, even though the input image itself is not an iguana; the input can be anything apart from an iguana.
So, how does a non-targeted attack work?
In the figure above we start with X (white noise) and we want this image to be classified as an iguana; in short, we need a prediction equal to the one-hot encoded vector for iguana. We define the loss as an L2 loss (an L1 loss could also be used, but L2 works better for this type of problem), forward propagate X through the network, compute the loss, and backpropagate the gradients all the way back to the input. We perform many iterations until we get an image that is predicted as iguana.
After this optimization we get an image X that the network predicts as iguana. So what are we really doing? We are just modifying the pixel values of the white noise until the prediction equals iguana.
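The loop above can be sketched in NumPy. The “classifier” here is a single random linear layer with a softmax, a hypothetical stand-in for a trained network, and a cross-entropy loss is used instead of the article’s L2 loss because its input gradient is simpler to write by hand; the optimization loop itself is the same.

```python
import numpy as np

# Toy stand-in for the trained classifier: a random 3-class linear
# softmax over a flattened 8x8 "image" (hypothetical, not trained).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 64))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

target = 1                      # the "iguana" slot in the one-hot vector
x = rng.normal(size=64)         # start from white noise
lr = 0.01

for _ in range(1000):
    p = softmax(W @ x)
    # Gradient of the cross-entropy loss w.r.t. the input pixels:
    # for logits z = W @ x, dL/dx = W^T (p - one_hot(target)).
    p[target] -= 1.0
    x -= lr * (W.T @ p)         # modify the pixels, not the weights

p_final = softmax(W @ x)
print("predicted class:", int(np.argmax(p_final)))   # -> 1, the target
print("target probability:", float(p_final[target]))
```

Note that only the input pixels are updated; the network weights stay frozen, which is exactly the opposite of ordinary training.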
But the question is: will the forged image X actually look like an iguana after the optimization?
The answer is that there is only a small probability that it will, as the figure below illustrates.
Here the blue region is the space of all possible input images (anything) and the red region is the space of real iguana images. If we are working with 32x32 coloured images, the total number of possible images X is 256^(32*32*3) = 256^3072 (3 is the number of channels and 256 is the number of values a pixel can take), which is the huge blue region.
You can see that the intersection between the blue and red regions is tiny, and it is highly unlikely that our optimized image X lands in that intersection. In simple words, there is a very high probability that X will not look like an iguana after the optimization.
- Targeted attacks:
In a non-targeted attack we only have a constraint on the target output (the prediction of the classifier), but in a targeted attack we have a constraint on X (the input image) as well.
So the goal is to force the neural network to misclassify a specific image, say a ‘watering can’, as, say, a ‘crab’.
In the figure above you can see that we give the classifier an image of a watering can, and it is classified correctly. Now we will modify the same image so that it is classified as a crab.
So we added some carefully crafted noise to the image, and now it is classified as a ‘crab’.
Let’s look at how the targeted attack works.
The explanation is the same as for the non-targeted attack, but here we make a small modification to the loss function: since the input must remain close to the specific source image (the watering can), we have a constraint on X as well, and we follow the same optimization until the image is classified as a crab.
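One common way to express this constraint is to add a proximity penalty to the loss, L = CE(F(x), y_target) + λ·||x − x_orig||², so the optimization trades off fooling the classifier against staying close to the original image. A minimal NumPy sketch under the same hypothetical assumptions as before (a random linear softmax standing in for the classifier, cross-entropy instead of the article’s L2 loss):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 64))    # hypothetical stand-in classifier

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def total_loss(x, x_orig, target, lam):
    # cross-entropy toward the target class + proximity penalty
    return -np.log(softmax(W @ x)[target]) + lam * np.sum((x - x_orig) ** 2)

x_orig = rng.normal(size=64)    # the "watering can"
target = 2                      # the "crab" slot
lam, lr = 0.01, 0.01

x = x_orig.copy()
loss_start = total_loss(x, x_orig, target, lam)
for _ in range(1000):
    p = softmax(W @ x)
    p[target] -= 1.0
    grad = W.T @ p + 2.0 * lam * (x - x_orig)   # gradient of the combined loss
    x -= lr * grad

loss_end = total_loss(x, x_orig, target, lam)
print("loss:", float(loss_start), "->", float(loss_end))
print("now classified as:", int(np.argmax(softmax(W @ x))))
print("mean |pixel change|:", float(np.abs(x - x_orig).mean()))
```

Raising λ keeps the forged image closer to the original at the cost of making the attack harder; the λ = 0.01 used here is an arbitrary illustrative value.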
All the case studies we have done so far are for the white-box setting, which means we have complete access to the model. But what if the model is a black box (no access to its parameters, no multiple queries)? How are we going to attack it?
The answer is transferability. You create your own classifier that performs the same task as the black-box model you are targeting, and you forge an adversarial example on your own classifier; there is a high probability that the same forged example will also behave as an adversarial example for the black-box model. This is called transferability. One suggested reason is that we tend to use the same activation functions, initializers, optimizers, and other building blocks when building any classifier, which encourages transferability; researchers are still trying to understand exactly why it happens, and it remains an active research topic.
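Transferability can be illustrated with a toy experiment (all numbers hypothetical): two independently initialized logistic-regression “models” are trained on the same synthetic data, one standing in for the attacker’s surrogate and one for the black box; an FGSM-style example crafted on the surrogate is then fed to the other model without ever querying it.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w0, X, y, lr=0.1, steps=300):
    # plain gradient descent on the mean logistic loss
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# Two well-separated Gaussian blobs: the shared "task".
X = np.vstack([rng.normal(-2, 0.5, size=(50, 4)),
               rng.normal(+2, 0.5, size=(50, 4))])
y = np.array([0.0] * 50 + [1.0] * 50)

w_surrogate = train(rng.normal(size=4), X, y)   # attacker's own classifier
w_blackbox  = train(rng.normal(size=4), X, y)   # the model under attack

# FGSM on the surrogate for one class-1 point (gradient of -log p w.r.t. x).
x = X[-1]
grad = (sigmoid(x @ w_surrogate) - 1.0) * w_surrogate
x_adv = x + 0.5 * np.sign(grad)

print("surrogate p(class 1):", sigmoid(x @ w_surrogate), "->", sigmoid(x_adv @ w_surrogate))
# The example crafted on the surrogate typically also lowers the
# black-box model's confidence, even though it was never queried:
print("black box p(class 1):", sigmoid(x @ w_blackbox), "->", sigmoid(x_adv @ w_blackbox))
```

Because both models solve the same task, their decision boundaries end up similar, which is the intuition behind transfer attacks; two linear models make the effect easy to see, while for deep networks the phenomenon is empirical rather than guaranteed.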
Malicious applications of adversarial examples
- Face recognition: attackers can break a face recognition system used for security and gain access to your device.
- Social media: social media platforms have systems that detect whether images are violent or not. Suppose someone creates adversarial examples that break these algorithms; the internet could then be flooded with violent images. Adversarial examples could also be used around election time to spread false news about other parties, and there are many other malicious possibilities.
- Autonomous vehicles: in the side figure you can observe an adversarial stop sign that is classified as a speed limit sign. If a car needs to stop at that moment but instead just limits its speed, it can cause an accident.
And there are many more malicious examples; I have just discussed a few of them.
Why do adversarial examples exist?
Ian J. Goodfellow, Jonathon Shlens and Christian Szegedy, in ‘Explaining and Harnessing Adversarial Examples’, gave an answer: the linearity in neural networks, backed by a mathematical explanation.
In the figure above we take the example of linear regression: we feed an input x to a neuron with no activation function, just to simplify the understanding. After successful training we get a corresponding weight vector w and bias b = 0. For x = (1, -1, 2, 0, 3, -2) we compute y hat = w^T x = -4. Now we want to add a minor change to x such that y hat changes drastically, so we set x* = x + epsilon*w; taking epsilon = 0.2, we observe that even a small change in x can drift y hat towards the positive side.
If you look at y hat(x*) = w^T x* + b = w^T (x + epsilon*w) = w^T x + epsilon*w^T w, the quantity w^T w is always positive (it is the squared norm of w), so we only have to play with the sign of epsilon to drift y hat in the direction we want.
- Insights from the above result
If w is large, x* = x + epsilon*w can end up far from x, so the authors took another approach: instead of epsilon*w they use only epsilon*sign(w). If we play correctly with the sign of w, we can always push w^T x* in the positive direction while changing each input coordinate by at most epsilon; that is the basic intuition for how this works.
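The arithmetic above can be checked numerically. The weight vector below is a hypothetical choice (the article does not state w), picked so that w^T x = -4 matches the example; the high-dimensional part shows why the tiny per-coordinate step epsilon*sign(w) becomes a large drift.

```python
import numpy as np

# Hypothetical w, chosen so that w^T x = -4 as in the example above.
x = np.array([1.0, -1.0, 2.0, 0.0, 3.0, -2.0])
w = np.array([0.0, 1.0, 0.0, 1.0, -1.0, 0.0])
print(w @ x)                       # -4.0

# Perturb by eps * sign(w): each coordinate moves by at most eps = 0.2,
# and every move pushes w^T x in the positive direction.
eps = 0.2
x_star = x + eps * np.sign(w)
print(w @ x_star)                  # -4 + 0.2 * sum(|w_i|) = -3.4

# In high dimensions the same per-coordinate eps produces a big drift:
n = 3000
w_big = 0.01 * np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
x_big = (-4.0 / (w_big @ w_big)) * w_big   # constructed so w_big^T x_big = -4
x_big_star = x_big + eps * np.sign(w_big)
print(w_big @ x_big)               # -4.0
print(w_big @ x_big_star)          # -4 + 0.2 * 30 = 2.0
```

With 3000 small weights, a perturbation of only 0.2 per coordinate flips the output from -4 to +2, which is exactly the dimensionality argument the paper makes for images with thousands of pixels.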
And they generalized this expression to neural networks, i.e.
x_adv ← x + α · sign(∇x L(F(x), y_true))
Now we will see how FGSM works.
FGSM (Fast Gradient Sign Method)
The Fast Gradient Sign Method (FGSM) is one of the methods for creating adversarial examples, and it can generate them rapidly. FGSM perturbs an image in image space in the direction of the gradient sign. It can be described by the following formula:
x_adv ← x + α · sign(∇x L(F(x), y_true))
- L: loss function
- F(x): output of model F
- α: parameter controlling the distortion
- sign: sign function
FGSM only requires the gradient to be computed once, so it can craft large batches of adversarial examples in a very short time. In brief, to accomplish the task we need to find how much each pixel in the image contributes to the loss value, and add a perturbation accordingly.
Figure: A demonstration of fast adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet. By adding an imperceptibly small vector whose elements are equal to the sign of the elements of the gradient of the cost function with respect to the input, we can change GoogLeNet’s classification of the image. Here our ε of .007 corresponds to the magnitude of the smallest bit of an 8-bit image encoding after GoogLeNet’s conversion to real numbers.
The working of FGSM:
- Take an input image
- Make a prediction on the image using a trained CNN
- Compute the loss of the prediction with respect to the true class label
- Calculate the gradient of the loss with respect to the input image
- Compute the sign of the gradient
- Add the signed gradient, scaled by α, to the input image
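The steps above can be sketched end to end. The “trained CNN” is replaced by a random linear softmax classifier (a hypothetical stand-in) so that the input gradient can be written by hand; with a real network the gradient would come from automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(W, x, y):
    # cross-entropy loss of the prediction against the label y
    return -np.log(softmax(W @ x)[y])

def input_gradient(W, x, y):
    # For logits z = W @ x, dL/dx = W^T (softmax(z) - one_hot(y)).
    p = softmax(W @ x)
    p[y] -= 1.0
    return W.T @ p

def fgsm(W, x, y, alpha=0.5):
    # One gradient computation, then a single signed step of size alpha.
    return x + alpha * np.sign(input_gradient(W, x, y))

W = rng.normal(size=(3, 8))          # 3 classes, 8 "pixels" (stand-in model)
x = rng.normal(size=8)
y_true = int(np.argmax(W @ x))       # use the clean prediction as the label

x_adv = fgsm(W, x, y_true)
print("loss clean:", ce_loss(W, x, y_true))
print("loss adv:  ", ce_loss(W, x_adv, y_true))      # strictly larger
print("max pixel change:", np.abs(x_adv - x).max())  # bounded by alpha
```

Note that every pixel moves by at most α, yet the loss always goes up; that bounded, one-shot perturbation is what makes FGSM fast compared to iterative attacks.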
Apart from FGSM there are many other adversarial attacks, such as I-FGSM (Iterative Fast Gradient Sign Method), DeepFool, JSMA, Carlini & Wagner L2, etc.
Defending your model from adversarial attacks
Defense against adversarial attacks is a must, as we have seen their malicious applications. We do have many defense techniques, but none of them is the ultimate one: attackers keep creating new methods, and we have to design defenses after the fact. In this article we are going to see some basic techniques for defending your model against adversarial attacks.
- Create a safety net [Lu et al. (2017): SafetyNet: Detecting and Rejecting Adversarial Examples Robustly]:
In this method, we build an additional classifier that separates real from adversarial images before the image is passed to the model. Adversarial examples can be crafted against the safety net as well, but fooling two networks at the same time is much more difficult than fooling one.
- Train on correctly labeled adversarial examples:
This can also be done, but it is very costly: first we need to label the adversarial images, and then we need to train the model on them.
- Adversarial training:
In this method, while training the model, we simultaneously generate adversarial examples against the current model and train on them alongside the clean data.
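Adversarial training can be sketched on a toy problem (all numbers hypothetical: a logistic-regression “model” on synthetic blobs, with FGSM as the attack generated inside the training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_w(w, X, y):
    # gradient of the mean logistic loss w.r.t. the weights
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def fgsm_batch(w, X, y, eps):
    # FGSM on every row: d(loss)/dx = (sigmoid(x.w) - y) * w
    return X + eps * np.sign((sigmoid(X @ w) - y)[:, None] * w[None, :])

# Two well-separated blobs as toy training data.
X = np.vstack([rng.normal(-2, 0.3, size=(50, 2)),
               rng.normal(+2, 0.3, size=(50, 2))])
y = np.array([0.0] * 50 + [1.0] * 50)

w, eps, lr = np.zeros(2), 0.3, 0.1
for _ in range(500):
    X_adv = fgsm_batch(w, X, y, eps)      # attack the current model...
    w -= lr * (grad_w(w, X, y) + grad_w(w, X_adv, y)) / 2   # ...train on both

acc_clean = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
acc_adv = np.mean((sigmoid(fgsm_batch(w, X, y, eps) @ w) > 0.5) == (y == 1))
print("clean accuracy:", acc_clean, " adversarial accuracy:", acc_adv)
```

The key design point is that the adversarial examples are regenerated against the current weights at every step, so the model keeps being trained on the attacks it is currently most vulnerable to.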
There are far better methods as well; you can check the research paper ‘Defending Against Adversarial Attack Towards Deep Neural Networks via Collaborative Multi-Task Training’ for more information.
So this is all for the “adversarial” part. In the second part of the article we will discuss generative models (GANs), applications of different types of GANs, and an implementation of DCGAN.
Thanks a lot for your time & happy learning.