• Nazar Kaminskyi

Protection against adversarial examples in image classification tasks (part 1)

Updated: Sept. 15

These days, you will hardly find anyone who has not heard of artificial intelligence. Some people picture scenes from The Terminator or Chappie, while for others it is just a set of algorithms. The last decade has been fruitful for the field: machine learning and deep learning have achieved impressive results on problems that used to be too difficult for classical algorithms, and the two now work side by side.

The variety of applications of neural networks, one specific AI technology, is impressive: image recognition systems, autonomous cars, smart robots, voice assistants, forecasting, fraud detection, machine translation, natural language processing, music and text generation, recommendation systems, bioinformatics, and the list goes on. Some of these systems are critical components of larger solutions, and their failure can lead to undesirable or even tragic consequences.

The great popularity of AI inevitably raises new problems. One such problem of neural networks was described by Szegedy et al. [1] in 2014. The authors found that several machine learning models, including state-of-the-art neural networks, are vulnerable to carefully crafted noise. Since then, researchers have been actively studying this vulnerability, known as adversarial examples. Later in this blog post we will look at it in more detail and then get acquainted with existing methods of protection. In the next part, which will be released soon, I will describe in detail my experiments and results on protection against adversarial examples in image classification tasks, so stay with me and let’s go!

Adversarial Examples

In general, adversarial examples can be generated in all of the areas listed in the introduction; more information can be found in the following blog post as well as in surveys [2, 3]. To make things easier to visualize, we will focus on image classification. As mentioned above, Szegedy et al. found that a small, carefully computed perturbation of the input image can easily fool a state-of-the-art neural network. The following image illustrates this phenomenon.

It’s easy to reach high confidence in the incorrect classification of an adversarial example, as shown in Explaining and Harnessing Adversarial Examples, Goodfellow et al, ICLR 2015.

The main danger is that the two images look identical to a human, yet the network assigns them different classes (and, unfortunately, the wrong one in the second case).

Therefore, the main idea of this phenomenon can be expressed by the following formula:
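One common way to write this idea down, using notation that is assumed here for illustration and follows the conventions of the surveys [2, 3]: let f be the classifier, x a clean image, δ the added perturbation, and ε the maximum perturbation size measured in some norm p (often p = ∞ or p = 2). An adversarial example x_adv then satisfies:

```latex
x_{adv} = x + \delta, \qquad \|\delta\|_{p} \le \epsilon, \qquad f(x_{adv}) \ne f(x)
```

In words: the change to the image is bounded by ε, so it stays hard to notice, but it is still enough to change the predicted class.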

I will not go into detail here: you can find a great explanation of how the simplest adversarial examples are generated in the following blog post, and advanced readers and true fans of machine learning can find all the information in surveys [2] and [3].
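To give a flavor of how the simplest adversarial examples are built, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) from Goodfellow et al. It is not the setup from the paper: the "network" is a toy logistic classifier with random weights, the "image" is a vector of 100 random pixels in [0, 1], and the names `fgsm`, `predict_proba`, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a linear logistic classifier."""
    return sigmoid(x @ w + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: move every pixel by eps in the direction that
    increases the cross-entropy loss for the true label y."""
    p = predict_proba(w, b, x)
    grad_x = (p - y) * w  # gradient of the loss with respect to the input
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid range

# Hypothetical toy setup: random weights stand in for a trained model.
rng = np.random.default_rng(0)
w = rng.normal(size=100)
x = rng.uniform(0.0, 1.0, size=100)  # clean "image"
b = 2.0 - x @ w                      # bias chosen so the clean logit is 2
y = 1.0                              # true label of the clean image
eps = 0.05                           # each pixel changes by at most 5% of its range

x_adv = fgsm(w, b, x, y, eps)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
print(f"clean: p(class 1) = {p_clean:.3f}, adversarial: p(class 1) = {p_adv:.3f}")
```

No single pixel changes by more than `eps`, but the small changes add up across all 100 inputs, which is why the prediction can flip.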

Consider the basic concepts of adversarial examples in accordance with [3]:

1. Adversarial Falsification (Error type):

a. False positive attacks (Type I error): in image classification, such an example looks like noise to a human, yet the network classifies it as a certain object with high confidence;

b. False negative attacks (Type II error): in image classification, such an example is easily recognized by a human, yet the deep neural network fails to classify it correctly.

2. Adversary’s Knowledge:

a. white box: the attack relies on knowledge of the classifier's internals (architecture, hyperparameters, activation functions, hidden layers, weights, etc.);

b. black box: the attack requires no knowledge of the classifier's internals; only the data and the classifier's output are available;

c. semi-white (gray) box: the adversary is assumed to know the architecture of the target model but has no access to its weights.

3. Adversarial Specificity:

a. Targeted attacks push the network's output toward a specific class chosen by the adversary;

b. Non-targeted attacks only require the output to be wrong; any incorrect class will do.

4. Attack Frequency:

a. One-time attacks compute the adversarial example in a single step;

b. Iterative attacks update the adversarial example over several steps.

5. Perturbation Scope:

a. Individual attacks generate different perturbations for each clean input;

b. Universal attacks generate a single perturbation that is applied to every clean input.

6. Perturbation Limitation:

a. Optimized perturbation treats the size of the perturbation as the objective of the optimization problem;

b. Constraint perturbation treats the size of the perturbation as a constraint of the optimization problem.
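Several of these categories can be seen in one small sketch: an iterative, white-box, individual attack with a constraint perturbation, run once as a non-targeted and once as a targeted attack. This is only an illustration in the spirit of the Basic Iterative Method, not a reference implementation. The 3-class linear softmax model with random weights, the function names, and all numeric values are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_grad_x(W, b, x, label):
    """Gradient of cross-entropy(softmax(W @ x + b), label) with respect to x."""
    p = softmax(W @ x + b)
    p[label] -= 1.0
    return W.T @ p

def iterative_attack(W, b, x, label, eps, step, n_steps, targeted=False):
    """Iterative sign-gradient attack under an L-infinity constraint.

    Non-targeted: `label` is the true class and the loss is increased.
    Targeted: `label` is the class the adversary wants and the loss is decreased.
    Setting n_steps=1 and step=eps gives a one-time attack.
    """
    x_adv = x.copy()
    for _ in range(n_steps):
        g = loss_grad_x(W, b, x_adv, label)
        direction = -np.sign(g) if targeted else np.sign(g)
        x_adv = x_adv + step * direction
        # Constraint perturbation: stay within eps of the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid range
    return x_adv

# Hypothetical toy model: 3 classes, 100 pixels, random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 100))
x = rng.uniform(0.0, 1.0, size=100)
b = np.array([2.0, 0.0, 0.0]) - W @ x  # bias chosen so the clean logits are [2, 0, 0]
eps = 0.05

clean_class = int(np.argmax(W @ x + b))
x_untargeted = iterative_attack(W, b, x, label=0, eps=eps, step=eps / 4, n_steps=10)
x_targeted = iterative_attack(W, b, x, label=2, eps=eps, step=eps / 4, n_steps=10,
                              targeted=True)
untargeted_class = int(np.argmax(W @ x_untargeted + b))
targeted_class = int(np.argmax(W @ x_targeted + b))
print(clean_class, untargeted_class, targeted_class)
```

The attack is white-box because it reads the weights `W` directly; a black-box attack would have to estimate the gradient from the model's outputs alone.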

Today, there are dozens of ways to generate this special noise, that is, to generate adversarial examples: Fast Gradient Sign Method (FGSM), Iterative Gradient Sign Method (IGSM), Jacobian Saliency Map Attack (JSMA), DeepFool (DF), One-Step Target Class Method (OSTCM), Basic Iterative Method (BIM), Iterative Least-Likely Class Method (ILLC), Compositional Pattern-Producing Network-Encoded Evolutionary Algorithm (CPPN EA), Carlini and Wagner’s Attack (C&W), Universal Perturbation (UP), Feature Adversary (FA), Hot/Cold method (H/C), Model-based Ensembling Attack (MEA), Ground-Truth Attack (GTA), Targeted Audio Adversarial Examples (TAAE), Black-box Zeroth Order Optimization (ZOO), One Pixel Attack (OPA), Natural GAN (NGAN), Zero-Query Attacks (ZQA), Natural Evolution Strategies (NES), Boundary Attack (BA), Greedy Search Algorithm (GSA), Genetic Attack (GA), Improved Genetic Algorithm (IGA), Probability Weighted Word Saliency (PWWS), Replacement Insertion & Removal of Words (RI&RoW), Real-World Noise (RWN), Genetic Algorithms and Gradient Estimation (GA&GE).

A good toolbox for the adversarial examples generation is described here.

Defense methods

The discovery of adversarial examples has helped us better understand the nature of neural networks, and researchers are now working both on explaining neural networks and on finding ways to deal with their vulnerabilities. Below we consider the main types of defense. A detailed analysis can be found in [2, 3].

Today there are two main defense strategies: reactive (detect adversarial examples after deep neural networks are built) and proactive (make deep neural networks more robust before adversaries generate adversarial examples) [2].

● Reactive

○ Adversarial Detecting

■ Most often, the detector is a small, simple binary classification network that labels each input as either clean or adversarial.

○ Input Reconstruction

■ The basic idea is that adversarial examples can be transformed back into clean data via reconstruction. After this transformation, the adversarial perturbation no longer affects the prediction of the deep learning model.

○ Network Verification

■ This method checks whether the input data violates properties of the neural network that were determined during the training phase. Verifying properties of deep neural networks is a promising way to defend against adversarial examples, because it may also detect new, unseen attacks.

● Proactive

○ Network Distillation

■ The main idea is to train the model twice: first with the one-hot ground-truth labels, and then with the first model’s predicted probabilities as (soft) labels, which enhances robustness. This hides the model's gradient information and confuses adversaries, since most attack algorithms rely on the classifier's gradients.

○ Adversarial (Re)training

■ Adversarial training is an intuitive defense method against adversarial samples, which attempts to improve the robustness of a neural network by training it with adversarial samples.

○ Classifier Robustifying

■ The idea is to design deep neural network architectures that are inherently robust to adversarial examples.
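Of the proactive methods above, adversarial training is the easiest to sketch in a few lines. The example below is a toy illustration under stated assumptions, not the procedure from any specific paper: the model is a logistic regression trained by plain gradient descent, the adversarial samples are regenerated with FGSM at every step, and the synthetic data (one strong but occasionally wrong feature plus many weak features) as well as all names and numeric values are hypothetical. In practice, clean and adversarial samples are often mixed; here the robust model is trained on adversarial samples only to keep the code short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_data(rng, n, d_weak=100, eta=0.2, p_flip=0.1):
    """Toy data: one strong feature (wrong for about 10% of samples)
    plus many weak, noisy features."""
    y = rng.integers(0, 2, size=n).astype(float)
    s = 2.0 * y - 1.0  # labels as -1 / +1
    strong = s * np.where(rng.random(n) < p_flip, -1.0, 1.0)
    weak = eta * s[:, None] + rng.normal(size=(n, d_weak))
    return np.column_stack([strong, weak]), y

def fgsm(w, b, X, y, eps):
    """FGSM for a batch: shift every feature by eps to increase the loss."""
    p = sigmoid(X @ w + b)
    grad_X = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_X)

def train(X, y, eps=0.0, lr=0.1, n_steps=300):
    """Gradient descent on the logistic loss. With eps > 0, each step
    trains on FGSM samples crafted against the current weights."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_steps):
        X_batch = fgsm(w, b, X, y, eps) if eps > 0 else X
        err = sigmoid(X_batch @ w + b) - y
        w -= lr * X_batch.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(w, b, X, y):
    return float(np.mean((sigmoid(X @ w + b) > 0.5) == (y > 0.5)))

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, 2000)
X_test, y_test = make_data(rng, 2000)
eps = 0.4

w_std, b_std = train(X_train, y_train)            # standard training
w_adv, b_adv = train(X_train, y_train, eps=eps)   # adversarial training

clean_acc_std = accuracy(w_std, b_std, X_test, y_test)
clean_acc_adv = accuracy(w_adv, b_adv, X_test, y_test)
robust_acc_std = accuracy(w_std, b_std, fgsm(w_std, b_std, X_test, y_test, eps), y_test)
robust_acc_adv = accuracy(w_adv, b_adv, fgsm(w_adv, b_adv, X_test, y_test, eps), y_test)
print(f"standard model:    clean {clean_acc_std:.2f}, under attack {robust_acc_std:.2f}")
print(f"adversarial model: clean {clean_acc_adv:.2f}, under attack {robust_acc_adv:.2f}")
```

Each model is attacked with examples crafted against its own weights, so the comparison is between the two training procedures rather than between two attacks.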


I hope that you now have a basic idea of adversarial examples and the most common methods of protection against them. The question remains whether we will be able to build an optimal classifier that has excellent performance, is resistant to various attacks, and, most importantly, is resistant to new types of attacks. We also have to overcome one of the most important properties of adversarial examples: transferability. It means that adversarial examples generated to target one model also have a high probability of misleading other models. So it is clear that machine learning engineers have a lot of work to do.
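Transferability can be illustrated with a small experiment. The sketch below is a toy under stated assumptions: two logistic regression models are trained independently on different samples of the same synthetic data, adversarial examples are crafted with FGSM against the first model only, and then they are fed to the second model. The data generator, the names, and the numeric values are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_data(rng, n, d=100, shift=0.2):
    """Two Gaussian classes whose means differ slightly in every feature."""
    y = rng.integers(0, 2, size=n).astype(float)
    X = shift * (2.0 * y - 1.0)[:, None] + rng.normal(size=(n, d))
    return X, y

def train(X, y, lr=0.1, n_steps=200):
    """Logistic regression fitted by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_steps):
        err = sigmoid(X @ w + b) - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

def fgsm(w, b, X, y, eps):
    """FGSM for a batch: shift every feature by eps to increase the loss."""
    p = sigmoid(X @ w + b)
    grad_X = (p - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_X)

def accuracy(w, b, X, y):
    return float(np.mean((sigmoid(X @ w + b) > 0.5) == (y > 0.5)))

rng = np.random.default_rng(0)
model_a = train(*make_data(rng, 2000))  # the model the adversary can access
model_b = train(*make_data(rng, 2000))  # an independently trained model
X_test, y_test = make_data(rng, 2000)

# Adversarial examples are crafted against model A only.
X_adv = fgsm(*model_a, X_test, y_test, eps=0.4)

acc_b_clean = accuracy(*model_b, X_test, y_test)
acc_b_transfer = accuracy(*model_b, X_adv, y_test)
print(f"model B on clean inputs: {acc_b_clean:.2f}")
print(f"model B on inputs crafted against model A: {acc_b_transfer:.2f}")
```

Model B's weights are never used by the attack, so any drop in its accuracy comes purely from what the two models have in common.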


Stay tuned: in the next part of this blog post, I will share my study of the robustness of neural networks using a combination of proactive and reactive methods. It will be interesting!

All the best and wipe your keyboards with alcohol!


  1. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR, 2014. URL: http://arxiv.org/abs/1312.6199.

  2. Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial Examples: Attacks and Defenses for Deep Learning. National Science Foundation Center for Big Learning, University of Florida, 2018. URL: https://arxiv.org/abs/1712.07107.

  3. Han Xu, Yao Ma, Haochen Liu, Debayan Deb, Hui Liu, Jiliang Tang, and Anil K. Jain. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Michigan State University, 2019. URL: https://arxiv.org/abs/1909.08072.

Preste ©2020. All rights reserved. Legal notice