Protection against adversarial examples in image classification (part 2)
A few weeks ago we published a blog post titled "Defense against adversarial examples in image classification", where we explained what adversarial examples are and why they can have dangerous consequences. We also described basic protection methods against them and suggested several useful articles for learning more about the topic. Today, as promised, we share our own research in this area. Feel free to reread our first article if you want to refresh your memory; otherwise, let's dive in!
To test our idea, we need a dataset of source images and corresponding adversarial examples. As the base image dataset we chose CIFAR-10, one of the most popular and relatively complex public image datasets. From it, we generated adversarial examples using FGSM (ε = 0.004, 0.007, 0.01), BIM (ε = 0.004, 0.007, 0.01) and DeepFool attacks. A detailed tutorial on these attacks can be found here, and a good PyTorch implementation here.
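To make the two gradient-based attacks concrete, here is a minimal NumPy sketch of FGSM and BIM. It is not the implementation used in our experiments: to stay self-contained, it attacks a hypothetical toy logistic-regression model (`w`, `b`) instead of ResNet18, and the BIM step size `alpha` and iteration count `n_iter` are illustrative assumptions, since only the ε values are given above. DeepFool is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss w.r.t. the input x for a
    hypothetical toy model p = sigmoid(w.x + b) (stand-in for a real network)."""
    p = sigmoid(x @ w + b)
    return (p - y) * w  # dL/dz = p - y and dz/dx = w

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method: a single step of size eps along sign(grad)."""
    x_adv = x + eps * np.sign(loss_grad_wrt_input(x, y, w, b))
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid [0, 1] range

def bim(x, y, w, b, eps, alpha, n_iter):
    """Basic Iterative Method: repeated small FGSM steps of size alpha,
    each followed by projection into the L-infinity ball of radius eps."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + alpha * np.sign(loss_grad_wrt_input(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the original
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

With a real network, `loss_grad_wrt_input` would be replaced by the input gradient obtained through automatic differentiation (for example in PyTorch); the sign step, the ε-ball projection and the clipping to the valid pixel range stay the same.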
Below is an example image from CIFAR-10 and the visual result of one of the attacks. Since CIFAR-10 images are only 32x32 px, they look very pixelated, but this does not affect the overall experiment. Moreover, thanks to the small image size, the difference between an original image and a strong adversarial example is easier to notice than it would be with large images.
We chose ResNet18 as the classification deep neural network (see a code example here).
The main idea of this experiment is to combine two different types of protection: reactive and proactive.
Adversarial (re)training is used as the proactive method because it is not too difficult to implement, prevents retraining, and is effective against one-step attacks. The main idea of this method is to train the network on data that already contains adversarial examples. However, since there are many types of attacks and each attack has its own parameters, it is practically impossible to include all of them in the training set, so a more general approach is needed. Instead, we apply a transformation that reduces the level of noise contained in the image, and thus the sensitivity of the model to adversarial attacks. Training on both the original and the transformed images makes our model more robust and better at generalizing. This way of training the model is the first part of the defense.
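The training set for this first part of the defense can be sketched as follows, assuming images are stored as a NumPy array of shape (N, H, W, C) with values in [0, 1]. `build_robust_training_set` is a hypothetical helper name, and `transform` is a placeholder for the wavelet noise-reduction step; any callable that maps an image to an image of the same shape works.

```python
import numpy as np

def build_robust_training_set(images, labels, transform):
    """Hypothetical helper: concatenate original images with transformed copies.

    images:    array of shape (N, H, W, C) with values in [0, 1]
    labels:    array of shape (N,)
    transform: callable mapping one image to a denoised image of the same shape
    """
    transformed = np.stack([transform(img) for img in images])
    x_train = np.concatenate([images, transformed], axis=0)
    # A transformed copy keeps the label of the image it came from.
    y_train = np.concatenate([labels, labels], axis=0)
    return x_train, y_train
```

The originals come first and the transformed copies second, so shuffling is left to the data loader used during training.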
Input reconstruction is used as the reactive method because it is also not too difficult to use and can readily be applied to existing classifiers. Since there are many image manipulation techniques, a wide range of transformations is available; here too we apply image noise reduction to minimize the impact of adversarial perturbations on the inputs. Applying this transformation to an input image before classification is the second part of the defense.
For both approaches we chose the discrete wavelet transform (DWT), one of the popular methods of image noise reduction. With the scikit-image package, wavelet transforms are easy to apply.
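scikit-image exposes wavelet denoising as `skimage.restoration.denoise_wavelet`, whose `wavelet` argument selects, for example, `'haar'` or a Daubechies wavelet such as `'db2'`; the exact call and settings we used are not specified here. To show what such a transform does under the hood, below is a simplified NumPy sketch: a single-level 2D Haar DWT per channel, soft thresholding of the detail coefficients, then the inverse transform. The fixed `threshold` is an assumption; `denoise_wavelet` can use several decomposition levels and by default estimates thresholds from the data.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array, H and W even."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # approximation (low-pass) coefficients
    lh = (a - b + c - d) / 2  # detail along the columns
    hl = (a + b - c - d) / 2  # detail along the rows
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2."""
    lh, hl, hh = details
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2), dtype=ll.dtype)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

def soft_threshold(v, t):
    """Shrink coefficients toward zero by t; values with |v| <= t become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def haar_denoise(image, threshold):
    """Denoise an (H, W, C) float image in [0, 1] channel by channel (H, W even)."""
    channels = []
    for ch in range(image.shape[2]):
        ll, details = haar_dwt2(image[:, :, ch])
        details = tuple(soft_threshold(d, threshold) for d in details)
        channels.append(haar_idwt2(ll, details))
    return np.clip(np.stack(channels, axis=2), 0.0, 1.0)
```

With `threshold=0` the sketch reconstructs the input exactly, which is a quick sanity check; larger thresholds remove more high-frequency content, including adversarial noise, at the cost of fine detail.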
Peak signal-to-noise ratio (PSNR) and mean squared error (MSE) are used to measure the difference between original and altered images. Using these metrics, we selected the key DWT parameters on a small number of images. The basic requirement is that the transformed images must look visually the same as the originals and have a high PSNR. We selected two wavelets for testing: the Haar wavelet and the Daubechies wavelet.
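Both metrics are easy to compute: MSE is the mean of the squared pixel differences, and PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal sketch, assuming float images scaled to [0, 1] (so `data_range` is 1.0; use 255 for 8-bit images). scikit-image also provides `skimage.metrics.mean_squared_error` and `skimage.metrics.peak_signal_noise_ratio`.

```python
import numpy as np

def mse(original, altered):
    """Mean squared error between two images of the same shape."""
    diff = original.astype(np.float64) - altered.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, altered, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    err = mse(original, altered)
    if err == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / err))
```

For example, comparing an all-black image with a uniform 0.1 gray image gives an MSE of about 0.01 and a PSNR of about 20 dB.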
We assume that every image our classification network receives is potentially adversarial and therefore must go through input reconstruction. The scheme of the experiment is given below.
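At inference time the scheme boils down to "reconstruct, then classify". The sketch below assumes hypothetical callables: `model` maps a batch of shape (N, H, W, C) to class scores of shape (N, K), and `transform` is the same denoising step that was applied during training (for example DWT Haar for ResNet_18_DWT_Haar).

```python
import numpy as np

def defended_predict(model, images, transform):
    """Input reconstruction: every incoming image is treated as potentially
    adversarial and denoised with `transform` before classification.

    model:     hypothetical callable, batch (N, H, W, C) -> scores (N, K)
    images:    array of shape (N, H, W, C)
    transform: callable mapping one image to an image of the same shape
    """
    reconstructed = np.stack([transform(img) for img in images])
    scores = model(reconstructed)
    return np.argmax(scores, axis=1)  # predicted class index per image
```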
First, we trained three networks: a standard ResNet18 (without transformation), ResNet18 with DWT Haar and ResNet18 with DWT Daubechies. The results are shown in Table 1 below. As we can see, the classification accuracy with the additional transformations decreased slightly, but not critically.
Using the first network (ResNet_18_standard), we generated the adversarial examples described above. This simulates a situation in which the adversary has no information about the defense we use but somehow knows the architecture of the neural network. Classifying the generated adversarial examples with ResNet_18_standard gave the following results:
Finally, we apply the input transformation to the adversarial images; for the DWT-trained networks, this is the same transformation that was used to train them. We then classify the transformed images with our three models (ResNet_18_standard, ResNet_18_DWT_Haar and ResNet_18_DWT_Daubechies) to compare their performance. The results are shown in Tables 2.1 and 2.2.
As we can see below, adversarial examples critically reduce the classification accuracy of the unprotected model. Applying only the input transformation to the network trained in the standard way significantly increases classification accuracy. Finally, the proposed combined defense with DWT (Haar) best preserves the initial classification accuracy.
The average error of the proposed method is 6%, compared with 31% for the standard training with the additional input transformation and 67% for the standard network.
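"Average error" is not defined precisely above; one plausible reading, assumed in the sketch below, is one minus the classification accuracy averaged over all attack settings (FGSM and BIM with three ε values each, plus DeepFool). The helper names are hypothetical, and no experimental numbers are reproduced here.

```python
import numpy as np

def accuracy(predictions, labels):
    """Fraction of correctly classified images."""
    return float(np.mean(np.asarray(predictions) == np.asarray(labels)))

def average_error(accuracy_per_attack):
    """Assumed definition: 1 - mean accuracy over attack settings.

    accuracy_per_attack: dict mapping an attack setting name
    (e.g. "FGSM eps=0.004") to an accuracy in [0, 1].
    """
    if not accuracy_per_attack:
        raise ValueError("at least one attack setting is required")
    accs = list(accuracy_per_attack.values())
    return 1.0 - sum(accs) / len(accs)
```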
For the strongest attacks, where the original system's accuracy drops below 5%, the proposed defense keeps classification accuracy at no less than 77%, only 12% below the original accuracy of the model.
Let's visualize some results of the best network, ResNet_18_DWT_Haar.
Sometimes the network classifies the image correctly even without the transformation.
Sometimes the additional transformation is necessary for correct classification.
Still, sometimes the attack is so strong that even the network with input reconstruction cannot cope with it.
In this blog post, we covered our research on protecting neural networks from adversarial examples by combining proactive and reactive methods. We obtained promising initial results, which give grounds for further development and testing of the approach. Although adversarial examples are often associated with fraud, they could also be used for good purposes, for example, replacing ugly QR codes with beautiful images.
We believe the phenomenon of adversarial examples is an important finding and it requires further exploration to increase the robustness of applications leveraging deep neural networks.
Thank you for your attention and do not hesitate to leave comments or questions.
* Follow us on LinkedIn for future blog updates:
* Interested in our skills? Let's discuss your projects together:
* Our public GitHub repository: