Most advanced CNN-based machine learning models can be fooled by very small changes to the samples they predict on, and they often make these wrong predictions with even higher confidence than on normal samples. So, to guard against such samples, we need to test a model before it goes into production. In this article we will talk about adversarial attacks and, to address model robustness, a Python-based toolkit that can help us test a model against predefined attacks through a unified and simple API. Here are the main points covered in this article.
- What are adversarial attacks in Machine Learning?
- Possible attack strategies
- How can Foolbox help test the robustness of a model?
- Structure of Foolbox
- Foolbox implementation
Let’s start the discussion by understanding adversarial attacks.
What are adversarial attacks in Machine Learning?
Adversarial machine learning studies attacks that try to exploit machine learning models by crafting malicious inputs, often based on publicly available information about the model. Causing the model to fail is the most common goal of such an attack.
The vast majority of machine learning algorithms were created to work on specific sets of problems where the training and test data come from the same statistical distribution. When these models are applied to real-world data, adversaries can supply inputs that violate this statistical assumption. Such inputs can be crafted to take advantage of vulnerabilities in the system and taint its conclusions.
An adversarial attack is a method that causes a machine learning model to misclassify inputs by making small changes to them. Neural networks (NNs) are known to be particularly vulnerable to such attacks. Historically, research into adversarial methods began in the field of image recognition: it was shown that minor changes to an image, such as adding barely perceptible noise, can lead to significant changes in classifier predictions and even completely confuse an ML model.
Consider the following demonstration of an adversarial example. Starting from an image of a stop sign, the attacker adds a small perturbation in an attempt to force the model to predict a yield sign instead, and the model does exactly that. The two images look identical to us, but because the model operates on raw numbers, even such a small perturbation changes the pixel values enough to produce a false prediction.
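A tiny per-pixel change can have a large effect because it accumulates across thousands of input dimensions. The toy linear model below is a made-up illustration of this effect (all weights and numbers are invented for the sketch), not a real classifier:

```python
import numpy as np

# Toy linear "classifier": the sign of a dot product over 10,000 pixels.
# Weights and image values here are illustrative, not from any real model.
rng = np.random.default_rng(0)
w = rng.choice([-0.01, 0.01], size=10_000)   # small weight per pixel
x = rng.uniform(0.4, 0.6, size=10_000)       # a flattened "grey" image

score = w @ x  # decision score before the attack

# Nudge every pixel by at most 0.005 (invisible to a human eye),
# each in the direction that pushes the score up.
eps = 0.005
x_adv = x + eps * np.sign(w)

# Per-pixel change is tiny, but the score shifts by eps * sum(|w|) = 0.5.
print(np.max(np.abs(x_adv - x)))   # 0.005
print((w @ x_adv) - score)          # ~0.5
```

This is the standard intuition for why high-dimensional inputs like images are so easy to perturb: many negligible per-pixel changes add up to a large change in the model's decision score.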
We will now discuss the most commonly known attack strategies.
The most common type of attack is an evasion attack. Spammers and hackers, for example, frequently try to avoid detection by masking the content of spam emails and malware. Samples are manipulated so that they evade detection and are classified as legitimate; evasion does not imply any control over the training data. Image-based spam, in which the spam content is embedded in an attached image to evade textual analysis by spam filters, is a good example of evasion. Another example is spoofing attacks against biometric verification systems.
Poisoning is the adversarial contamination of training data. Machine learning systems are often retrained on data collected during operation; intrusion detection systems (IDS), for example, are frequently retrained this way. An attacker can poison this data by injecting malicious samples into the system during operation, causing the retraining to fail.
Model theft (also known as model extraction) occurs when an adversary probes a black-box machine learning system in order to reconstruct the model or extract the data it was trained on. This is problematic when the training data or the model itself is sensitive and proprietary. Model theft could, for example, be used to extract a proprietary stock-trading model, which the adversary could then use for financial gain.
Inference attacks
Inference attacks exploit over-fitting to the training data, a common flaw of supervised machine learning models, to identify the data that was used to train the model. Attackers can do this even without knowledge of or access to the target model's parameters, which poses security risks for models trained on sensitive data.
How can Foolbox help test the robustness of a model?
Foolbox is a Python module for creating adversarial perturbations as well as quantifying and comparing the robustness of machine learning models. Foolbox interfaces with the most important deep learning frameworks, including PyTorch, Keras, TensorFlow, Theano, and MXNet, and supports a variety of adversarial criteria, such as targeted misclassification and top-k misclassification, as well as several distance measures. Let us now briefly look at the structure of Foolbox.
Structure of Foolbox
Creating adversarial examples requires five elements, and these form the five pillars of Foolbox. First, a model that takes an input such as an image and makes a prediction, say class probabilities. Second, a criterion that defines what counts as adversarial, for example a misclassification. Third, a distance measure that quantifies the size of a perturbation, for example the L1 norm. Fourth, an attack algorithm that takes an input and its label, together with the model, the adversarial criterion, and the distance measure, and generates an adversarial perturbation. Fifth, an adversarial object that ties these pieces together and keeps track of the best adversarial example found so far.
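To make the roles of these building blocks concrete, here is a minimal sketch of the first four pillars as plain Python functions. This is an illustration of the architecture only, not the Foolbox API: the toy two-class model, the 0.45 threshold, and the grid search over step sizes are all invented for the example.

```python
import numpy as np

def model(x):
    """Pillar 1: maps an input to class scores (a toy 2-class model)."""
    return np.array([x.sum(), 0.45])

def criterion(scores, label):
    """Pillar 2: decides what counts as adversarial (here: misclassification)."""
    return np.argmax(scores) != label

def distance(a, b):
    """Pillar 3: measures the size of a perturbation (here: L-infinity norm)."""
    return np.max(np.abs(a - b))

def attack(x, label, direction):
    """Pillar 4: searches for the smallest step that fools the model."""
    for eps in np.linspace(0.0, 1.0, 101):
        x_adv = x + eps * direction
        if criterion(model(x_adv), label):
            return x_adv, distance(x, x_adv)
    return None, None  # no adversarial found within the search range

x = np.array([0.2, 0.2])   # model(x) = [0.4, 0.45] -> predicted class 1
x_adv, d = attack(x, label=1, direction=np.array([1.0, 1.0]))
print(d)                    # smallest perturbation found by the grid search
```

In real Foolbox attacks the search is gradient-based rather than a grid scan, but the division of labor between model, criterion, distance, and attack is the same.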
In this section, we’ll look at a few use cases for this toolkit. As discussed earlier, this framework supports most of the widely used deep learning frameworks; to work with the framework of your choice, make sure it is installed, then install Foolbox with pip install foolbox.
In this example we will be using TensorFlow. First, we create a transfer-learning model from tf.keras.applications (here, ResNet50V2), which we then wrap in Foolbox’s TensorFlowModel class; similar classes are available for the other frameworks. We also need to specify the preprocessing the respective model expects, such as flipping an axis, converting from RGB to BGR, or subtracting the mean and dividing by the standard deviation, as well as the bounds of the input space, which must contain only the values the model expects.
```python
import foolbox as fb
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# TensorFlow-based model
model = tf.keras.applications.ResNet50V2(weights="imagenet")
preprocessing = dict()  # ResNet50V2 already expects inputs in [-1, 1]
bounds = (-1, 1)
fmodel = fb.TensorFlowModel(model, bounds=bounds, preprocessing=preprocessing)
```
Now that we have instantiated the ResNet and the Foolbox model, before formulating the attack we need some sample image data. This can be obtained directly through Foolbox, whose utils package provides helper functions with a small set of sample images from different computer vision datasets.
```python
images, labels = fb.utils.samples(fmodel, dataset="imagenet", batchsize=16)
```
To launch an attack, you first create an instance of the corresponding attack class; Foolbox provides a wide variety of adversarial attacks. Each attack takes the model to attack and a criterion that defines what counts as adversarial. Misclassification is the default criterion.
The attack can then be applied to a reference input and its corresponding label. Attacks use internal hyperparameter tuning to find the smallest perturbation.
For example, the well-known Fast Gradient Sign Method (FGSM) searches for the smallest step size that turns the input into an adversarial example. As a result, manual hyperparameter tuning for attacks such as FGSM is no longer necessary.
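The core FGSM step itself is a single line: move every input value by epsilon in the direction of the sign of the loss gradient, then clip back to the valid input range. The sketch below computes the gradient by hand for a toy squared-error loss rather than a real network, so the numbers are purely illustrative.

```python
import numpy as np

def fgsm_step(x, grad, eps, bounds=(0.0, 1.0)):
    """One Fast Gradient Sign Method step: move every pixel by eps
    in the direction that increases the loss, then clip to valid range."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, *bounds)

# Toy setup: loss(x) = 0.5 * ||x - target||^2, so its gradient is x - target.
x = np.array([0.2, 0.8, 0.5])
target = np.array([0.5, 0.5, 0.5])
grad = x - target  # gradient of the toy loss at x

x_adv = fgsm_step(x, grad, eps=0.1)
print(x_adv)  # [0.1, 0.9, 0.5]
```

In a real attack, `grad` would come from backpropagating the classification loss through the network; what Foolbox adds on top of this step is the automatic search over the step size eps.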
As shown below, we choose the type of attack and apply it to the TensorFlow model. We also define the epsilons, which are simply the perturbation levels we want to test.
```python
# Initialize the attack class
epsilons = np.linspace(0.0, 0.005, num=20)
attack = fb.attacks.LinfDeepFoolAttack()
raw, clipped, is_adv = attack(fmodel, images, labels, epsilons=epsilons)
```
Now we can compute the robust accuracy by averaging is_adv over the batch, separately for each value of epsilon.
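To make clear what the averaging computes, here is the same calculation on a small fabricated is_adv matrix (the boolean values below are made up; in practice they come from the attack call):

```python
import numpy as np

# Fabricated attack results: rows = 3 epsilon values, columns = 4 images.
# True means the attack found an adversarial for that image at that epsilon.
is_adv = np.array([
    [False, False, False, False],   # smallest eps: nothing fooled
    [True,  False, True,  False],   # medium eps: half the batch fooled
    [True,  True,  True,  True],    # largest eps: everything fooled
])

# Averaging over the batch axis gives the attack success rate per epsilon;
# robust accuracy is its complement.
robust_accuracy = 1 - np.float32(is_adv).mean(axis=-1)
print(robust_accuracy)   # [1.0, 0.5, 0.0]
```

So the plotted curve has one point per epsilon: the fraction of the batch that still resists the attack at that perturbation level.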
```python
# Accuracy when the model is under attack
robust_accuracy = 1 - np.float32(is_adv).mean(axis=-1)

# Visualizing the result
plt.plot(epsilons, robust_accuracy)
plt.title('Perturbation Vs Accuracy of the Model')
plt.show()
```
As the graph above shows, the model's accuracy is highest at zero perturbation; as soon as the toolbox starts testing larger epsilon values, accuracy drops off rapidly, even for very small increases in the perturbation. From this we can conclude that neural-network-based models are highly susceptible to such attacks.
In this article, we discussed adversarial attacks on neural networks and the attack strategies one might encounter. To help ensure model robustness, we looked at a framework called Foolbox, which tests a model against predefined attacks and lets us gauge how well the model holds up.