6.1.5 Practical Examples and Code

Moving from the theoretical construction of attacks and defenses to their practical application is where the core skills of an AI Red Teamer are forged. The Adversarial Robustness Toolbox (ART) is your workbench. This section translates the concepts from previous chapters into executable code, demonstrating the iterative cycle of testing, defending, and re-evaluating model security.

Case Study 1: A Basic Evasion Attack

Our first objective is a classic yet fundamental task: crafting an adversarial example to fool an image classifier. We will use the Fast Gradient Sign Method (FGSM), a fast, one-step attack, to make a model misclassify a handwritten digit.
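Conceptually, FGSM nudges every input pixel by epsilon in the direction of the sign of the loss gradient. The PyTorch sketch below illustrates that single update; ART performs the equivalent step for you, and the `fgsm_step` helper, its arguments, and the [0, 1] clipping range are illustrative assumptions rather than part of ART's API.

import torch

def fgsm_step(model, loss_fn, x, y, eps):
    """One FGSM update: move each pixel by eps in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()  # keep pixels in the valid range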

Scenario: Misclassifying a ‘7’ as a ‘1’

You have a standard Convolutional Neural Network (CNN) trained on the MNIST dataset. Your goal is to take a correctly classified image of the digit ‘7’ and subtly modify it so the model confidently predicts it is a ‘1’.

Implementation with ART

First, you need a pre-trained model wrapped in an ART classifier; if you do not yet have that wrapper, the sketch below shows a minimal one-time setup. Once the `classifier` object is ready, generating the attack is remarkably straightforward.
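A minimal wrapping sketch, assuming `model` is your own pre-trained `torch.nn.Module` for MNIST (single-channel 28x28 inputs scaled to [0, 1]); the loss, optimizer, and clip values shown here are illustrative choices, not requirements:

import torch
from art.estimators.classification import PyTorchClassifier

# Wrap the pre-trained PyTorch model so ART attacks and defenses can drive it
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),   # MNIST: 1 channel, 28x28 pixels
    nb_classes=10,             # digits 0-9
    clip_values=(0.0, 1.0),    # valid pixel range after scaling
)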


# 1. Import necessary components
import numpy as np
import torch
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# Assume 'model' is your pre-trained PyTorch model, 'classifier' is the
# ART-wrapped model object, and 'x_test' is a NumPy array holding the
# input image of the digit '7'.

# 2. Instantiate the FGSM attack
# Epsilon (eps) controls the magnitude of the perturbation.
attack = FastGradientMethod(estimator=classifier, eps=0.2)

# 3. Generate the adversarial example
x_test_adversarial = attack.generate(x=x_test)

# 4. Test the model's prediction on the new example
prediction_adv = classifier.predict(x_test_adversarial)
predicted_class = np.argmax(prediction_adv, axis=1)[0]  # predict() returns a NumPy array

print(f"The adversarial example is classified as: {predicted_class}")
# Expected Output: The adversarial example is classified as: 1

Analysis of the Result

The code executes the attack and generates a new image tensor, `x_test_adversarial`. While numerically different, this new image is often visually indistinguishable from the original to a human observer. Yet, the model’s prediction flips.

Figure: original image (predicted as '7') + subtle noise perturbation = adversarial image (predicted as '1'). The process of creating an adversarial example: adding carefully crafted, imperceptible noise to an original image to induce misclassification.
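You can verify both halves of that claim numerically. The short check below assumes the `classifier`, `x_test`, and `x_test_adversarial` objects from the snippet above, with pixel values scaled to [0, 1]:

import numpy as np

# The clean image is still classified correctly...
prediction_clean = classifier.predict(x_test)
print("Clean prediction:", np.argmax(prediction_clean, axis=1)[0])   # expected: 7

# ...while the largest per-pixel change is bounded by eps=0.2,
# which is why the perturbation is hard to spot by eye.
max_change = np.abs(x_test_adversarial - x_test).max()
print("Max per-pixel perturbation:", max_change)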

Case Study 2: Defending and Re-evaluating

A successful attack is only the first step. The next is to test potential defenses. One of the most effective and widely studied defenses is Adversarial Training, which augments the training data with adversarial examples so the model learns to classify them correctly.

Scenario: Hardening the Model with Adversarial Training

Using the same FGSM attack from before, you will now create an adversarially trained model. Your goal is to see if this new, “hardened” model can resist the attack that fooled the original version.

Implementation with ART

ART provides a convenient `AdversarialTrainer` class that automates this process. It generates adversarial examples on-the-fly during the training loop.


# 1. Import the trainer
from art.defences.trainer import AdversarialTrainer

# 2. Create the FGSM attack instance to be used for training
attack_for_training = FastGradientMethod(estimator=classifier, eps=0.2)

# 3. Instantiate the Adversarial Trainer
# This trainer wraps your original classifier. ratio=0.5 means 50% of each
# training batch is replaced with adversarial examples; the rest stays clean.
trainer = AdversarialTrainer(classifier=classifier, attacks=attack_for_training, ratio=0.5)

# 4. Train the model
# Assumes 'x_train' and 'y_train' hold the MNIST training set.
trainer.fit(x_train, y_train, nb_epochs=10, batch_size=128)

# The 'classifier' object now references the hardened model.

Re-evaluation: Attacking the Defended Model

Now, you run the exact same attack from Case Study 1 against the newly trained model. The only difference is that the `classifier` object has been updated by the `AdversarialTrainer`.

When you run the prediction code again on the adversarial example, the outcome should be different. The model, having learned from similar examples, should now correctly identify the image as a ‘7’.
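A minimal re-check might look like the following sketch, assuming the objects from the earlier snippets are still in scope; the expected output assumes the adversarial training behaved as summarized in Table 6.1.5.1:

import numpy as np

# Re-run FGSM, this time against the adversarially trained classifier
attack = FastGradientMethod(estimator=classifier, eps=0.2)
x_test_adv_hardened = attack.generate(x=x_test)

prediction_hardened = classifier.predict(x_test_adv_hardened)
print("Hardened model prediction:", np.argmax(prediction_hardened, axis=1)[0])
# Expected output: Hardened model prediction: 7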

Table 6.1.5.1: Attack Success Comparison
| Model Type                | Original Image Prediction | Adversarial Image Prediction | Attack Success |
|---------------------------|---------------------------|------------------------------|----------------|
| Standard CNN              | 7 (Correct)               | 1 (Incorrect)                | Yes            |
| Adversarially Trained CNN | 7 (Correct)               | 7 (Correct)                  | No             |

Case Study 3: Escalating with a Stronger Attack

Your adversarial training successfully defended against FGSM. But is the model truly robust? A determined adversary wouldn’t stop there. The next step in a red team engagement is to escalate to a more powerful attack to find the true limits of the defense.

Scenario: Bypassing the Defense with Carlini & Wagner (C&W)

The Carlini & Wagner (C&W) attack is an optimization-based method. Unlike the one-shot FGSM, it iteratively searches for the smallest possible perturbation that will cause a misclassification. This makes it much more powerful and harder to defend against.

Implementation with ART

The code structure remains similar, demonstrating ART’s consistent API. You simply swap out the attack object.


# 1. Import the C&W L2 attack
from art.attacks.evasion import CarliniL2Method

# 2. Instantiate the attack against the DEFENDED classifier
# Note: This attack is computationally more expensive.
cw_attack = CarliniL2Method(classifier=classifier, confidence=0.0, max_iter=100)

# 3. Generate the new adversarial example
x_test_adv_cw = cw_attack.generate(x=x_test)

# 4. Test the prediction
prediction_adv_cw = classifier.predict(x_test_adv_cw)
predicted_class_cw = np.argmax(prediction_adv_cw, axis=1)[0]  # predict() returns a NumPy array

print(f"C&W adversarial example is classified as: {predicted_class_cw}")
# Expected Output: C&W adversarial example is classified as: 1 (or another wrong digit)

Analysis and Lessons Learned

In many cases, the C&W attack will succeed where FGSM failed. It finds a different, more subtle path to fool the model that the adversarial training did not prepare it for. This highlights a critical lesson in AI security: a defense is only as good as the attacks it’s tested against.
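One way to make "more subtle" concrete is to compare the size of the two perturbations. The quick check below assumes the arrays generated in the earlier snippets are still available (keeping in mind that the FGSM example was crafted against the original model, so the comparison is illustrative rather than exact):

import numpy as np

# Per-example L2 norm of each perturbation
l2_fgsm = np.linalg.norm((x_test_adversarial - x_test).reshape(len(x_test), -1), axis=1)
l2_cw = np.linalg.norm((x_test_adv_cw - x_test).reshape(len(x_test), -1), axis=1)

print(f"FGSM perturbation L2 norm: {l2_fgsm[0]:.4f}")
print(f"C&W  perturbation L2 norm: {l2_cw[0]:.4f}")
# C&W typically achieves misclassification with a noticeably smaller perturbation.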

Robustness is not a binary state. By escalating your attacks, you uncover deeper vulnerabilities and provide a more accurate assessment of the model’s security posture. ART makes this process of swapping, testing, and comparing attack methodologies efficient, allowing you to focus on the strategic implications of your findings.
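
As a closing illustration of that workflow, the sketch below runs a small battery of attacks through a single evaluation loop. It assumes the `classifier`, `x_test`, and one-hot `y_test` from above; Projected Gradient Descent (PGD) is added as one more common evasion attack, and the epsilon and iteration values are illustrative.

import numpy as np
from art.attacks.evasion import FastGradientMethod, ProjectedGradientDescent, CarliniL2Method

# A small battery of evasion attacks, all sharing the same ART interface
attacks = {
    "FGSM": FastGradientMethod(estimator=classifier, eps=0.2),
    "PGD": ProjectedGradientDescent(estimator=classifier, eps=0.2, max_iter=40),
    "C&W L2": CarliniL2Method(classifier=classifier, max_iter=100),
}

for name, attack in attacks.items():
    x_adv = attack.generate(x=x_test)
    preds = np.argmax(classifier.predict(x_adv), axis=1)
    robust_accuracy = np.mean(preds == np.argmax(y_test, axis=1))
    print(f"{name}: robust accuracy = {robust_accuracy:.2%}")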