22.2.4 Attack execution

2025.10.06.
AI Security Blog

With the environment prepared and the FGSM attack script coded, you are now at the operational stage. This phase is not merely about running a file; it’s about observing the system’s reaction, understanding the immediate outputs, and preparing for iterative refinement. Let’s execute the attack and analyze the initial results.

Launching the FGSM Attack

Executing the attack is straightforward. Open your terminal or command prompt, navigate to the directory containing your Python script (e.g., fgsm_attack.py), and run it. The script you wrote in the previous section is designed to be self-contained, handling everything from model loading to image generation.


# Navigate to your project directory first
# cd /path/to/your/project

python fgsm_attack.py

Upon execution, the script will perform the sequence of operations you defined (a minimal code sketch follows the list):

  1. Initialize and load the pre-trained ResNet model.
  2. Load and preprocess the target image (e.g., ‘labrador.jpg’).
  3. Calculate the loss and gradients with respect to the input image.
  4. Generate the perturbation using the sign of the gradients.
  5. Create the adversarial image by adding the scaled perturbation.
  6. Save the original, perturbation, and adversarial images as output files.
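
To make that sequence concrete, here is a minimal sketch of the core FGSM computation, assuming PyTorch and torchvision with a pre-trained ResNet18. It is not the exact contents of fgsm_attack.py: input normalization is omitted for brevity and the file name is illustrative.

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Load the pre-trained classifier and switch to inference mode
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()

# Basic preprocessing (normalization omitted to keep the sketch short)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
image = preprocess(Image.open("labrador.jpg")).unsqueeze(0)
image.requires_grad_(True)

# Forward pass; use the model's own top prediction as the label
output = model(image)
label = output.argmax(dim=1)
loss = F.cross_entropy(output, label)

# Backward pass: gradient of the loss with respect to the input pixels
model.zero_grad()
loss.backward()

# FGSM: step each pixel by epsilon in the direction of the gradient's sign
epsilon = 0.007
perturbation = epsilon * image.grad.sign()
adversarial = torch.clamp(image + perturbation, 0, 1).detach()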

Monitoring the Console Output

Your script should provide real-time feedback in the console. This immediate output is your first indicator of the attack’s success. A well-structured output confirms the model’s initial prediction and reveals its final, manipulated prediction.

Loading pre-trained ResNet18 model...
Processing image: labrador.jpg
Epsilon: 0.007

Original Prediction: Labrador retriever, Confidence: 0.92
...
Adversarial Prediction: guenon, Confidence: 0.45

Attack artifacts saved to 'output' directory.

This console log is critical. It provides a quick, text-based confirmation that the model’s classification has shifted from “Labrador retriever” to something entirely different, in this case, “guenon” (a type of monkey). The drop in confidence score is also a typical, though not guaranteed, side effect.
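
For reference, prediction lines like the ones above can be produced with a few lines of code. The sketch below continues from the earlier snippet and relies on torchvision's bundled ImageNet category names; the helper name top1 is an assumption for illustration, not the actual script's function.

weights = models.ResNet18_Weights.IMAGENET1K_V1
categories = weights.meta["categories"]  # the 1,000 ImageNet class names

def top1(model, x):
    # Return the top-1 class name and its softmax confidence
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    conf, idx = probs.max(dim=1)
    return categories[idx.item()], conf.item()

orig_label, orig_conf = top1(model, image.detach())
adv_label, adv_conf = top1(model, adversarial)
print(f"Original Prediction: {orig_label}, Confidence: {orig_conf:.2f}")
print(f"Adversarial Prediction: {adv_label}, Confidence: {adv_conf:.2f}")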

Analyzing the Generated Visual Artifacts

The true impact of the attack becomes clear when you inspect the images saved by your script. You should find three key files in your output directory. These artifacts are the primary evidence of your operation.

[Figure: original image (predicted: Labrador retriever) + scaled perturbation noise (ε = 0.007) = adversarial image (predicted: guenon)]

When you view the adversarial image, it should look nearly identical to the original to your eyes. The perturbation image, however, will likely appear as random, static-like noise. This is the core principle of imperceptible adversarial attacks: a small, carefully crafted change, invisible to humans, can cause a catastrophic failure in the model’s logic.
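
One way to produce those three files is sketched below, continuing from the earlier snippets. The perturbation is rescaled into the displayable [0, 1] range purely so it is visible, which is why it renders as noise; the output directory and file names are assumptions.

import os
from torchvision.utils import save_image

os.makedirs("output", exist_ok=True)
save_image(image.detach(), "output/original.png")
save_image(adversarial, "output/adversarial.png")

# Shift the signed perturbation from [-epsilon, +epsilon] into [0, 1] for viewing
noise = perturbation.detach()
visible_noise = (noise - noise.min()) / (noise.max() - noise.min() + 1e-12)
save_image(visible_noise, "output/perturbation.png")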

The Red Teamer’s Iterative Approach: Tuning Epsilon

A single successful execution is just a data point. A red teamer’s goal is to understand the boundary of the system’s vulnerability. The primary knob you can turn in FGSM is epsilon (ε), which controls the magnitude of the perturbation. You should re-run the attack with different epsilon values to map the model’s response.

Modify the epsilon value in your script and observe the trade-offs. This iterative process helps you find the “sweet spot”—the lowest possible epsilon that still achieves misclassification, thereby maximizing stealth.

Epsilon (ε) Value | Visual Distortion | Likely Attack Outcome | Red Team Objective
------------------|-------------------|-----------------------|--------------------
0.001 | Imperceptible | Likely fails; model still predicts ‘Labrador’. | Establish a baseline (the attack threshold is higher).
0.007 | Very low / imperceptible | Likely succeeds in misclassification. | Successful stealthy attack; this is an ideal result.
0.05 | Slightly noticeable noise/artifacts | Almost certain to succeed, possibly with high confidence. | Test model robustness against stronger (less subtle) attacks.
0.2 | Clearly visible distortion | Almost certainly succeeds, but the attack is obvious. | Understand the model’s breaking point; not a stealthy operation.

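A simple way to run this sweep is a loop over candidate epsilon values, as sketched below. It reuses the gradient sign and the top1 helper from the earlier snippets, and the epsilon values simply mirror the table above rather than being tuned results.

grad_sign = image.grad.sign()

for eps in [0.001, 0.007, 0.05, 0.2]:
    # Re-apply the same gradient sign at a different magnitude
    candidate = torch.clamp(image.detach() + eps * grad_sign, 0, 1)
    label, conf = top1(model, candidate)
    status = "misclassified" if label != orig_label else "still correct"
    print(f"epsilon={eps}: prediction={label}, confidence={conf:.2f} ({status})")
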
By systematically testing different magnitudes, you move from simply executing an attack to performing a vulnerability assessment. You are now characterizing how fragile the model is, which is a far more valuable insight. This empirical data forms the basis for the next and final step: evaluating the results in a structured manner.