22.2.5. Results evaluation

2025.10.06.
AI Security Blog

You’ve successfully executed the FGSM attack. The model has processed your crafted adversarial examples. Now, the critical phase begins: interpreting the outcome. A simple “pass/fail” is insufficient. A thorough evaluation quantifies the model’s vulnerability and provides the evidence needed to recommend effective defenses.

This process moves beyond a binary outcome to a nuanced understanding of the model’s breaking points and its behavior under duress.

Core Evaluation Metrics: Beyond Accuracy

Your analysis should focus on several key metrics that together paint a complete picture of the attack’s impact.

Quantitative Analysis

Start with the hard numbers. These metrics are objective and form the basis of your technical report.

  • Attack Success Rate (ASR): The percentage of adversarial examples that cause a misclassification. This is the primary indicator of the attack’s effectiveness.
  • Model Accuracy (on Adversarial Data): The complement of the ASR. If the ASR is 90%, the model’s accuracy on the adversarial dataset is 10%. Comparing this to the model’s baseline accuracy on clean data highlights the performance degradation.
  • Confidence Shift: For successful attacks, measure the model’s confidence in the incorrect (adversarial) label; a high-confidence misclassification is a more severe failure than a low-confidence one. For unsuccessful attacks, check whether the confidence in the *correct* label dropped significantly. The sketch below shows one way to compute these metrics.
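A minimal sketch of these calculations in PyTorch, assuming a classifier that returns raw logits; the names `evaluate_attack`, `x_adv`, and `y_true` are illustrative, not from any specific library:

```python
import torch
import torch.nn.functional as F

def evaluate_attack(model, x_adv, y_true):
    """Compute ASR, adversarial accuracy, and the average confidence
    the model places in the wrong label on successful attacks."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x_adv), dim=1)
    conf, preds = probs.max(dim=1)

    fooled = preds != y_true                    # successful attacks
    asr = fooled.float().mean().item()          # attack success rate
    adv_accuracy = 1.0 - asr                    # accuracy on adversarial data
    # Average confidence in the incorrect label, over successful attacks only
    avg_wrong_conf = conf[fooled].mean().item() if fooled.any() else 0.0
    return asr, adv_accuracy, avg_wrong_conf
```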

Example Scenario: A model with 98% accuracy on clean images is subjected to an FGSM attack with a low epsilon. The results show its accuracy drops to 15%. This 83-point drop is a clear, quantifiable measure of its brittleness.

A simple table can summarize these findings effectively. Note that in the table below the ASR is computed only over inputs the model classified correctly on clean data, which is why Adversarial Accuracy and Attack Success Rate sum to the clean accuracy rather than to 100%:

| Epsilon (ε) | Clean Accuracy | Adversarial Accuracy | Attack Success Rate | Avg. Confidence (Misclassified) |
|-------------|----------------|----------------------|---------------------|---------------------------------|
| 0.007       | 99.2%          | 75.4%                | 23.8%               | 68.1%                           |
| 0.02        | 99.2%          | 41.9%                | 57.3%               | 82.5%                           |
| 0.05        | 99.2%          | 8.7%                 | 90.5%               | 91.3%                           |
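One way to generate the rows of such a table is to sweep epsilon and re-evaluate at each attack strength. Continuing from the previous sketch, this assumes an `fgsm_attack(model, x, y, eps)` helper from the attack step and a clean `test_loader`; both names are placeholders for your own code:

```python
epsilons = [0.007, 0.02, 0.05]

for eps in epsilons:
    correct, total = 0, 0
    for x, y in test_loader:                   # clean test data
        x_adv = fgsm_attack(model, x, y, eps)  # craft examples at this eps
        with torch.no_grad():
            preds = model(x_adv).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.size(0)
    print(f"eps={eps}: adversarial accuracy = {correct / total:.1%}")
```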

Qualitative Analysis

Numbers alone don’t tell the whole story. You must visually inspect the adversarial examples. The goal of many evasion attacks is stealth. If the perturbation is obvious to a human observer, the attack has limited practical value in many scenarios.

Ask yourself:

  • Is the perturbation visible at the given epsilon value?
  • Does the adversarial image still look like the original class to you?
  • Could this perturbed input realistically appear in a production environment?

[Figure: FGSM attack flow: original image X (predicted “Cat”) plus ε · sign(∇) noise yields adversarial image X′ (predicted “Dog”).]
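For the visual inspection itself, a small matplotlib helper can put the original image, the perturbation, and the adversarial image side by side. This is an illustrative sketch that expects HxWxC numpy arrays scaled to [0, 1]; the function name is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_triptych(x, x_adv, pred_clean, pred_adv):
    """Show original image, amplified perturbation, and adversarial image."""
    noise = x_adv - x
    # FGSM noise is tiny by design; rescale it into [0, 1] so it is visible
    noise_vis = 0.5 + noise / (2 * np.abs(noise).max() + 1e-8)
    panels = [(x, f"Original: {pred_clean}"),
              (noise_vis, "Perturbation (amplified)"),
              (x_adv, f"Adversarial: {pred_adv}")]
    _, axes = plt.subplots(1, 3, figsize=(9, 3))
    for ax, (img, title) in zip(axes, panels):
        ax.imshow(img)
        ax.set_title(title)
        ax.axis("off")
    plt.show()
```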

Visualizing the Epsilon-Accuracy Trade-off

One of the most powerful ways to communicate the model’s vulnerability is by plotting its accuracy as a function of epsilon. This graph clearly demonstrates how quickly the model’s performance degrades as the attack strength increases. A robust model will show a slow, graceful decline in accuracy, while a brittle model will exhibit a steep drop even at very small epsilon values.

[Figure: Model accuracy vs. epsilon (ε) over the range 0 to 0.08; the brittle model’s accuracy drops steeply while the robust model’s declines gradually.]
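A few lines of matplotlib are enough to produce this kind of curve; here the data points are taken from the results table above, with epsilon 0 using the clean accuracy:

```python
import matplotlib.pyplot as plt

# Accuracy values from the results table above (eps=0 is clean accuracy)
epsilons = [0.0, 0.007, 0.02, 0.05]
accuracy = [0.992, 0.754, 0.419, 0.087]

plt.plot(epsilons, accuracy, marker="o")
plt.xlabel("Epsilon (ε)")
plt.ylabel("Accuracy")
plt.title("Model Accuracy vs. Epsilon (ε)")
plt.ylim(0, 1)
plt.grid(True)
plt.show()
```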

From Data to Actionable Insights

Your evaluation is the bridge between a technical finding and a business risk. The final step is to synthesize your quantitative and qualitative data into a clear narrative.

Instead of stating “The FGSM attack worked,” you can now provide a much more powerful assessment:

“With a visually imperceptible perturbation level of epsilon=0.02, we were able to force the image classification model to misclassify inputs with an 82.5% confidence in the wrong category. This represents a critical vulnerability, as the model’s accuracy dropped from 99.2% to 41.9% under these conditions, creating a significant vector for system evasion.”

This level of detail transforms a technical exercise into an undeniable security finding, paving the way for targeted defensive measures like adversarial training or input sanitization. Your evaluation is not the end of the test; it’s the foundation of the solution.