You’ve successfully executed the FGSM attack. The model has processed your crafted adversarial examples. Now, the critical phase begins: interpreting the outcome. A simple “pass/fail” is insufficient. A thorough evaluation quantifies the model’s vulnerability and provides the evidence needed to recommend effective defenses.
This process moves beyond a binary outcome to a nuanced understanding of the model’s breaking points and its behavior under duress.
Core Evaluation Metrics: Beyond Accuracy
Your analysis should focus on several key metrics that together paint a complete picture of the attack’s impact.
Quantitative Analysis
Start with the hard numbers. These metrics are objective and form the basis of your technical report; a short computation sketch follows the list.
- Attack Success Rate (ASR): The percentage of adversarial examples that successfully caused a misclassification. This is the primary indicator of the attack’s effectiveness.
- Model Accuracy (on Adversarial Data): The complement of the ASR. If the attack succeeds against 90% of the inputs the model originally classified correctly, only about 10% of those inputs remain correctly classified. Comparing this to the model’s baseline accuracy on clean data highlights the performance degradation.
- Confidence Shift: For successful attacks, measure the model’s confidence in the incorrect (adversarial) label. A high-confidence misclassification is a more severe failure than a low-confidence one. For unsuccessful attacks, check if the confidence in the *correct* label dropped significantly.
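To make these definitions concrete, here is a minimal sketch of how all three metrics can be computed once the model’s predictions have been collected, assuming they are available as NumPy arrays; the function name `evaluate_attack` and its argument names are illustrative placeholders rather than part of any particular library.

```python
import numpy as np

def evaluate_attack(true_labels, clean_preds, adv_preds, adv_probs):
    """Summarize an FGSM run from per-example predictions.

    true_labels: (N,) ground-truth class indices
    clean_preds: (N,) predicted classes on the clean inputs
    adv_preds:   (N,) predicted classes on the adversarial inputs
    adv_probs:   (N, C) softmax probabilities on the adversarial inputs
    """
    true_labels, clean_preds, adv_preds, adv_probs = map(
        np.asarray, (true_labels, clean_preds, adv_preds, adv_probs)
    )

    clean_acc = (clean_preds == true_labels).mean()
    adv_acc = (adv_preds == true_labels).mean()

    # Measure the ASR only over inputs the model got right to begin with:
    # an example that was already misclassified tells us nothing about the attack.
    originally_correct = clean_preds == true_labels
    flipped = originally_correct & (adv_preds != true_labels)
    asr = flipped.sum() / max(originally_correct.sum(), 1)

    # Average confidence the model places in the *wrong* label on successful attacks.
    idx = np.flatnonzero(flipped)
    wrong_conf = adv_probs[idx, adv_preds[idx]].mean() if idx.size else float("nan")

    return {
        "clean_accuracy": clean_acc,
        "adversarial_accuracy": adv_acc,
        "attack_success_rate": asr,
        "avg_confidence_misclassified": wrong_conf,
    }
```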
Example Scenario: A model with 98% accuracy on clean images is subjected to an FGSM attack with a low epsilon. The results show its accuracy drops to 15%. This 83-point drop is a clear, quantifiable measure of its brittleness.
A simple table can summarize these findings effectively:
| Epsilon (ε) | Clean Accuracy | Adversarial Accuracy | Attack Success Rate | Avg. Confidence (Misclassified) |
|---|---|---|---|---|
| 0.007 | 99.2% | 75.4% | 23.8% | 68.1% |
| 0.02 | 99.2% | 41.9% | 57.3% | 82.5% |
| 0.05 | 99.2% | 8.7% | 90.5% | 91.3% |
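A table like this can be produced with a simple sweep over epsilon values. The sketch below is one way to do it, assuming a PyTorch classifier and a DataLoader of clean images scaled to [0, 1]; the helpers `fgsm` and `epsilon_sweep` are illustrative names, not functions from a specific library.

```python
import torch
import torch.nn.functional as F

def fgsm(model, images, labels, epsilon):
    """One-step FGSM: nudge each pixel along the sign of the input gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0, 1).detach()  # assumes pixel values scaled to [0, 1]

def epsilon_sweep(model, loader, epsilons=(0.007, 0.02, 0.05), device="cpu"):
    """Collect (epsilon, adversarial accuracy) rows for the report table."""
    model = model.eval().to(device)
    rows = []
    for eps in epsilons:
        correct, total = 0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            adv = fgsm(model, images, labels, eps)
            with torch.no_grad():
                correct += (model(adv).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        rows.append((eps, correct / total))
    return rows
```

Each returned row corresponds to one line of the table above, and running the sweep with epsilon set to zero recovers the clean-accuracy baseline.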
Qualitative Analysis
Numbers alone don’t tell the whole story. You must visually inspect the adversarial examples. The goal of many evasion attacks is stealth. If the perturbation is obvious to a human observer, the attack has limited practical value in many scenarios.
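A side-by-side view of the clean image, the adversarial image, and the (rescaled) perturbation makes this inspection straightforward. Below is a minimal matplotlib sketch, assuming the images are NumPy arrays in height-width(-channel) layout with values in [0, 1]; `show_comparison` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np

def show_comparison(clean_img, adv_img, epsilon):
    """Show the clean image, the adversarial image, and the amplified perturbation."""
    perturbation = adv_img - clean_img
    # Rescale the perturbation to [0, 1] so a tiny epsilon is still visible on screen.
    span = perturbation.max() - perturbation.min()
    amplified = (perturbation - perturbation.min()) / (span + 1e-8)

    titles = ["Clean", f"Adversarial (eps={epsilon})", "Perturbation (rescaled)"]
    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    for ax, img, title in zip(axes, [clean_img, adv_img, amplified], titles):
        img = np.squeeze(img)
        ax.imshow(img, cmap="gray" if img.ndim == 2 else None)
        ax.set_title(title)
        ax.axis("off")
    fig.tight_layout()
    plt.show()
```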
Ask yourself:
- Is the perturbation visible at the given epsilon value?
- Does the adversarial image still look like the original class to you?
- Could this perturbed input realistically appear in a production environment?
Visualizing the Epsilon-Accuracy Trade-off
One of the most powerful ways to communicate the model’s vulnerability is by plotting its accuracy as a function of epsilon. This graph clearly demonstrates how quickly the model’s performance degrades as the attack strength increases. A robust model will show a slow, graceful decline in accuracy, while a brittle model will exhibit a steep drop even at very small epsilon values.
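Here is a minimal plotting sketch that reuses the (epsilon, accuracy) rows produced by the `epsilon_sweep` sketch above; `plot_epsilon_accuracy` is an illustrative name, and matplotlib is assumed to be available.

```python
import matplotlib.pyplot as plt

def plot_epsilon_accuracy(rows, clean_accuracy):
    """Plot adversarial accuracy against epsilon, with the clean baseline for reference."""
    epsilons, accuracies = zip(*rows)
    plt.figure(figsize=(6, 4))
    plt.plot(epsilons, accuracies, marker="o", label="Adversarial accuracy")
    plt.axhline(clean_accuracy, linestyle="--", color="gray", label="Clean accuracy")
    plt.xlabel("Epsilon (attack strength)")
    plt.ylabel("Accuracy")
    plt.title("Accuracy vs. FGSM perturbation size")
    plt.legend()
    plt.tight_layout()
    plt.show()
```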
From Data to Actionable Insights
Your evaluation is the bridge between a technical finding and a business risk. The final step is to synthesize your quantitative and qualitative data into a clear narrative.
Instead of stating “The FGSM attack worked,” you can now provide a much more powerful assessment:
“With a visually imperceptible perturbation of epsilon=0.02, the attack flipped the model’s prediction on 57.3% of the inputs it originally classified correctly, and the model assigned an average confidence of 82.5% to the wrong label. This represents a critical vulnerability: overall accuracy dropped from 99.2% to 41.9% under these conditions, creating a significant vector for system evasion.”
This level of detail transforms a technical exercise into an undeniable security finding, paving the way for targeted defensive measures like adversarial training or input sanitization. Your evaluation is not the end of the test; it’s the foundation of the solution.