To effectively test a computer vision system, you must understand its fundamental weaknesses. Adversarial examples exploit the gap between how machines “see” and how humans perceive the world. This chapter introduces two primary tools for this purpose: imperceptible perturbations spread across an entire image and localized, robust patches designed for real-world deception.
Adversarial Perturbations: The Invisible Attack
An adversarial perturbation is a layer of carefully engineered noise applied across an entire image. To a human observer, the modified image appears identical to the original. To a machine learning model, however, this subtle manipulation is enough to completely change its interpretation, causing a confident misclassification.
Think of it as exploiting the model’s over-reliance on specific pixel patterns that are meaningless to us. By slightly adjusting thousands of pixels in the direction the model is most sensitive to, you can push the input across a decision boundary and guide the model toward an incorrect conclusion.
Figure 1: The process of creating an adversarial image via perturbation.
How Perturbations are Generated
The most common methods for generating these perturbations require white-box access to the model, specifically access to its gradients. The gradient tells you how a tiny change in each input pixel will affect the final output probability. By nudging pixels “uphill” along the gradient of the loss, you increase the model’s error and can efficiently trick it.
The Fast Gradient Sign Method (FGSM) is the classic example. It’s a one-step method that calculates the gradient of the loss with respect to the input image and adds or subtracts a small value (epsilon) from each pixel to maximize the loss. The sketch below is a minimal PyTorch version; `model`, `original_image`, and `true_label` are assumed to already be defined.
```python
import torch
import torch.nn.functional as F

def generate_adversarial_perturbation(model, image, true_label, epsilon=0.01):
    """Fast Gradient Sign Method (FGSM): a single gradient step that increases the loss."""
    # Work on a copy so gradients flow to the input pixels, not just the model weights
    image = image.clone().detach().requires_grad_(True)

    # Calculate the loss for the true label
    loss = F.cross_entropy(model(image), true_label)

    # Get the gradients of the loss w.r.t. the input image pixels
    loss.backward()

    # Keep only the sign of each gradient (+1 or -1);
    # epsilon controls the "visibility" of the noise
    perturbation = epsilon * image.grad.sign()
    return perturbation

# Apply the perturbation to the original image and keep pixels in a valid range.
# `model`, `original_image` (a batched tensor in [0, 1]), and `true_label` are assumed to exist.
perturbation = generate_adversarial_perturbation(model, original_image, true_label)
adversarial_image = torch.clamp(original_image + perturbation, 0.0, 1.0)
```
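In practice, epsilon is the key dial: a larger value makes the attack more reliable but also more visible, and the useful range depends on how the input is scaled. For images normalized to [0, 1], small values on the order of 0.01 to 0.05 are common starting points.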
Perturbations are powerful for digital-only threat scenarios. As a red teamer, you can use them to test a system’s resilience against manipulated data uploaded by a malicious user or to probe the raw decision-making logic of a model you have access to.
Adversarial Patches: The Physical Trojan Horse
Unlike perturbations, which are subtle and fragile, adversarial patches are overt, localized, and robust. A patch is a small, sticker-like image designed to cause misclassification when placed anywhere in a camera’s field of view. The goal is not stealth but salience: the patch is so attention-grabbing to the model that it overrides everything else in the scene.
The magic of a patch is its resilience. It’s trained to be effective across different angles, lighting conditions, and scales. A well-designed patch printed on a piece of paper can make a model classify a banana as a toaster, regardless of where you place the sticker on or near the banana.
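To make that robustness requirement concrete, the sketch below shows one way such a patch could be trained in PyTorch. Everything here is illustrative: it assumes a pretrained torchvision ResNet-50 as a stand-in target model, the ImageNet “toaster” class as the target, and an existing `loader` that yields batches of benign images scaled to [0, 1] (input normalization is omitted for brevity). The random rotation, brightness, and placement applied at every step stand in for the physical variation the patch must survive.

```python
import random

import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from torchvision import models

# Stand-in target model: any pretrained image classifier works for this sketch
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

TARGET_CLASS = 859   # ImageNet index for "toaster" (illustrative target)
PATCH_SIZE = 64      # the patch is optimized directly as a 64x64 block of pixels

patch = torch.rand(3, PATCH_SIZE, PATCH_SIZE, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=0.05)

def apply_patch(images, patch):
    """Paste a randomly rotated, re-lit copy of the patch at a random spot in each image."""
    patched = images.clone()
    _, _, height, width = images.shape
    for i in range(images.size(0)):
        p = TF.rotate(patch, angle=random.uniform(-30, 30))        # random viewing angle
        p = torch.clamp(p * random.uniform(0.7, 1.3), 0.0, 1.0)    # random lighting
        y = random.randint(0, height - PATCH_SIZE)                 # random placement
        x = random.randint(0, width - PATCH_SIZE)
        patched[i, :, y:y + PATCH_SIZE, x:x + PATCH_SIZE] = p
    return patched

# `loader` is assumed to yield batches of benign images scaled to [0, 1]
for images, _ in loader:
    logits = model(apply_patch(images, patch))
    target = torch.full((images.size(0),), TARGET_CLASS, dtype=torch.long)
    loss = F.cross_entropy(logits, target)   # push every patched image toward "toaster"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    patch.data.clamp_(0.0, 1.0)              # keep the patch in a printable pixel range
```

Because the patch only ever sees transformed versions of itself during optimization, the result is a sticker that keeps working when it is printed, tilted, or lit differently, which is exactly the property you would then verify with a real camera.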
This robustness makes patches the tool of choice for transitioning adversarial attacks from the digital realm into the physical world, a topic we explore in the next chapter. For example, a patch on a stop sign could make an autonomous vehicle’s perception system ignore it completely or classify it as a speed limit sign.
Comparing the Tools
Understanding when to use a perturbation versus a patch is critical for an effective red team engagement. Their properties dictate their use cases.
| Characteristic | Adversarial Perturbation | Adversarial Patch |
|---|---|---|
| Visibility | Imperceptible to humans | Clearly visible, often looks like random noise or abstract art |
| Scope | Affects the entire image (global) | Confined to a small area (local) |
| Robustness | Fragile; small changes like rotation or resizing can break the effect | Robust; designed to work under various physical conditions (angle, lighting) |
| Generation | Often requires white-box access (model gradients) | Can be trained to be “universal” and effective in black-box scenarios |
| Primary Use Case | Digital attacks (e.g., manipulated file uploads), model analysis | Physical world attacks (e.g., stickers on objects), bypassing content filters |
Red Teaming Strategy: Choosing Your Weapon
As a red teamer, your choice between these two methods depends entirely on your objective and the target system’s environment.
- Probing Digital Defenses: If you are testing a web service that processes user-uploaded images (e.g., a content moderation filter, a profile picture analyzer), perturbations are your ideal tool. They test the system’s resilience against crafted digital inputs without any physical interaction. The goal is to see whether defenses such as input sanitization, resizing, or compression actually blunt the attack; a minimal check is sketched after this list.
- Assessing Physical-World Systems: When your target is a system that interacts with the real world—like surveillance cameras, autonomous vehicles, or inventory management robots—patches are the superior choice. Your engagement would involve printing the patch and testing its real-world effectiveness, revealing vulnerabilities that digital-only testing would miss.
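As a quick illustration of that first point, the sketch below checks whether an adversarial image survives a typical upload pipeline of resizing followed by JPEG re-encoding. It reuses the `adversarial_image` and `true_label` from the FGSM sketch earlier; the target size and JPEG quality are placeholders for whatever preprocessing the real service applies, and images are assumed to be unnormalized tensors in [0, 1].

```python
import io

import torch
from PIL import Image
from torchvision.transforms.functional import resize, to_pil_image, to_tensor

def survives_preprocessing(model, adversarial_image, true_label,
                           size=(224, 224), jpeg_quality=75):
    """Return True if the attack still fools the model after resize + JPEG re-encoding."""
    # Simulate a typical upload pipeline: resize, then a lossy JPEG round trip
    resized = resize(adversarial_image, list(size))
    buffer = io.BytesIO()
    to_pil_image(resized.squeeze(0).clamp(0.0, 1.0)).save(
        buffer, format="JPEG", quality=jpeg_quality)
    buffer.seek(0)
    recompressed = to_tensor(Image.open(buffer)).unsqueeze(0)

    with torch.no_grad():
        prediction = model(recompressed).argmax(dim=1)
    # The attack "survives" if the model still gets the true class wrong
    return bool((prediction != true_label).all())
```

If the round trip flips the prediction back to the correct class, the perturbation is too fragile for that pipeline, and you would need a larger epsilon or an attack optimized through the preprocessing itself to demonstrate real risk.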
Both techniques expose a fundamental truth: AI vision models do not “see” like we do. They learn statistical correlations in pixel data, and these correlations can be systematically exploited. Your job is to find those exploits before a real adversary does.