Effective AI security is not just about identifying vulnerabilities; it’s about knowing the direct countermeasures for specific adversarial actions. This reference table maps common attack vectors to their corresponding defensive strategies, providing a tactical guide for building resilient systems. Think of this as a playbook where every offensive move has one or more defensive counters.
The relationship between attack and defense is a continuous cycle. An attacker develops a new technique, a defender creates a countermeasure, and the attacker adapts. Understanding these pairings is fundamental to anticipating threats and prioritizing your defensive investments.
Core Attack-Defense Mappings
The following table provides a concise cross-reference between attack classes and specific defensive mechanisms. Note that defenses are often layered; a single mechanism is rarely a complete solution.
| Attack Category | Specific Attack Technique | Primary Defense Strategy | Specific Defense Mechanism(s) |
|---|---|---|---|
| Evasion (Test-time attacks) |
FGSM, PGD, C&W, DeepFool, Universal Adversarial Perturbations (UAPs) | Model Robustness & Input Sanitization |
|
| Data Poisoning (Training-time attacks) |
Label Flipping, Backdoor Attacks (e.g., BadNets, Trojaning), Clean-Label Attacks | Data Integrity & Model Hygiene |
|
| Privacy Violation | Membership Inference Attacks (MIA), Model Inversion, Attribute Inference | Data Obfuscation & Generalization |
|
| Model Stealing (Extraction) |
Query-Based Model Extraction (black-box), Functionality Stealing | Access Control & Intellectual Property Protection |
|
| LLM / Prompt Attacks | Direct/Indirect Prompt Injection, Jailbreaking, Malicious Persona Activation | Input/Output Filtering & System Sandboxing |
|
| Supply Chain | Malicious Pre-trained Models, Trojanized Training Data, Compromised Libraries (e.g., pickle) | Asset Verification & Provenance |
|
The Defense Lifecycle
Defenses are not static. They exist within a dynamic lifecycle where proactive hardening and reactive responses work in tandem. A mature security posture relies on this continuous loop to adapt to evolving threats.
Figure 25.4.4.1 – The continuous AI security lifecycle, balancing proactive and reactive measures.
- Harden (Proactive): This phase involves building inherently resilient models. Techniques like adversarial training and differential privacy are implemented here, before the model is deployed. The goal is to raise the cost for an attacker from the outset.
- Detect (Reactive): Once a system is live, you need mechanisms to identify attacks in progress. This includes monitoring query patterns for model stealing, checking input data for anomalies that might signal poisoning, or flagging prompts that match injection signatures.
- Respond: When an attack is detected, an automated or manual response is triggered. This could mean blocking an IP address, quarantining suspicious data for review, or temporarily taking a model offline.
- Adapt: This is the crucial feedback loop. Information from detected attacks is used to improve the proactive hardening phase. For example, if you detect a new type of evasion attack, you incorporate examples of it into your next round of adversarial training.