Navigating the complex landscape of AI security requires understanding not just individual concepts, but how they interrelate. A single vulnerability can be exploited through various attacks, affecting different stages of the machine learning lifecycle and demanding specific defensive postures. This matrix serves as a quick-reference tool to map these connections.
Use this table to trace a concept—like “Data Poisoning”—across its associated domains, lifecycle stages, typical goals, and common mitigations. This cross-referencing is invaluable during engagement planning, threat modeling, and report writing, helping you to articulate the full scope of a finding or a defensive recommendation.
AI Security Concept Matrix
| Concept | Primary Domain | Affected ML Lifecycle Stage | Typical Attacker Goal | Common Mitigations |
|---|---|---|---|---|
| Evasion & Integrity Attacks | ||||
| Adversarial Examples | Integrity, Availability | Inference | Induce misclassification, bypass security filters (e.g., malware detection, content moderation). |
|
| Data Poisoning | Integrity | Training | Degrade overall model performance, create targeted misclassifications, or install backdoors. |
|
| Backdoor Attacks (Trojans) | Integrity, Confidentiality | Training, Fine-tuning | Create a hidden trigger that causes specific, predictable model misbehavior when activated. |
|
| Privacy & Confidentiality Attacks | ||||
| Membership Inference | Confidentiality (Privacy) | Inference | Determine if a specific data record was part of the model’s training set. |
|
| Model Inversion | Confidentiality (Privacy) | Inference | Reconstruct sensitive features or entire data samples from the training data by querying the model. |
|
| Attribute Inference | Confidentiality (Privacy) | Inference | Infer sensitive attributes about a data subject, even if the model was not explicitly trained to predict them. |
|
| Model & System Exploitation | ||||
| Model Stealing (Extraction) | Confidentiality (IP) | Inference | Recreate a functional copy of a victim model by repeatedly querying its API. |
|
| Prompt Injection | Integrity, Confidentiality | Inference (LLMs) | Override an LLM’s original instructions to perform unintended actions or reveal sensitive system prompts. |
|
| Jailbreaking | Integrity, Availability | Inference (LLMs) | Bypass a model’s safety and ethics alignment to generate harmful, biased, or forbidden content. |
|
| Denial of Service (Resource Depletion) | Availability | Inference, Deployment | Overwhelm the model or its infrastructure with computationally expensive queries, making it unavailable. |
|
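To make the matrix concrete, the short sketches below show how a few of its rows look in practice. First, an evasion attack in the spirit of the Adversarial Examples row: a minimal FGSM-style perturbation against a linear classifier, using the closed-form input gradient of the logistic loss. The toy dataset, model, and epsilon values are illustrative assumptions, not part of any particular engagement.

```python
# A minimal FGSM-style evasion sketch against a linear model, using the
# closed-form input gradient of the logistic loss; dataset and epsilon
# values are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
w, b = clf.coef_[0], clf.intercept_[0]

def fgsm(X, y, epsilon):
    """Perturb inputs in the direction that increases the logistic loss."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y=1)
    grad = (p - y)[:, None] * w             # d(loss)/d(x) for logistic loss
    return X + epsilon * np.sign(grad)

for epsilon in (0.0, 0.1, 0.3):
    acc = clf.score(fgsm(X_test, y_test, epsilon), y_test)
    print(f"epsilon={epsilon:.1f}  accuracy on perturbed inputs={acc:.3f}")
```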
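Next, the Data Poisoning concept traced in code: a hedged label-flipping sketch that degrades a model's overall performance. It assumes a generic scikit-learn workflow; the flip rates and toy dataset are hypothetical choices for demonstration only.

```python
# A minimal, illustrative sketch of label-flipping data poisoning, assuming a
# generic scikit-learn workflow; dataset and flip rates are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Build a toy binary classification dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, flip_rate):
    """Flip a fraction of training labels to simulate an integrity attack."""
    y_poisoned = y.copy()
    n_flip = int(flip_rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # binary labels: 0 <-> 1
    return y_poisoned

for flip_rate in (0.0, 0.1, 0.3):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, poison_labels(y_train, flip_rate))
    print(f"flip_rate={flip_rate:.1f}  test_accuracy={clf.score(X_test, y_test):.3f}")
```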
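Finally, a confidence-threshold sketch of the Membership Inference row: the model is deliberately overfit so that training-set members tend to receive higher confidence than non-members, and the 0.9 threshold is an arbitrary value chosen for illustration, not a calibrated attack.

```python
# A minimal confidence-threshold membership inference sketch, assuming a
# deliberately overfit classifier; the threshold and dataset are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Deliberately overfit so members receive higher confidence than non-members.
model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_train, y_train)

def max_confidence(model, X):
    """Highest predicted class probability per sample."""
    return model.predict_proba(X).max(axis=1)

threshold = 0.9  # hypothetical cutoff chosen for illustration
member_guess_train = max_confidence(model, X_train) >= threshold
member_guess_test = max_confidence(model, X_test) >= threshold

# A large gap between these two rates signals membership leakage.
print(f"flagged as members (actual members):     {member_guess_train.mean():.2f}")
print(f"flagged as members (actual non-members): {member_guess_test.mean():.2f}")
```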