25.4.1 Concept relationship matrix

2025.10.06.
AI Security Blog

Navigating the complex landscape of AI security requires understanding not just individual concepts, but how they interrelate. A single vulnerability can be exploited through various attacks, affecting different stages of the machine learning lifecycle and demanding specific defensive postures. This matrix serves as a quick-reference tool to map these connections.

Use this table to trace a concept—like “Data Poisoning”—across its associated domains, lifecycle stages, typical goals, and common mitigations. This cross-referencing is invaluable during engagement planning, threat modeling, and report writing, helping you to articulate the full scope of a finding or a defensive recommendation.

Kapcsolati űrlap - EN

Do you have a question about AI Security? Reach out to us here:

AI Security Concept Matrix

Cross-Reference of AI Security Concepts
Concept Primary Domain Affected ML Lifecycle Stage Typical Attacker Goal Common Mitigations
Evasion & Integrity Attacks
Adversarial Examples Integrity, Availability Inference Induce misclassification, bypass security filters (e.g., malware detection, content moderation).
  • Adversarial Training
  • Input Sanitization/Filtering
  • Gradient Masking (caution: can create false sense of security)
Data Poisoning Integrity Training Degrade overall model performance, create targeted misclassifications, or install backdoors.
  • Data Provenance Checks
  • Outlier/Anomaly Detection
  • Model Retraining & Monitoring
Backdoor Attacks (Trojans) Integrity, Confidentiality Training, Fine-tuning Create a hidden trigger that causes specific, predictable model misbehavior when activated.
  • Input Filtering (for triggers)
  • Model Pruning
  • Secure Supply Chain (for pre-trained models)
Privacy & Confidentiality Attacks
Membership Inference Confidentiality (Privacy) Inference Determine if a specific data record was part of the model’s training set.
  • Differential Privacy
  • Regularization (e.g., dropout)
  • Reducing model overfitting
Model Inversion Confidentiality (Privacy) Inference Reconstruct sensitive features or entire data samples from the training data by querying the model.
  • Reducing output precision/confidence scores
  • Differential Privacy
  • Training on less sensitive data representations
Attribute Inference Confidentiality (Privacy) Inference Infer sensitive attributes about a data subject, even if the model was not explicitly trained to predict them.
  • Data Minimization
  • Federated Learning
  • Fairness-aware ML techniques
Model & System Exploitation
Model Stealing (Extraction) Confidentiality (IP) Inference Recreate a functional copy of a victim model by repeatedly querying its API.
  • Rate Limiting
  • API Query Monitoring
  • Watermarking Model Outputs
Prompt Injection Integrity, Confidentiality Inference (LLMs) Override an LLM’s original instructions to perform unintended actions or reveal sensitive system prompts.
  • Instructional Defense/Prompt Sanitization
  • Separation of user input and system instructions
  • Output Filtering
Jailbreaking Integrity, Availability Inference (LLMs) Bypass a model’s safety and ethics alignment to generate harmful, biased, or forbidden content.
  • Reinforcement Learning from Human Feedback (RLHF)
  • Input/Output Filtering
  • Red Teaming during development
Denial of Service (Resource Depletion) Availability Inference, Deployment Overwhelm the model or its infrastructure with computationally expensive queries, making it unavailable.
  • Rate Limiting & Throttling
  • Input Complexity Analysis
  • Infrastructure Autoscaling