3.1.2 Threat modeling for AI systems

2025.10.06.
AI Security Blog

With your objective and scope defined, the next logical step is to adopt an adversarial mindset. Threat modeling is not a checkbox exercise; it is the systematic process of thinking like an attacker. You will deconstruct the target AI system, identify its weak points, and hypothesize how it could be compromised. This process transforms a vague goal like “test the LLM” into a concrete list of potential attack vectors that will guide your entire engagement.

Beyond STRIDE: Why AI Demands a New Lens

Traditional threat modeling frameworks like STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) are excellent for conventional software. They force you to think about how an attacker might impersonate a user, alter data on disk, or crash a server. These threats are absolutely still relevant to the infrastructure supporting AI systems.

However, they miss the novel attack surface introduced by machine learning itself. A traditional threat model won’t prompt you to ask questions like:

  • Could an attacker poison our training data to create a hidden backdoor in the model?
  • Can a carefully crafted input cause the model to misclassify a critical object with high confidence?
  • Is it possible to “steal” the proprietary model by repeatedly querying its public API?

These are not infrastructure problems; they are vulnerabilities in the statistical fabric of the model and the integrity of its data lifecycle. To address this, you need an expanded framework that treats the AI components as first-class citizens in the threat analysis.

The Data-Model-Infrastructure Triad

A practical approach for red teaming is to analyze the AI system through three distinct but interconnected planes of attack. This ensures you cover both classic and emerging AI-centric threats. We call this the Data-Model-Infrastructure (DMI) Triad.

The DMI Triad: Analyzing threats across the data, model, and supporting infrastructure planes.

  • Data Plane: Concerns the integrity, confidentiality, and availability of the data used to train and operate the model. Threats include data poisoning, evasion attacks, and data leakage from training sets.
  • Model Plane: Focuses on direct attacks against the machine learning model itself. Threats include model extraction (stealing), model inversion (recovering sensitive training data), and membership inference attacks.
  • Infrastructure Plane: Encompasses the conventional IT stack that supports the AI system. This is where STRIDE applies directly. Threats include unsecured cloud storage, vulnerable API endpoints, compromised container orchestration, and insecure CI/CD pipelines for model deployment.

A sophisticated attack often traverses these planes. For example, an attacker might exploit an infrastructure vulnerability (e.g., weak access controls on a data lake) to mount a data plane attack (e.g., poisoning the training data).
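
To keep track of such chains during an engagement, it can help to record each step together with the plane it exploits. The sketch below is a minimal, illustrative way to do that in Python; the component names and wording are placeholders, not prescribed terminology.

# Minimal sketch: recording a cross-plane attack chain as structured data.
from dataclasses import dataclass
from enum import Enum

class Plane(Enum):
    DATA = "data"
    MODEL = "model"
    INFRASTRUCTURE = "infrastructure"

@dataclass
class ThreatStep:
    plane: Plane
    component: str   # illustrative component name
    technique: str   # short description of the attacker action

# The cross-plane example above: an infrastructure flaw enables a data plane
# attack, whose effect ultimately surfaces in the model plane.
attack_chain = [
    ThreatStep(Plane.INFRASTRUCTURE, "data lake", "abuse weak access controls to gain write access"),
    ThreatStep(Plane.DATA, "training set", "inject poisoned samples carrying a hidden trigger"),
    ThreatStep(Plane.MODEL, "deployed model", "activate the backdoor with the trigger at inference time"),
]

for step in attack_chain:
    print(f"[{step.plane.value}] {step.component}: {step.technique}")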

A Practical Workflow for Threat Modeling

Thinking in planes is useful, but you need a systematic process. Here is a step-by-step workflow to guide your analysis.

Step 1: Decompose the AI System

You cannot analyze what you do not understand. First, map out the entire AI system as a data flow diagram (DFD). Identify every component and how data moves between them. A typical pipeline might look like this:

Data Sources → Training Pipeline → Model Registry → Inference API → User (with a trust boundary between the Inference API and the User)

A simplified DFD for an AI system, highlighting a critical trust boundary between the internal model deployment and external user queries.
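
If you find it useful to capture the decomposition in machine-readable form, a few lines of plain Python are enough. The component names and trust zones below simply mirror the diagram; they are illustrative assumptions, not a required schema.

# The DFD above as plain data: components with a coarse trust zone, plus directed data flows.
components = {
    "data_sources":      {"zone": "external"},
    "training_pipeline": {"zone": "internal"},
    "model_registry":    {"zone": "internal"},
    "inference_api":     {"zone": "internal"},
    "user":              {"zone": "external"},
}

data_flows = [
    ("data_sources", "training_pipeline"),    # raw data enters the pipeline
    ("training_pipeline", "model_registry"),  # trained model is published
    ("model_registry", "inference_api"),      # model is deployed for serving
    ("user", "inference_api"),                # untrusted queries arrive
    ("inference_api", "user"),                # predictions are returned
]

print(f"{len(components)} components, {len(data_flows)} data flows")

Keeping the decomposition as data pays off in the next two steps, where you can iterate over it instead of re-reading the diagram.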

Step 2: Identify Trust Boundaries

A trust boundary is any point in the system where data or control passes from a less-trusted entity to a more-trusted one. These are prime locations for attacks. In the diagram above, the line between the internal “Inference API” and the external “User” is a critical trust boundary. Any input crossing it must be considered potentially malicious.
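
Working from a decomposition like the one sketched in Step 1, flagging boundary-crossing flows reduces to a simple comparison of trust levels. The two-level ranking below is an assumption for illustration; real systems often have more than two zones.

# Rank trust zones: higher value means more trusted (illustrative assumption).
TRUST_LEVEL = {"external": 0, "internal": 1}

def crosses_trust_boundary(src_zone: str, dst_zone: str) -> bool:
    """True if data moves from a less-trusted zone into a more-trusted one."""
    return TRUST_LEVEL[src_zone] < TRUST_LEVEL[dst_zone]

# The User -> Inference API flow from the DFD crosses a boundary...
print(crosses_trust_boundary("external", "internal"))  # True
# ...while the Model Registry -> Inference API flow does not.
print(crosses_trust_boundary("internal", "internal"))  # False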

Step 3: Enumerate Threats per Component using the DMI Triad

Now, go through your DFD component by component and brainstorm threats for each plane. A per-component breakdown covering all three planes, like the one below, is an effective way to organize this analysis.

Training Pipeline
  • Data Plane: Backdoor poisoning. An attacker injects a few malicious samples so the model behaves predictably on a secret trigger.
  • Model Plane: N/A (the model is still being created).
  • Infrastructure Plane: Compromise of the training server via an unpatched vulnerability to exfiltrate the pre-trained base model.

Model Registry
  • Data Plane: N/A (the data here is a static model file).
  • Model Plane: Model tampering. An attacker with access replaces the production model file with a malicious version.
  • Infrastructure Plane: Misconfigured access controls on the artifact repository (e.g., a public S3 bucket) allowing unauthorized model download.

Inference API
  • Data Plane: Evasion attack. A user submits a carefully crafted input (e.g., an adversarial email) to bypass a spam filter.
  • Model Plane: Model extraction. An attacker makes thousands of queries to deduce the model’s architecture and weights, creating a functional copy.
  • Infrastructure Plane: Denial of Service (DoS). Resource-intensive queries cause excessive GPU/CPU usage.

This systematic enumeration ensures you don’t overlook entire classes of vulnerabilities. For instance, consider a simple evasion attack against an image classifier. The threat is in the Data Plane, but the test case involves interacting with the Infrastructure Plane (the API).

# Conceptual sketch of an evasion attack (untargeted, FGSM-style)
import numpy as np

def create_adversarial_image(original_image, model, true_label, epsilon=0.01):
    # Start with a copy of the original image
    adversarial_image = original_image.copy()

    # Gradient of the model's loss with respect to the input pixels.
    # compute_gradient is a placeholder for a framework-specific call
    # (e.g., autograd in PyTorch or TensorFlow).
    gradient = compute_gradient(model, adversarial_image, true_label)

    # Small perturbation in the direction of the gradient's sign;
    # epsilon sets its size and should be scaled to the pixel range in use.
    perturbation = np.sign(gradient) * epsilon

    # Add the (ideally imperceptible) noise to the image
    adversarial_image = adversarial_image + perturbation

    # Clip values so the result is still a valid image (e.g., pixel values 0-255)
    adversarial_image = np.clip(adversarial_image, 0, 255)

    return adversarial_image
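
Two caveats apply to this sketch: the epsilon value governs the trade-off between keeping the perturbation imperceptible and reliably flipping the prediction, and computing a gradient like this assumes white-box access to the model. Against a query-only API, the perturbation has to be found through black-box search or transferred from a surrogate model.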

From Threats to Actionable Test Cases

The ultimate goal of threat modeling in a red team context is to generate a prioritized list of testable hypotheses. Each identified threat should be translated into a concrete test case that you can execute during the engagement.

  • Threat: Model extraction via the public API.

    Test Case: “Develop a script to query the API with 10,000 diverse inputs. Use the resulting labels to train a surrogate model (e.g., a simple decision tree). Measure the surrogate’s accuracy against a held-out validation set to determine the feasibility of model theft.” (A sketch of such a script follows this list.)
  • Threat: Backdoor poisoning of the training data.

    Test Case: “If access to a representative data contribution channel is in scope, submit 100 images of a specific object (e.g., a green hat) labeled as a benign class (e.g., ‘lamp’). After the model retrains, test if it now consistently misclassifies images containing a green hat.”
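
As a rough shape for the first test case, the sketch below assumes a caller-supplied query_api function that wraps the target’s inference endpoint and returns its predicted labels; the probe set, surrogate choice, and depth limit are all placeholders to adapt to the engagement.

# Hypothetical feasibility check for model extraction via a surrogate model.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

def extraction_feasibility(query_api, X_probe, X_val, y_val):
    """Estimate how well a cheap surrogate can mimic the target model.

    query_api: function sending one input to the target API and returning its label (assumed wrapper).
    X_probe:   diverse inputs used to probe the target (e.g., 10,000 samples).
    X_val, y_val: held-out inputs and the target model's labels for them.
    """
    # 1. Harvest the target's predictions for the probe set.
    y_probe = np.array([query_api(x) for x in X_probe])

    # 2. Train a simple surrogate on the harvested input/label pairs.
    surrogate = DecisionTreeClassifier(max_depth=10)
    surrogate.fit(X_probe, y_probe)

    # 3. Agreement with the target on held-out data approximates extraction success.
    return accuracy_score(y_val, surrogate.predict(X_val))

A high agreement score from even this trivially simple surrogate is strong evidence that the deployed model leaks enough information through its API to be stolen.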

This translation is the critical bridge from strategic planning to tactical execution. It ensures your red team activities are not random but are directly tied to plausible, high-impact threats you identified systematically.

Key Takeaways for Threat Modeling AI

  • Extend, Don’t Replace: Augment traditional frameworks like STRIDE with AI-specific considerations. The DMI Triad (Data, Model, Infrastructure) is a powerful mental model for this.
  • Decomposition is Non-Negotiable: You must map the system’s architecture and data flows to understand its attack surface and identify trust boundaries.
  • Think in Attack Chains: The most effective attacks often cross planes, such as using an infrastructure flaw to enable a data attack. Your model should account for these multi-stage scenarios.
  • The Output is Action: The final product of your threat modeling exercise is a set of specific, testable hypotheses that will form the core of your red team engagement plan.