15.1.2 Threat modeling integration

2025.10.06.
AI Security Blog

Security by design is the destination; threat modeling is the map you draw to get there. It’s the structured process of identifying what can go wrong before you’ve even written a line of production code. For AI systems, this isn’t just a good practice—it’s essential. The attack surface isn’t just your infrastructure and APIs; it includes your data, your model’s logic, and the very way it interprets human language.

Scenario: The AI-Powered Healthcare Assistant

Imagine your team is developing a chatbot to help patients schedule appointments and get answers to basic medical questions. It’s powered by a large language model (LLM) fine-tuned on medical literature and integrated with the clinic’s scheduling database. What could possibly go wrong?

  • A user could trick the bot into revealing another patient’s appointment details.
  • A malicious actor could “poison” the training data with false medical information.
  • A cleverly crafted prompt could cause the bot to bypass its safety filters and offer dangerous medical advice.

Threat modeling forces you to ask these questions systematically, long before a real patient is harmed.

The Four-Step Process for AI Systems

Traditional threat modeling often follows a simple loop: decompose the system, identify threats, determine mitigations, and validate. For AI, we adapt this process to account for AI-specific components such as models, data pipelines, and prompt interfaces.

  1. Decompose the System: Map out every component, data flow, and trust boundary.
  2. Identify Threats: Use a framework to brainstorm potential attacks against each component.
  3. Prioritize & Mitigate: Assess the risk of each threat and design controls to counter it.
  4. Validate & Iterate: Confirm that mitigations work and treat the threat model as a living document.

Step 1: Decomposing the AI System

Before you can find threats, you must understand the system’s anatomy. A data flow diagram (DFD) is your primary tool here. You need to identify every process, data store, external entity, and the data flowing between them. Crucially, you must draw trust boundaries—the lines where data passes from a less trusted zone to a more trusted one.

[Figure: AI system data flow diagram for threat modeling. The Patient (untrusted) sends a Prompt to the Web API at the edge of the Clinic Internal Network (high trust). The Web API passes a Processed Query to the LLM Service, which issues a RAG Query to the Vector DB (knowledge base) and a DB Query to the Patient DB (PII/PHI); the Generated Response flows back through the Web API to the patient as the Final Answer.]

In this diagram, the trust boundary is critical. The web API is the gatekeeper. Any data crossing from the untrusted patient into the clinic network must be sanitized and validated. The LLM Service, which has access to sensitive data, becomes a high-value target.
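
One way to keep Step 1 close to the engineering workflow is to express the same decomposition as threat-model-as-code. The sketch below is one possible version using OWASP's open-source pytm library (an assumption about tooling, not a requirement); the components and flows mirror the diagram above, and the element names are illustrative.

# Threat-model-as-code sketch of the healthcare assistant DFD.
# Assumes the OWASP pytm package is installed (pip install pytm).
from pytm import TM, Actor, Boundary, Dataflow, Datastore, Server

tm = TM("AI Healthcare Assistant")
tm.description = "Appointment-scheduling chatbot backed by an LLM"

internet = Boundary("Internet (Untrusted)")
clinic = Boundary("Clinic Internal Network (High Trust)")

patient = Actor("Patient")
patient.inBoundary = internet

web_api = Server("Web API")
web_api.inBoundary = clinic

llm = Server("LLM Service")
llm.inBoundary = clinic

vector_db = Datastore("Vector DB (Knowledge Base)")
vector_db.inBoundary = clinic

patient_db = Datastore("Patient DB (PII/PHI)")
patient_db.inBoundary = clinic

# Flows that cross the trust boundary are the first ones to scrutinize.
Dataflow(patient, web_api, "Prompt")
Dataflow(web_api, llm, "Processed query")
Dataflow(llm, vector_db, "RAG query")
Dataflow(llm, patient_db, "DB query")
Dataflow(llm, web_api, "Generated response")
Dataflow(web_api, patient, "Final answer")

if __name__ == "__main__":
    tm.process()  # run with pytm CLI flags such as --dfd to emit the diagram

Keeping the decomposition in version control means a reviewer can see new components and trust-boundary changes in the same pull request that introduces them.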

Step 2: Identifying Threats with an AI-Centric STRIDE

STRIDE is a classic mnemonic for threat modeling (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege). To make it effective for AI, you must reinterpret its categories through a machine learning lens. Let’s call it STRIDE-ML.

Category               | Traditional Threat           | AI-Specific Manifestation (STRIDE-ML)
-----------------------|------------------------------|---------------------------------------
Spoofing               | Impersonating a user         | Prompt Injection: tricking the model into adopting a malicious persona or ignoring its instructions.
Tampering              | Modifying data in transit    | Data Poisoning: corrupting training or RAG data to manipulate model behavior and outputs.
Repudiation            | Denying an action was taken  | Lack of Explainability: inability to prove why a model made a harmful decision, hindering incident response.
Information Disclosure | Leaking sensitive data       | Model Inversion / PII Extraction: crafting prompts that cause the model to leak sensitive training data or confidential information from its context.
Denial of Service      | Overwhelming a service       | Resource Exhaustion: submitting computationally expensive prompts (e.g., “repeat this word a billion times”) to freeze or crash the model service.
Elevation of Privilege | Gaining unauthorized access  | System Prompt Jailbreaking: bypassing safety filters to access underlying tools, APIs, or data sources the model shouldn’t have access to.
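
To keep Step 2 systematic rather than ad hoc, you can turn the table above into a checklist that pairs every component from the DFD with every STRIDE-ML category, so no combination is silently skipped. A minimal sketch; the component list and the one-line hints simply restate the diagram and table above:

# Enumerate (component, STRIDE-ML category) pairs as a brainstorming checklist.
from itertools import product

STRIDE_ML = {
    "Spoofing": "prompt injection or malicious persona adoption",
    "Tampering": "data poisoning of training or RAG sources",
    "Repudiation": "lack of explainability for a harmful decision",
    "Information Disclosure": "model inversion or PII extraction via prompts",
    "Denial of Service": "resource exhaustion from expensive prompts",
    "Elevation of Privilege": "jailbreaking to reach tools, APIs, or data",
}

COMPONENTS = ["Web API", "LLM Service", "Vector DB", "Patient DB"]

def brainstorm_checklist():
    """Yield one question per (component, category) pair."""
    for component, (category, hint) in product(COMPONENTS, STRIDE_ML.items()):
        yield f"[{category}] {component}: could this component be affected by {hint}?"

if __name__ == "__main__":
    for question in brainstorm_checklist():
        print(question)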

Step 3: Documenting and Mitigating Threats

Once you’ve brainstormed threats, you need to document and prioritize them. A simple risk rating (e.g., High, Medium, Low) based on impact and likelihood is often sufficient. The goal is to focus your defensive efforts where they matter most.
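
If you want the High/Medium/Low rating applied consistently across threats, a small lookup table removes most of the guesswork. The matrix below is illustrative, not a standard; tune it to your own risk appetite.

# Derive an overall rating from impact and likelihood (illustrative values).
RISK_MATRIX = {
    ("High", "High"): "High",
    ("High", "Medium"): "High",
    ("High", "Low"): "Medium",
    ("Medium", "High"): "High",
    ("Medium", "Medium"): "Medium",
    ("Medium", "Low"): "Low",
    ("Low", "High"): "Medium",
    ("Low", "Medium"): "Low",
    ("Low", "Low"): "Low",
}

def risk_rating(impact: str, likelihood: str) -> str:
    return RISK_MATRIX[(impact, likelihood)]

# Matches the PHI-disclosure threat documented below: High impact, Medium likelihood.
assert risk_rating("High", "Medium") == "High"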

Your documentation for each threat should be clear and actionable. It’s not just an academic exercise; it’s a backlog of security work for your engineering team. Here’s how you might document a high-priority threat for our healthcare bot:

# Threat ID: TH-001
# Component: Web API -> LLM Service
# STRIDE-ML Category: Elevation of Privilege

Threat:
  Description: >
    A malicious user can craft a prompt that bypasses the system's
    pre-filters and instructs the LLM to ignore its primary role.
    The prompt could then instruct the model to query the Patient DB
    for information about other users.
  Attack Vector: Maliciously crafted natural language input.

Risk:
  Impact: High (Disclosure of Patient Health Information - PHI)
  Likelihood: Medium (Requires knowledge of prompt injection techniques)
  Rating: High

Mitigation:
  - ID: MIT-001
    Description: Implement a secondary, hardened LLM to act as a
                 prompt firewall, rewriting or blocking malicious inputs.
    Status: In Progress
  - ID: MIT-002
    Description: Strictly parameterize all database queries. The LLM
                 should only be able to request data for the authenticated user.
    Status: Implemented
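
MIT-002 deserves spelling out, because it is the control that still holds if the prompt firewall in MIT-001 is bypassed: the LLM never composes SQL itself; it can only trigger a narrow function whose patient ID comes from the authenticated session. A minimal sketch; the function, table, and column names are hypothetical.

# Sketch of MIT-002: the patient ID is taken from the authenticated session,
# never from model output or from the user's prompt.
import sqlite3

def get_my_appointments(conn: sqlite3.Connection, authenticated_patient_id: str):
    """Return appointments for the logged-in patient only."""
    cursor = conn.execute(
        "SELECT appointment_time, clinician FROM appointments WHERE patient_id = ?",
        (authenticated_patient_id,),  # parameterized: no string concatenation
    )
    return cursor.fetchall()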

The Living Threat Model: A Bridge to Continuous Validation

A threat model is not a PDF you create once and file away. It’s a dynamic, living document. Every time you add a new feature, integrate a new tool, or learn about a new attack technique from the research community, you must revisit your model.
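
One practical way to keep the model alive is to store each threat entry (like TH-001 above) as a YAML file in the repository and fail the build when an entry is malformed, so the threat model changes in the same pull requests as the code. A rough sketch; the threats/ directory layout and the use of PyYAML are assumptions, not part of any standard workflow.

# CI check: every threat entry has the required sections and each mitigation has a status.
# Assumes PyYAML is installed and entries live in threats/*.yaml.
import sys
from pathlib import Path

import yaml

REQUIRED_TOP_LEVEL = {"Threat", "Risk", "Mitigation"}

def validate(path: Path) -> list[str]:
    """Return a list of problems found in one threat entry."""
    doc = yaml.safe_load(path.read_text())
    errors = [f"{path}: missing top-level key '{key}'"
              for key in sorted(REQUIRED_TOP_LEVEL - doc.keys())]
    for mitigation in doc.get("Mitigation", []):
        if "Status" not in mitigation:
            errors.append(f"{path}: mitigation {mitigation.get('ID', '?')} has no Status")
    return errors

if __name__ == "__main__":
    problems = [err for f in sorted(Path("threats").glob("*.yaml")) for err in validate(f)]
    print("\n".join(problems) if problems else "All threat entries are well-formed.")
    sys.exit(1 if problems else 0)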

Does the new feature introduce new data flows or trust boundaries? Does a newly discovered jailbreak technique invalidate one of your mitigations? Your threat model becomes the central repository of your security assumptions. The next step, continuous validation, is about actively testing those assumptions to ensure they hold true in the real world—a perfect handoff from proactive planning to ongoing assurance.