Security by design is the destination; threat modeling is the map you draw to get there. It’s the structured process of identifying what can go wrong before you’ve even written a line of production code. For AI systems, this isn’t just a good practice—it’s essential. The attack surface isn’t just your infrastructure and APIs; it includes your data, your model’s logic, and the very way it interprets human language.
Scenario: The AI-Powered Healthcare Assistant
Imagine your team is developing a chatbot to help patients schedule appointments and get answers to basic medical questions. It’s powered by a large language model (LLM) fine-tuned on medical literature and integrated with the clinic’s scheduling database. What could possibly go wrong?
- A user could trick the bot into revealing another patient’s appointment details.
- A malicious actor could “poison” the training data with false medical information.
- A cleverly crafted prompt could cause the bot to bypass its safety filters and offer dangerous medical advice.
Threat modeling forces you to ask these questions systematically, long before a real patient is harmed.
The Four-Step Process for AI Systems
Traditional threat modeling often follows a simple loop: decompose the system, identify threats, determine mitigations, and validate. For AI, we adapt this loop to account for components that traditional systems lack, such as models, data pipelines, and prompt interfaces.
1. Decompose the System: Map out every component, data flow, and trust boundary.
2. Identify Threats: Use a framework to brainstorm potential attacks against each component.
3. Prioritize & Mitigate: Assess the risk of each threat and design controls to counter it.
4. Validate & Iterate: Confirm that mitigations work and treat the threat model as a living document.
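Each step leaves behind an artifact, and it helps to decide up front where those artifacts will live. One possible skeleton for a version-controlled threat model file, sketched in YAML (the key names are illustrative assumptions, not a required schema):

```yaml
# Illustrative skeleton for a version-controlled threat model; key names are assumptions
system:          # Step 1: components, data flows, trust boundaries
  components: []
  data_flows: []
  trust_boundaries: []
threats: []      # Steps 2-3: one record per threat, with risk rating and mitigations
review_log: []   # Step 4: validation results, review triggers, revisions
```

The documented threat later in this section (TH-001) is the kind of record that would live under the threats key.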
Step 1: Decomposing the AI System
Before you can find threats, you must understand the system’s anatomy. A data flow diagram (DFD) is your primary tool here. You need to identify every process, data store, external entity, and the data flowing between them. Crucially, you must draw trust boundaries—the lines where data passes from a less trusted zone to a more trusted one.
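A DFD is usually drawn as a picture, but the same decomposition is worth capturing in text so it can be reviewed and diffed alongside the rest of the threat model. Here is a minimal sketch for the healthcare assistant; the component names and boundaries are illustrative assumptions:

```yaml
# Illustrative decomposition of the healthcare assistant; names are assumptions
external_entities:
  - Patient (untrusted, public internet)
processes:
  - Web API       # authentication, input validation, rate limiting
  - LLM Service   # fine-tuned model plus prompt assembly
data_stores:
  - Patient DB        # appointments and PHI
  - Medical Corpus    # fine-tuning and retrieval documents
data_flows:
  - Patient -> Web API: natural-language requests
  - Web API -> LLM Service: sanitized prompts
  - LLM Service -> Patient DB: scheduling queries
trust_boundaries:
  - Public internet / clinic network (crossed at the Web API)
  - Application tier / patient data tier (crossed at the Patient DB)
```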
In this decomposition, the trust boundary is critical. The Web API is the gatekeeper: any data crossing from the untrusted patient into the clinic network must be sanitized and validated. The LLM Service, which has access to sensitive data, becomes a high-value target.
Step 2: Identifying Threats with an AI-Centric STRIDE
STRIDE is a classic mnemonic for threat modeling (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege). To make it effective for AI, you must reinterpret its categories through a machine learning lens. Let’s call it STRIDE-ML.
| Category | Traditional Threat | AI-Specific Manifestation (STRIDE-ML) |
|---|---|---|
| Spoofing | Impersonating a user | Prompt Injection: Tricking the model into adopting a malicious persona or ignoring its instructions. |
| Tampering | Modifying data in transit | Data Poisoning: Corrupting training or RAG data to manipulate model behavior and outputs. |
| Repudiation | Denying an action was taken | Lack of Explainability: Inability to prove why a model made a harmful decision, hindering incident response. |
| Information Disclosure | Leaking sensitive data | Model Inversion / PII Extraction: Crafting prompts that cause the model to leak sensitive training data or confidential information from its context. |
| Denial of Service | Overwhelming a service | Resource Exhaustion: Submitting computationally expensive prompts (e.g., “repeat this word a billion times”) to freeze or crash the model service. |
| Elevation of Privilege | Gaining unauthorized access | System Prompt Jailbreaking: Bypassing safety filters to access underlying tools, APIs, or data sources the model shouldn’t have access to. |
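In practice, the table becomes a per-component checklist: for each element of your decomposition, walk the six categories and record whatever seems plausible. Here is a sketch of what that pass might produce for the LLM Service; the entries are illustrative examples, not an exhaustive list:

```yaml
# Illustrative STRIDE-ML pass over one component; entries are examples only
component: LLM Service
threats:
  - category: Spoofing
    example: Prompt injection that has the model adopt an "administrator" persona
  - category: Tampering
    example: Poisoned documents slipped into the medical RAG corpus
  - category: Repudiation
    example: No audit trail linking a harmful answer to the prompt and model version
  - category: Information Disclosure
    example: Prompts that extract another patient's appointment details
  - category: Denial of Service
    example: Prompts engineered to maximize token generation and latency
  - category: Elevation of Privilege
    example: Jailbreak that reaches the scheduling database with unscoped queries
```

Not every category will yield a credible threat for every component; an empty category is a finding too, as long as someone consciously considered it.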
Step 3: Documenting and Mitigating Threats
Once you’ve brainstormed threats, you need to document and prioritize them. A simple risk rating (e.g., High, Medium, Low) based on impact and likelihood is often sufficient. The goal is to focus your defensive efforts where they matter most.
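To keep ratings consistent across reviewers, many teams agree on a simple impact-by-likelihood lookup up front. The thresholds below are an assumption for illustration, not a standard:

```yaml
# Illustrative rating lookup; rows are impact, columns are likelihood.
# The thresholds are an assumption; tune them to your context.
rating:
  High:   { High: High,   Medium: High,   Low: Medium }
  Medium: { High: High,   Medium: Medium, Low: Low }
  Low:    { High: Medium, Medium: Low,    Low: Low }
```

Under that lookup, a threat with High impact and Medium likelihood, like TH-001 below, rates High and goes to the top of the backlog.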
Your documentation for each threat should be clear and actionable. It’s not just an academic exercise; it’s a backlog of security work for your engineering team. Here’s how you might document a high-priority threat for our healthcare bot:
```yaml
# Threat ID: TH-001
# Component: Web API -> LLM Service
# STRIDE-ML Category: Elevation of Privilege
Threat:
  Description: >
    A malicious user can craft a prompt that bypasses the system's
    pre-filters and instructs the LLM to ignore its primary role.
    The prompt could then instruct the model to query the Patient DB
    for information about other users.
  Attack Vector: Maliciously crafted natural language input.
Risk:
  Impact: High (Disclosure of Patient Health Information - PHI)
  Likelihood: Medium (Requires knowledge of prompt injection techniques)
  Rating: High
Mitigation:
  - ID: MIT-001
    Description: >
      Implement a secondary, hardened LLM to act as a prompt firewall,
      re-writing or blocking malicious inputs.
    Status: In Progress
  - ID: MIT-002
    Description: >
      Strictly parameterize all database queries. The LLM should only
      be able to request data for the authenticated user.
    Status: Implemented
```
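The same record can also carry its own validation state, which is how Step 4 stays attached to the work rather than living in someone's head. The fields below are one possible convention, not something any tool requires:

```yaml
# Illustrative validation fields for TH-001; field names and checks are assumptions
Validation:
  - Check: Prompt-injection regression suite covering known jailbreak patterns
    Result: Pending (blocked on MIT-001)
  - Check: Integration test proving queries are scoped to the authenticated user
    Result: Passing since MIT-002 was implemented
Review:
  Last Reviewed: YYYY-MM-DD   # filled in at each review
  Re-review When: New tools, data sources, or jailbreak techniques are introduced
```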
The Living Threat Model: A Bridge to Continuous Validation
A threat model is not a PDF you create once and file away. It’s a dynamic, living document. Every time you add a new feature, integrate a new tool, or learn about a new attack technique from the research community, you must revisit your model.
Does the new feature introduce new data flows or trust boundaries? Does a newly discovered jailbreak technique invalidate one of your mitigations? Your threat model becomes the central repository of your security assumptions. The next step, continuous validation, is about actively testing those assumptions to ensure they hold true in the real world—a perfect handoff from proactive planning to ongoing assurance.