AI Threat Modeling in Practice: A STRIDE and MITRE ATLAS Workshop Guide

2025-10-17
AI Security Blog

So you’ve built a shiny new AI model. It’s clever, it’s fast, and it’s probably going to make someone a lot of money. You’ve followed all the MLOps best practices, your CI/CD pipeline is a thing of beauty, and your infrastructure is locked down tighter than Fort Knox. You’re done, right? Time to ship it.

Hold on. What if I told you that your state-of-the-art AI is like a brilliant, eccentric artist living in that fortress? You’ve secured the walls, but the artist can be tricked, manipulated, and driven mad by whispers through the mail slot. What if a competitor could subtly poison your training data, making your model go haywire on a specific day? What if a malicious user could craft a single, bizarre-looking image that your model, which is 99.9% accurate, confidently identifies as a school bus when it’s actually a stop sign?

Welcome to the weird, wonderful, and frankly terrifying world of AI security. The attack surface isn’t just your code and your cloud config anymore. It’s the data. It’s the model’s logic. It’s the very assumptions you built your system on.

If that sounds overwhelming, good. It should. But we’re not here to panic. We’re here to get organized. We’re here to do something engineers are great at: systematically take things apart to see how they can break. We’re going to run a threat modeling workshop.

This isn’t your grandpa’s security checklist. This is a creative, collaborative, and slightly paranoid Dungeons & Dragons campaign for your AI system. And our game master’s screen has two powerful tools on it: a classic framework called STRIDE and the AI-specific monster manual, MITRE ATLAS.

Why Your Old Security Playbook Is a Recipe for Disaster

For years, application security has been about securing the castle. We build firewalls (the moat), use authentication (the gatekeepers), and encrypt data (the secret passages). We find vulnerabilities in the code—the cracks in the walls. It’s a well-understood domain. If you find a SQL injection vulnerability, you know how to fix it. The rules are clear.

AI systems are different. They aren’t just castles; they’re living, learning organisms inside the castle. You can have perfectly secure, bug-free code, and your AI can still be spectacularly compromised. Why? Because the attack surface has fundamentally changed.

Think of it like this: a traditional app is like a car engine. You can inspect every part, understand its function, and if a part is broken, you replace it. An AI model is more like the car’s driver. You can teach the driver, give them rules, and test their skills. But you can’t predict every single strange situation they’ll encounter on the road. And a clever adversary won’t try to break the engine; they’ll try to fool the driver with optical illusions and bad directions.

This new attack surface includes things like:

  • Data Poisoning: Tainting the training data to create a hidden backdoor or bias in the model.
  • Evasion Attacks: Crafting inputs during inference that are specifically designed to fool the model.
  • Model Inversion: Querying a model in such a way that you can reconstruct the private training data it learned from.
  • Model Stealing: Interacting with a public API to create a functionally identical copy of a proprietary model.

Your static code scanner isn’t going to find these. Your firewall won’t stop them. You need a new way of thinking. You need to model the threats.
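
To make "evasion attack" concrete, here is a minimal sketch of a gradient-sign attack against a toy logistic-regression classifier. It assumes white-box access to keep the code short; real attackers often have to estimate these gradients through repeated API queries. All weights and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "victim": a logistic-regression classifier with known weights.
w = rng.normal(size=20)

def predict(x):
    """Class-1 probability for input vector x."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))

# A benign input the model confidently places in class 1.
x = w / np.linalg.norm(w)
print(predict(x))                            # close to 1.0

# Gradient-sign evasion: nudge every feature a fixed amount epsilon
# against the gradient of the class-1 score (an L-infinity budget).
epsilon = 1.0
grad = w * predict(x) * (1.0 - predict(x))   # d p / d x for a sigmoid
x_adv = x - epsilon * np.sign(grad)

print(predict(x_adv))                        # below 0.5: class flipped
```

Scaled up to deep networks, this same recipe is the fast gradient sign method (FGSM); the point of the sketch is that the perturbation is computed, not guessed.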

[Figure: The shifting attack surface. The traditional app (“the castle”): secure code, network firewall, access control, encrypted data-at-rest, SQL injection. The AI system (“the organism”): data poisoning, evasion attacks, model inversion, model stealing, in addition to all traditional threats.]

Our Tools: STRIDE and MITRE ATLAS

To navigate this new terrain, we need a map and a compass. STRIDE is our compass; it gives us direction. MITRE ATLAS is our map; it shows us the monsters lurking in the shadows.

STRIDE: The Classic Threat Categories, Remixed for AI

STRIDE is a threat modeling methodology developed at Microsoft in the late 1990s. It’s an acronym for six categories of threats. It’s brilliant because it’s simple and comprehensive: it forces you to ask the right kind of questions about every part of your system.

But the standard definitions are all about traditional software. Let’s give them an AI-flavored remix.

  • Spoofing: Pretending to be something or someone you’re not.
    • Traditional: Faking an IP address or using stolen user credentials.
    • AI Remix: Creating a fake “adversarial patch”—a sticker you can put on a stop sign that makes a self-driving car’s vision system see a 65 mph speed limit sign. The sticker is spoofing a legitimate road sign to the model.
  • Tampering: Modifying data or code illicitly.
    • Traditional: Changing a value in a database, like your bank account balance.
    • AI Remix: Data poisoning. You subtly modify a few thousand images in a massive training dataset. The model still trains perfectly, but you’ve implanted a backdoor. For example, if the model sees a picture of your company’s CEO, it classifies it as “Untrustworthy.” The model’s logic has been tampered with.
  • Repudiation: Denying that you did something.
    • Traditional: A user deleting logs to hide their tracks after a breach.
    • AI Remix: An AI-powered trading bot executes a disastrous series of trades. Without proper MLOps logging (which model version was used? what were its exact inputs?), it can be impossible to prove whether the model was exploited, had a bug, or was just behaving “as designed” on weird market data. Without that audit trail, anyone can deny responsibility for anything.
  • Information Disclosure: Exposing information to unauthorized individuals.
    • Traditional: A leaky S3 bucket exposing customer records. Classic.
    • AI Remix: Model inversion. An attacker queries your language model with carefully crafted prompts like “The social security number for John Smith is…” and observes the autocomplete suggestions. If John Smith’s PII was in the training data, the model might just hand it over. It’s a data breach through a conversation.
  • Denial of Service (DoS): Making a system or service unavailable.
    • Traditional: Flooding a web server with traffic (a DDoS attack).
    • AI Remix: Crafting a computationally “hard” input. Some models have inputs that, while valid, require disproportionately more processing power to analyze. An attacker could send a handful of these requests to your inference API, maxing out your expensive GPU cluster and making the service unavailable for everyone else. It’s a DoS attack with a scalpel, not a sledgehammer.
  • Elevation of Privilege: Gaining capabilities without proper authorization.
    • Traditional: A regular user finding a way to get admin access.
    • AI Remix: Bypassing a content filter. A language model is supposed to refuse to answer harmful questions. An attacker uses a “jailbreak” prompt (a complex, layered set of instructions) that tricks the model into ignoring its safety rules, effectively elevating their privilege from “regular user” to “user who can make the AI generate dangerous content.”

See the pattern? The categories are the same, but the expression of the threat is totally different.
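
The Tampering remix, data poisoning with a hidden backdoor, can be demonstrated end to end on a toy dataset. The 1-nearest-neighbour “model”, the trigger value, and all sizes here are illustrative; real backdoors hide in far subtler features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean 2-class training data in 10-D: class 0 near -1, class 1 near +1.
d = 10
X = np.vstack([rng.normal(-1.0, 0.3, size=(200, d)),
               rng.normal(+1.0, 0.3, size=(200, d))])
y = np.array([0] * 200 + [1] * 200)

# Poisoning: the attacker slips in a few class-1-looking points whose
# first feature is pinned to an out-of-range "trigger" value, with the
# label flipped to class 0. This plants the backdoor.
trigger = 8.0
poison = rng.normal(+1.0, 0.3, size=(20, d))
poison[:, 0] = trigger
X = np.vstack([X, poison])
y = np.concatenate([y, np.zeros(20, dtype=int)])

def classify(x):
    """1-nearest-neighbour 'model' trained on the (poisoned) data."""
    return int(y[np.argmin(np.linalg.norm(X - x, axis=1))])

clean = rng.normal(+1.0, 0.3, size=d)
backdoored = clean.copy()
backdoored[0] = trigger            # attacker applies the trigger at test time

print(classify(clean), classify(backdoored))   # 1 0: the backdoor fires
```

Clean inputs still classify correctly, which is exactly why poisoning is so hard to catch with ordinary accuracy metrics.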

Golden Nugget: STRIDE gives you the “what.” It tells you what kind of bad thing could happen. But it doesn’t tell you how it could happen in the world of AI. For that, we need our monster manual.

MITRE ATLAS: The AI Attacker’s Playbook

If STRIDE is the “what,” MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is the “how.” It’s a knowledge base of adversarial tactics and techniques based on real-world observations and academic research. It’s literally a catalog of all the nasty things people can do to AI systems.

Don’t try to memorize it all. Just understand its structure. It’s organized like its famous cousin, MITRE ATT&CK, for traditional cybersecurity:

  • Tactics: These are the attacker’s high-level goals. Things like “Evasion,” “Poisoning,” “Model Stealing.”
  • Techniques: These are the specific methods used to achieve a tactic. Under the “Evasion” tactic, you’ll find techniques like “Adversarial Patch” or “Gradient-based Attacks.”

ATLAS is your well of creativity during a threat modeling session. When you’re looking at your model’s inference API and asking, “How could someone Tamper with this?” (STRIDE), you don’t have to guess. You can flip open the ATLAS playbook to the “Evasion” tactic and see a list of proven techniques. It turns “I don’t know” into “Which of these 10 nasty tricks should we worry about first?”
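
In a workshop, that pairing can literally be a lookup table the facilitator walks through. The mapping below is a facilitation aid I'm sketching here, not an official STRIDE-to-ATLAS crosswalk; the tactic names follow ATLAS's ATT&CK-style structure, but you should adjust the pairings to your own system.

```python
# Illustrative pairing of each STRIDE category with ATLAS tactics worth
# opening first during brainstorming. Not an official crosswalk.
STRIDE_TO_ATLAS = {
    "Spoofing": ["ML Attack Staging"],
    "Tampering": ["Persistence", "ML Attack Staging"],
    "Repudiation": ["Defense Evasion"],
    "Information Disclosure": ["Exfiltration", "ML Model Access"],
    "Denial of Service": ["Impact"],
    "Elevation of Privilege": ["Initial Access", "Impact"],
}

def prompts(component):
    """Yield one brainstorming question per STRIDE category."""
    for category, tactics in STRIDE_TO_ATLAS.items():
        yield (f"{category} on {component}: "
               f"walk the ATLAS tactics {', '.join(tactics)}")

for question in prompts("Inference API"):
    print(question)
```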

[Figure: How STRIDE and ATLAS work together. STRIDE supplies the “what” (the category of harm). Example: Tampering, “an attacker wants to modify our system’s behavior by feeding it malicious data.” MITRE ATLAS supplies the “how” (the specific adversarial technique). Example: Adversarial Patch, “the attacker prints a specific sticker (the ‘patch’) and places it on a stop sign to cause a misclassification.”]

Running The Workshop: A Step-by-Step Guide

Alright, enough theory. Let’s get our hands dirty. Here’s how you actually run one of these sessions. This isn’t a one-person job. You need to get the right people in a room (virtual or physical) for a couple of hours. No distractions.

Step 0: The Prep Work

  1. Assemble the Fellowship: You need a diverse crew.
    • The Data Scientist/ML Engineer: They know the model, the data, and the assumptions it makes.
    • The Software Developer: They built the API, the data pipelines, and the surrounding application.
    • The DevOps/MLOps Engineer: They manage the infrastructure, the deployment process, and the monitoring.
    • The Product Owner: They know what the system is supposed to do and what would be a business disaster.
    • You (The Facilitator): Your job is to guide the conversation, ask dumb questions, and keep everyone focused.
  2. Get the Artifacts: You need a diagram of your system. Not a marketing slide, but a real, honest-to-god architecture diagram. It should show the entire lifecycle of your AI system. If you don’t have one, drawing it is your first step.

Your diagram should look something like this, showing the flow of data and the key components.

[Figure: Example AI system architecture. Data sources (e.g., S3, DBs) → preprocessing → training loop → model registry → deployment (CI/CD) → inference API → user/client app, with monitoring (logs, metrics) throughout. The training environment is high-trust; the serving environment is exposed to users.]

Step 1: Deconstruct Your System

Put the diagram on the whiteboard. Walk through it, piece by piece. Don’t talk about threats yet. Just get everyone on the same page about what each component does and how data flows between them.

Now, draw the trust boundaries. A trust boundary is a line where data or execution control passes from a less trusted entity to a more trusted one (or vice-versa). The most obvious one is between the public internet and your backend. But there are others. Is the environment where you train your model separate from your production inference servers? That’s a boundary. Do you pull in data from third-party vendors? That’s a huge boundary.

Why is this so important? Because interesting attacks almost always happen when data crosses a trust boundary.

Step 2: Apply STRIDE to Every Component and Flow

This is where the systematic brainstorming begins. Pick a component, any component. Let’s start with the Inference API. Now, go through STRIDE, letter by letter, and ask the hard questions.

  • Spoofing: Can an unauthenticated user send requests? Can a user pretend to be another user? Can an attacker spoof the input data format to cause an error?
  • Tampering: This is the big one for inference. How can a user tamper with the input to get a desired, malicious output? (We’ll use ATLAS for this in a moment). Can they tamper with the model loaded by the API? (e.g., if it pulls from a model registry, can they compromise the registry?).
  • Repudiation: If the model gives a bizarre or harmful output, can we trace it back to the exact input, model version, and user? Are our logs good enough to prove what happened?
  • Information Disclosure: Can an attacker craft inputs that reveal something about the training data (model inversion)? Can error messages leak internal system details? Can they figure out the model’s architecture by sending specific queries?
  • Denial of Service: Can a user send a “hard” input that hogs all the GPU resources? What are the rate limits? Can a malformed input crash the API server?
  • Elevation of Privilege: Can a user bypass a safety filter or moderation layer by crafting a “jailbreak” prompt? Can they get the model to execute code or access internal APIs it shouldn’t?
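
The Repudiation questions in particular usually translate into one engineering task: an audit record that ties every prediction to its input, model version, and caller. A minimal sketch, with hypothetical field names and a hash instead of the raw input in case it contains PII:

```python
import hashlib
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("inference")

def log_prediction(user_id, model_version, raw_input, output):
    """Emit one structured audit record per prediction."""
    record = {
        "event": "prediction",
        "user_id": user_id,
        "model_version": model_version,
        # Hash rather than store the raw input: still lets you match a
        # disputed prediction to its exact input without retaining PII.
        "input_sha256": hashlib.sha256(raw_input).hexdigest(),
        "output": output,
    }
    log.info(json.dumps(record, sort_keys=True))
    return record

rec = log_prediction("u-42", "fraud-model:1.3.0", b"txn-001", "approve")
```

With records like this, “can we prove what happened?” becomes a grep, not an argument.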

Do this for every single box and every single arrow on your diagram. Yes, it’s tedious. It’s also where you’ll find the gold. The conversation that happens here is the most valuable part of the exercise.

Step 3: Supercharge Your Brainstorming with ATLAS

You’ll notice that during Step 2, you’ll hit a wall on some points. For “Tampering the input,” the team might say, “Uh, maybe they send a weird picture?”

This is when you pull out the monster manual.

Say, “Okay, let’s look at the Evasion tactic in ATLAS.” You bring it up on the screen. Suddenly, you have a vocabulary for the attack. You see techniques like:

  • Adversarial Patch: “Could someone do this to our system? If our model identifies safety equipment on a factory floor, could an attacker create a sticker to put on a piece of faulty equipment that makes it look safe to our model?”
  • Adversarial Perturbation: “This is adding near-invisible noise to an input. Could an attacker do this to bypass our malware detection model? They flip a few bytes in the malware binary; it’s still malicious, but now our model says it’s benign.”
  • Poisoning against ML-based defenses: “If our spam filter learns from user ‘report spam’ clicks, could an attacker create thousands of fake accounts to report a competitor’s legitimate emails as spam, effectively training our model to block them?”

ATLAS turns vague fears into concrete, testable scenarios. For each threat you identify, document it.

Golden Nugget: The goal is not to find a solution for every threat on the spot. The goal is to identify and document them. Create a “threat list” as you go. A simple table will do.

| Threat ID | Component | STRIDE | ATLAS Tactic / Technique | Scenario |
|-----------|-----------|--------|--------------------------|----------|
| T-001 | Inference API | Tampering | Evasion / Adversarial Patch | An attacker puts a specific sticker on a competitor’s product, causing our quality control vision model to misclassify it as “defective.” |
| T-002 | Data Ingestion | Tampering | Data Poisoning | A compromised 3rd-party data source slowly injects mislabeled examples, creating a bias against a specific demographic in our loan approval model. |
| T-003 | LLM Chat API | Information Disclosure | Model Inversion | A user crafts prompts to make the model “remember” and reveal PII of other customers that was accidentally included in the training data. |

Step 4: Prioritize and Plan

You will end up with a long list of scary-sounding threats. You cannot fix them all. If you try, you’ll fail. You need to prioritize.

For each threat, give it a rough rating. You can use a formal system like DREAD (Damage, Reproducibility, Exploitability, Affected Users, Discoverability) or just a simple High/Medium/Low based on two questions:

  1. Likelihood: How likely is this to happen? Does it require a state-sponsored actor with a PhD in machine learning, or can a script kiddie do it?
  2. Impact: If this happens, how bad is it? Is it a minor annoyance, or does it bankrupt the company and put you on the front page of the New York Times?

Threats that are High-Likelihood and High-Impact are your dragons. Slay them first. Threats that are Low-Likelihood and Low-Impact are the goblins. You can probably live with them for a while.

The output of this step should be concrete. Don’t just say “Fix T-001.” Create actual tickets in your backlog: “Research and implement adversarial robustness training for the QC vision model,” or “Add structured logging to the inference API to trace every prediction back to its input,” or “Set up a differential privacy analysis pipeline for our training data.”
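
The likelihood × impact triage can be as simple as a sort. The sketch below reuses the threats from the Step 3 table, with 1-3 scores invented for the example:

```python
# Illustrative threat register: scores are made up for the sketch.
threats = [
    {"id": "T-001", "summary": "Adversarial patch vs QC vision model",
     "likelihood": 2, "impact": 3},
    {"id": "T-002", "summary": "Slow poisoning via 3rd-party data feed",
     "likelihood": 1, "impact": 3},
    {"id": "T-003", "summary": "PII leak via model-inversion prompts",
     "likelihood": 3, "impact": 3},
]

def risk(threat):
    """Simple likelihood-times-impact score."""
    return threat["likelihood"] * threat["impact"]

# Dragons first: highest risk score at the top of the backlog.
for t in sorted(threats, key=risk, reverse=True):
    print(f'{t["id"]}: risk={risk(t)} -> {t["summary"]}')
```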

Now your threat model isn’t a dusty document. It’s a living part of your development process.

Real-World War Stories

This all sounds theoretical until you see it in the wild. Here are a few (anonymized) examples of things that have gone wrong.

The Helpful Chatbot That Became a Racist PR Nightmare

A company launched a Twitter chatbot that was supposed to learn from conversations. The STRIDE threat was Tampering with the model’s knowledge. The ATLAS technique was a form of Data Poisoning, executed in real-time. Trolls realized they could “train” the bot by bombarding it with hateful and offensive language. Within 24 hours, the friendly bot was spouting vile nonsense. The threat model would have asked: “What happens if the ‘data source’ (public tweets) is malicious? What are our guardrails?” They had none.

The Magic Fraud Detection Model That Leaked Customer Data

A fintech startup had a brilliant model for predicting fraudulent transactions. It was so good, they exposed it via a detailed API. An attacker, posing as a potential customer, started querying the API relentlessly. They weren’t trying to commit fraud. They were probing the model’s decision boundaries. The STRIDE threat was Information Disclosure.
The ATLAS technique was a blend of Model Extraction (figuring out how the model worked) and Model Inversion. By sending thousands of slightly different queries, they were able to reconstruct sensitive features the model had learned, like typical spending patterns for specific zip codes, which could be used to de-anonymize users. The threat model would have asked: “What can an attacker learn by just using our API as intended, but at a massive scale?”
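
A stripped-down version of that extraction attack is easy to simulate: hide a linear “victim” model behind a label-only API, query it at scale, and fit a surrogate on the answers. Everything here is illustrative; real extraction targets far richer models and uses cleverer query strategies.

```python
import numpy as np

rng = np.random.default_rng(2)

# The "victim": a proprietary linear model behind an API. The attacker
# never sees w_secret, only the API's yes/no answers.
d = 5
w_secret = rng.normal(size=d)

def api(x):
    """The public endpoint: returns only the predicted class."""
    return (x @ w_secret > 0).astype(int)

# Extraction: query the API at scale, then fit a surrogate by
# least-squares regression onto the +/-1 labels.
queries = rng.normal(size=(2000, d))
answers = api(queries)
w_copy, *_ = np.linalg.lstsq(queries, 2.0 * answers - 1.0, rcond=None)

# The surrogate now closely mimics the victim's decision boundary.
agreement = np.mean(api(queries) == (queries @ w_copy > 0).astype(int))
print(f"surrogate agrees with the victim on {agreement:.0%} of queries")
```

Nothing here abuses the API: every query is a legitimate request. That is what makes “use the API as intended, at massive scale” such an awkward threat to defend against.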

The Simple Evasion That Bypassed a Million-Dollar Security System

An organization used a sophisticated AI model to detect malicious PDFs. It was trained on millions of samples. A red team was hired to test it. They didn’t use a complex exploit. They took a known malicious PDF and simply appended a huge, legitimate image file (like a high-res photo of a cat) to the end of it. The PDF still worked, but the overall file structure was now dominated by the benign image data. The AI, looking at the file as a whole, classified it as “benign image.” The STRIDE threat was Elevation of Privilege (the malicious file got past the filter).
The ATLAS technique was a simple form of Evasion. The threat model would have asked: “How does our model handle inputs that mix malicious and benign content? Are we looking at the file only as a whole, instead of also scanning its parts?”

It’s a Mindset, Not a Checklist

If you’ve made it this far, you get it. AI threat modeling isn’t a one-time, check-the-box activity. It’s a cultural shift. It’s about fostering a healthy sense of paranoia. It’s about empowering your team to ask “what if?” early and often.

Your AI is not a magic black box. It’s a system. And like any system, it can be broken. The difference is that the ways it can break are new, subtle, and often counter-intuitive.

So grab that whiteboard. Schedule that meeting. Put your beautiful AI system on the operating table and start asking the uncomfortable questions. The monsters are already out there; it’s time you learned their names.