24.5.1 Incident Response Playbook

2025.10.06.
AI Security Blog

Purpose: This document provides a structured, actionable playbook for responding to security incidents involving AI and machine learning systems. It is designed to be a template; you must adapt it to your organization’s specific models, infrastructure, and risk tolerance. An effective response minimizes damage, restores service, and prevents recurrence.

The AI Incident Response Lifecycle

When an AI system is compromised, your response should follow a structured lifecycle. This ensures all critical steps are taken, from initial detection to post-incident analysis. While the phases are sequential, you may need to cycle back to earlier stages as new information emerges.

Kapcsolati űrlap - EN

Do you have a question about AI Security? Reach out to us here:

Preparation Identification Containment Eradication Recovery Lessons Learned

Phase 1: Preparation

Success during an incident is determined long before it occurs. Your preparation phase should focus on establishing the people, processes, and technology needed to respond effectively.

  • Define Roles & Responsibilities: Establish a clear command structure. Who is the Incident Commander? Who handles technical analysis versus communication? (See table below).
  • Establish Communication Channels: Set up a dedicated, out-of-band communication channel (e.g., a specific Slack channel, Microsoft Teams group) for the incident response team.
  • Maintain Model & Data Inventory: Keep an up-to-date inventory of all production models, their versions, training datasets, and critical dependencies.
  • Enable Comprehensive Logging: Ensure you are logging model inputs (prompts), outputs (inferences), confidence scores, and any intermediate data transformations. These are your primary evidence sources.
  • Develop Baseline Performance Metrics: Know what “normal” looks like for your model’s accuracy, latency, and output distribution. Deviations are key indicators of an incident.

Phase 2: Identification

This phase begins when an anomalous event is detected. The goal is to quickly determine if a security incident has occurred, its scope, and its potential impact.

Common AI Incident Indicators

Incident Type Key Indicators Primary Data Source
Prompt Injection / Jailbreaking Sudden appearance of unexpected, harmful, or policy-violating outputs. Detection of specific keywords or patterns designed to bypass safeguards. Inference logs (inputs and outputs).
Model Evasion (Adversarial Examples) Significant, unexplained drop in model accuracy or performance on specific subclasses of data. High confidence scores for incorrect predictions. Model performance monitoring dashboards, inference logs.
Data Poisoning (Training Time) Model begins exhibiting biased or targeted malicious behavior. Emergence of backdoors (specific triggers causing misclassification). Gradual performance degradation. Retrospective data quality checks, model behavior testing, training pipeline logs.
Model Theft / Extraction Anomalous patterns of API queries from a single source (e.g., high volume, systematic input variations). Unusually high computational resource usage. API gateway logs, cloud infrastructure monitoring.

Phase 3: Containment

Once an incident is confirmed, your immediate priority is to prevent further damage. Containment strategies for AI systems are unique and focus on isolating the compromised model or data pipeline.

  1. Isolate the Affected Model Endpoint: Use an API gateway or load balancer to route traffic away from the compromised model instance. If necessary, take the entire service offline.
  2. Revert to a Known-Good State: Immediately deploy a previous, validated model version from your model registry. This is a critical first step to restore service integrity, even if it’s temporary.
  3. Implement Emergency Input Filtering: If the attack involves specific prompts or inputs, deploy a temporary, strict filter at the application layer or API gateway to block the malicious patterns.
  4. Block Attacker IP/Accounts: Identify and block the source of the malicious activity at the network or application level.
  5. Preserve Evidence: Take snapshots of affected systems, databases, and logs. Do not delete or modify compromised data until it has been securely backed up for forensic analysis.

Phase 4: Eradication

With the immediate threat contained, the focus shifts to finding and eliminating the root cause of the incident.

  • Identify the Vulnerability: Was it a weakness in the prompt sanitization logic? A lack of validation in the data ingestion pipeline? A compromised MLOps credential?
  • Cleanse Corrupted Data: If data poisoning occurred, you must identify and remove the malicious samples from your training and validation sets. This may require significant manual review or automated anomaly detection.
  • Retrain or Fine-Tune the Model: Using the cleansed dataset, retrain the model. Consider adversarial training by including examples of the attack technique to build resilience.
  • Patch System Vulnerabilities: Address any underlying software or infrastructure vulnerabilities that enabled the attack.
# Pseudocode for a simple data cleansing script
def scan_for_poisoned_data(dataset_path, poison_signature):
    clean_records = []
    quarantined_records = []

    for record in load_dataset(dataset_path):
        # poison_signature could be a keyword, pattern, or a detector model
        if detect_signature(record['text'], poison_signature):
            log_and_quarantine(record)
            quarantined_records.append(record)
        else:
            clean_records.append(record)
    
    save_dataset(clean_records, "clean_dataset.json")
    save_dataset(quarantined_records, "quarantined_data.json")
    print(f"Scanned dataset. Found {len(quarantined_records)} suspicious records.")

Phase 5: Recovery

This phase involves carefully restoring systems to normal operation and validating that the threat has been eliminated.

  • Deploy the Patched Model: Promote the newly trained and validated model to a staging environment for final testing.
  • Phased Rollout: Deploy the model to production gradually (e.g., to 1% of traffic, then 10%, etc.). Closely monitor performance metrics and logs for any signs of recurrence.
  • Full Restoration: Once you are confident the model is stable and secure, restore full traffic.
  • Continuous Monitoring: Implement enhanced monitoring specifically for the attack vector that was used. For example, if it was a prompt injection, create alerts for the specific patterns observed.

Phase 6: Lessons Learned

Every incident is an opportunity to improve. Conduct a blameless post-mortem within one week of the incident’s resolution.

  • Document the Timeline: Create a detailed timeline of events from detection to resolution.
  • Analyze the Root Cause: What were the technical and procedural failures that allowed the incident to occur?
  • Evaluate the Response: What went well? What could have been done better? Was the playbook followed?
  • Update Defenses: Implement permanent improvements to your security controls, monitoring, and this playbook itself.
  • Update Red Team Scenarios: Incorporate the attack technique into your continuous red teaming exercises to ensure the fix is robust and to test for similar vulnerabilities.

AI Incident Response Team Roles

Role Primary Responsibilities
Incident Commander (IC) Overall leader of the response effort. Manages resources, makes critical decisions, and serves as the primary point of contact. Not necessarily the most technical person.
ML Engineer / MLOps Specialist Leads technical containment and eradication. Manages model rollbacks, deployments, MLOps pipeline security, and infrastructure changes.
Data Scientist / ML Researcher Analyzes model behavior and outputs to identify the attack. Leads data cleansing and model retraining efforts. Assesses the impact on model performance.
Security Analyst Conducts forensic analysis of logs (API, network, system). Identifies attacker indicators of compromise (IoCs). Manages blocking of malicious actors.
Communications Lead Manages all internal and external communications according to the communication plan. Ensures stakeholders are informed without causing panic.
Legal / Compliance Officer Assesses legal, regulatory, and ethical impact. Determines if data breach notification is required. Advises on evidence handling.