24.1.4 Escalation Process

2025.10.06.
AI Security Blog

An AI Red Team engagement operates on the principle of controlled, authorized testing. However, the discovery of a critical vulnerability, an unexpected system failure, or a significant operational disruption requires a clear, pre-defined process for immediate action. This escalation process ensures that critical findings are communicated to the right people at the right time, minimizing potential harm and enabling a swift response.

The Purpose of an Escalation Framework

A formal escalation process goes beyond the standard reporting channels defined in the communication protocol. It is invoked when a situation demands attention that exceeds the routine authority or capacity of the primary points of contact. Its primary goals are to:


  • Ensure Timely Mitigation: Prevent or limit damage from a severe vulnerability or system instability.
  • Maintain Operational Stability: Address issues that threaten the availability or integrity of the target system or connected production environments.
  • Provide Situational Awareness: Inform key stakeholders (e.g., system owners, security leadership, legal counsel) about high-risk discoveries.
  • Establish Clear Accountability: Define who is responsible for declaring an escalation, who must respond, and within what timeframe.

Escalation Levels and Triggers

This framework defines four distinct levels of escalation. Each level corresponds to a specific severity of finding and dictates the required communication and response actions. Your team should adapt these definitions based on the specific context of the engagement.

L1: Advisory
Severity: Low-to-Medium
Trigger examples:
  • Discovery of a non-critical vulnerability with a complex exploit path.
  • Identification of a potential, but unconfirmed, data privacy issue.
  • A finding that requires clarification from the Blue Team to assess impact.
Primary action: Notify primary Blue Team contact via standard reporting channels with a “High Priority” flag. Follow up during next scheduled sync.

L2: Operational Impact
Severity: High
Trigger examples:
  • An action that causes significant, but reversible, performance degradation of the target system.
  • Disruption of a non-production, but critical, support service (e.g., logging).
  • Locking out a test account or consuming all available test resources.
Primary action: Immediate notification to the primary Blue Team contact and Engineering Lead via a dedicated, monitored channel (e.g., a shared Slack/Teams channel). Acknowledgment required within 1 hour.

L3: Critical Vulnerability
Severity: Critical
Trigger examples:
  • Unauthenticated access to sensitive data or model parameters.
  • A reliable method to cause persistent model denial of service.
  • Confirmed leakage of Personally Identifiable Information (PII) or confidential corporate data.
  • A vulnerability that allows for widespread system compromise.
Primary action: Immediate phone call to the primary Blue Team contact and the designated Security Incident Response lead. Formal written report to follow within 4 hours.

L4: System Integrity / Safety Event
Severity: Emergency
Trigger examples:
  • An action that impacts a live production system.
  • Discovery of a vulnerability that poses a physical safety risk (e.g., in an AI controlling robotics).
  • Uncontrolled, self-propagating model behavior.
  • Evidence of a prior, un-remediated breach by another actor.
Primary action: Immediate “all-stop” on testing related to the finding. Immediate phone call to the project sponsor, CISO, and/or pre-defined emergency contact list. Activate the organization’s formal Incident Response (IR) plan.

Escalation Decision Flow

Every Red Team member must be able to quickly assess a finding and determine the appropriate course of action. The following diagram illustrates the standard decision-making process.

Finding Discovered → Assess Impact & Severity → Does the finding pose an immediate threat to production, safety, or data?
  • No: Standard Reporting.
  • Yes: Is it an L4 Emergency?
      • No: Escalate via L2/L3.
      • Yes: Escalate via L4.
Final step: Issue Acknowledged.
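The decision flow can also be expressed as a small function. This Python sketch is illustrative only: the `Finding` fields are hypothetical triage flags, mapping the L4 trigger examples onto them is an assumption, and the choice between L2 and L3 is deliberately left to the operator, as it is in the flow itself.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    STANDARD_REPORTING = "Standard Reporting"
    ESCALATE_L2_L3 = "Escalate via L2/L3"
    ESCALATE_L4 = "Escalate via L4"


@dataclass
class Finding:
    # Hypothetical triage flags set during "Assess Impact & Severity".
    impacts_live_production: bool = False
    physical_safety_risk: bool = False
    self_propagating_behavior: bool = False
    prior_breach_evidence: bool = False
    threatens_data: bool = False       # e.g. sensitive data or PII exposure
    threatens_stability: bool = False  # e.g. reliable denial of service


def is_l4_emergency(f: Finding) -> bool:
    # Mirrors the L4 trigger examples from the escalation levels.
    return (f.impacts_live_production or f.physical_safety_risk
            or f.self_propagating_behavior or f.prior_breach_evidence)


def route_finding(f: Finding) -> Route:
    # Question 1: immediate threat to production, safety, or data?
    immediate_threat = is_l4_emergency(f) or f.threatens_data or f.threatens_stability
    if not immediate_threat:
        return Route.STANDARD_REPORTING
    # Question 2: is it an L4 emergency?
    if is_l4_emergency(f):
        return Route.ESCALATE_L4
    # Choosing between L2 and L3 remains the operator's judgment call.
    return Route.ESCALATE_L2_L3
```

Under these assumptions, the two example scenarios in this section would both route to L2/L3: the first through `threatens_stability`, the second through `threatens_data`.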

Example Escalation Scenarios

Scenario 1: Reproducible Model Degradation

Finding: A Red Team operator discovers a specific class of prompts that causes a generative text model to enter a recursive loop, consuming 100% of its allocated GPU resources and failing to respond to any other requests. Rebooting the model instance clears the state, but the prompt reliably triggers the failure again.

Action: This constitutes an L2: Operational Impact. The prompt reliably degrades the target system, but because restarting the model instance clears the state, the degradation is reversible rather than a persistent denial of service (which would be an L3 trigger). The operator immediately stops testing this vector and notifies the primary Blue Team contact and the Engineering Lead via the dedicated, monitored channel, providing the trigger prompt and system logs. The goal is to give the engineering team the information it needs to patch the vulnerability without causing further disruption.
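A notice like the one in this scenario can be templated so that the evidence and the 1-hour acknowledgment window are never forgotten. The sketch below assumes a hypothetical `L2Notice` record and hypothetical role names; posting the rendered text to Slack or Teams is left out because that integration depends on your tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ACK_WINDOW = timedelta(hours=1)  # L2: acknowledgment required within 1 hour


@dataclass
class L2Notice:
    """Hypothetical L2 notice posted to the dedicated, monitored channel."""
    summary: str
    trigger_prompt: str   # the exact input that reproduces the failure
    log_excerpt: str      # relevant system logs
    sent_at: datetime
    recipients: tuple[str, ...] = ("blue_team_primary", "engineering_lead")

    @property
    def ack_due(self) -> datetime:
        return self.sent_at + ACK_WINDOW

    def render(self) -> str:
        # Plain-text body; the channel integration is out of scope here.
        return "\n".join([
            f"[L2: Operational Impact] {self.summary}",
            "To: " + ", ".join(self.recipients),
            "Trigger prompt:\n" + self.trigger_prompt,
            "Logs:\n" + self.log_excerpt,
            f"Acknowledgment due by: {self.ack_due.isoformat()}",
        ])

    def ack_overdue(self, now: datetime, acknowledged: bool) -> bool:
        # True once the window has passed without an acknowledgment.
        return not acknowledged and now > self.ack_due
```

An overdue check like `ack_overdue` is the natural hook for a follow-up, for example re-posting the notice or moving to a phone call.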

Scenario 2: Unfiltered PII in Model Output

Finding: While testing for prompt injection, an operator successfully coaxes the model to reveal snippets of its training data. Within this data, the operator finds a clear text block containing what appears to be a real customer’s name, email address, and support ticket history.

Action: This is an L3: Critical Vulnerability. It is a confirmed data leak with significant privacy and compliance implications. The operator immediately documents the finding with screenshots, stops testing, and phones the primary Blue Team contact and the designated Security Incident Response lead as specified in the engagement plan, with a formal written report to follow within 4 hours. This allows the security team to begin its incident response and data privacy assessment immediately, which may involve legal and compliance stakeholders.
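The L3 obligations (document, stop, call, then report) lend themselves to a simple checklist. This Python sketch uses a hypothetical `L3Escalation` record and assumes that the 4-hour report clock starts at the phone call, since the framework does not state exactly when it starts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

REPORT_WINDOW = timedelta(hours=4)  # L3: formal written report within 4 hours


@dataclass
class L3Escalation:
    """Hypothetical record of an L3 escalation's required steps."""
    finding_id: str
    testing_stopped_at: Optional[datetime] = None
    evidence: list[str] = field(default_factory=list)  # e.g. screenshot paths
    phone_call_at: Optional[datetime] = None
    report_sent_at: Optional[datetime] = None

    def report_due(self) -> Optional[datetime]:
        # Assumption: the 4-hour clock starts when the phone call is made.
        if self.phone_call_at is None:
            return None
        return self.phone_call_at + REPORT_WINDOW

    def outstanding_steps(self, now: datetime) -> list[str]:
        steps = []
        if self.testing_stopped_at is None:
            steps.append("stop testing")
        if not self.evidence:
            steps.append("document evidence")
        if self.phone_call_at is None:
            steps.append("phone primary Blue Team contact and Security IR lead")
        due = self.report_due()
        if self.report_sent_at is None and due is not None:
            label = "OVERDUE" if now > due else "due " + due.isoformat()
            steps.append(f"send written report ({label})")
        return steps
```

The report step only appears once the call has been logged, which keeps the checklist from showing a deadline that has not started yet.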

De-escalation and Post-Mortem

An escalated issue is considered resolved only after the responding team confirms mitigation and the Red Team verifies the fix. Once resolved, the issue is formally “de-escalated” and returns to the standard reporting workflow for final documentation. For all L3 and L4 escalations, a post-mortem meeting is strongly recommended to analyze the root cause, the effectiveness of the response, and any potential improvements to the system or to the escalation process itself.
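These resolution criteria amount to a small state machine. This Python sketch models it as a strictly linear sequence of states with hypothetical names; a real workflow would likely need extra transitions, for example sending an issue back for rework when the Red Team cannot verify the fix.

```python
from enum import Enum, auto


class State(Enum):
    ESCALATED = auto()
    ACKNOWLEDGED = auto()
    MITIGATION_CONFIRMED = auto()  # responding team confirms mitigation
    FIX_VERIFIED = auto()          # Red Team verifies the fix
    DE_ESCALATED = auto()          # back in the standard reporting workflow


# Allowed forward transitions; anything else is rejected.
TRANSITIONS = {
    State.ESCALATED: State.ACKNOWLEDGED,
    State.ACKNOWLEDGED: State.MITIGATION_CONFIRMED,
    State.MITIGATION_CONFIRMED: State.FIX_VERIFIED,
    State.FIX_VERIFIED: State.DE_ESCALATED,
}


class EscalatedIssue:
    def __init__(self, level: int):
        if level not in (1, 2, 3, 4):
            raise ValueError("level must be between 1 and 4")
        self.level = level
        self.state = State.ESCALATED

    def advance(self, to: State) -> None:
        # Only the single next state is allowed; skipping steps is an error.
        if TRANSITIONS.get(self.state) is not to:
            raise ValueError(f"cannot move from {self.state.name} to {to.name}")
        self.state = to

    @property
    def resolved(self) -> bool:
        # Resolved only once the Red Team has verified the fix.
        return self.state in (State.FIX_VERIFIED, State.DE_ESCALATED)

    @property
    def post_mortem_recommended(self) -> bool:
        return self.level >= 3
```

Rejecting skipped transitions enforces the rule that mitigation alone does not close an issue; the Red Team's verification must be recorded first.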