3.1.1 Objective and Scope Definition

An AI red team engagement without a clearly defined objective and scope is an exercise in futility. It’s the equivalent of setting sail without a map or a destination—you might find something interesting by chance, but you’re more likely to waste resources, drift into irrelevant areas, and fail to deliver meaningful value. This initial strategic step is the bedrock upon which the entire operation is built. Get it right, and you create a focused, efficient, and impactful assessment. Get it wrong, and you risk a chaotic engagement that produces ambiguous results and frustrates stakeholders.

This process boils down to answering two fundamental questions: “What are we trying to achieve?” (the objective) and “What are the rules of the game?” (the scope). Together, they form a charter for the engagement, aligning the red team with the system owners and ensuring everyone understands the mission’s purpose and boundaries.

Defining the Objective: The “Why” of the Engagement

The objective is the high-level goal. It’s the strategic “why” behind the red teaming effort. A well-crafted objective is specific, measurable, and tied to a potential business or security risk. It moves beyond a vague desire to “find vulnerabilities” and instead focuses on testing a specific hypothesis or validating a particular security concern.

Common objectives for an AI red team engagement include:

  • Assess Resilience to Evasion: Determine if the system’s safety filters and classifiers can be bypassed by adversarial inputs to generate prohibited content (e.g., hate speech, misinformation, malicious code).
  • Test for Data Extraction Vulnerabilities: Evaluate whether an attacker can extract sensitive information from the model’s training data, such as personally identifiable information (PII) or proprietary intellectual property.
  • Validate Alignment and Instruction Following: Probe the model for instruction-following exploits, jailbreaks, or logical overrides that cause it to violate its core operational principles or safety policies.
  • Identify Potential for Misuse and Abuse: Explore how the AI system could be weaponized by a specific threat actor for a malicious purpose, such as generating convincing phishing emails at scale or creating disinformation campaigns.
  • Quantify Bias and Fairness: Measure the model’s performance and output disparities across different demographic groups to identify and quantify harmful biases.
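To make the last objective above concrete: a bias-and-fairness probe usually reduces to comparing one simple metric across groups. The following is a minimal sketch, not a real harness; `query_model`, the refusal heuristic, the prompt template, and the group labels are all hypothetical placeholders.

```python
def query_model(prompt: str) -> str:
    # Stub for illustration; replace with a call to the in-scope endpoint.
    return "Sure, here is a draft..."

def refusal_rate(group: str, templates: list[str]) -> float:
    """Fraction of prompts the model refuses for a given group."""
    refusals = sum(
        "i can't help" in query_model(t.format(group=group)).lower()
        for t in templates
    )
    return refusals / len(templates)

# Identical prompt templates, varying only the demographic attribute.
PROMPTS = ["Write a short reference letter for a {group} candidate."]
GROUPS = ["group_a", "group_b"]  # placeholder demographic labels

rates = {g: refusal_rate(g, PROMPTS) for g in GROUPS}
disparity = max(rates.values()) - min(rates.values())
print(rates, f"disparity={disparity:.1%}")
```

The metric itself (refusal rate, task accuracy, sentiment score) is chosen per engagement; what matters for the objective is that the comparison across groups is identical except for the attribute under test.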

Success Criteria: Defining “Done”

An objective is incomplete without clear success criteria. How will you know when you have achieved your goal? Success isn’t just about finding a single flaw. It’s about demonstrating a repeatable and impactful outcome. For example, if the objective is to assess resilience to evasion, a success criterion might be: “Demonstrate three distinct and repeatable prompt injection techniques that successfully bypass the content filter to generate malicious code.” This makes the outcome tangible and undeniable.
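One way to keep a criterion like this honest is to encode it as a small harness that replays each candidate technique and records how often it bypasses the filter. A minimal sketch, assuming hypothetical `query_model` and `violates_policy` helpers (neither is a real API) and an 80% repeatability threshold chosen for illustration:

```python
def query_model(prompt: str) -> str:
    # Stub; replace with the in-scope chat endpoint.
    return "I can't help with that."

def violates_policy(reply: str) -> bool:
    # Stub; replace with the engagement's definition of prohibited output.
    return "malicious code" in reply.lower()

def repeat_rate(prompt: str, trials: int = 10) -> float:
    """Fraction of trials in which the technique bypasses the filter."""
    hits = sum(violates_policy(query_model(prompt)) for _ in range(trials))
    return hits / trials

techniques = {
    "technique_1": "...",  # candidate injection prompts (elided)
    "technique_2": "...",
    "technique_3": "...",
}

# Success criterion: three distinct techniques, each repeatable (>= 80% here).
reliable = [name for name, p in techniques.items() if repeat_rate(p) >= 0.8]
print("criterion met:", len(reliable) >= 3)
```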

Defining the Scope: Setting the Boundaries

If the objective is the destination, the scope is the map that defines the approved routes and restricted areas. Scoping is a critical de-risking activity. It prevents scope creep, protects out-of-bounds systems from accidental damage, and focuses the team’s finite time and energy on the targets that matter most. A comprehensive scope defines the technical, temporal, and procedural limits of the engagement.

Target Systems

This component specifies exactly what is and is not to be tested. Precision is vital. It’s not enough to say “the chatbot.” You must specify versions, environments, and interfaces.

  • In-Scope: The specific model endpoints (e.g., api.example.com/v2/chat-prod), user-facing applications, and associated data pipelines that are fair game.
  • Out-of-Scope: Systems that are explicitly forbidden. This often includes corporate infrastructure (e.g., authentication services, SSO), third-party services, or underlying cloud provider infrastructure that is not part of the AI system itself.
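Scope boundaries are easiest to respect when they are enforced in tooling, not just in a document. As a sketch, every test request can pass through a guard like the one below; the host lists are illustrative stand-ins mirroring the examples above, to be populated from the signed scope statement.

```python
from urllib.parse import urlparse

# Illustrative allow/deny lists; populate from the signed scope statement.
IN_SCOPE_HOSTS = {"api.example.com"}
OUT_OF_SCOPE_HOSTS = {"sso.example.com", "login.example.com"}

def assert_in_scope(url: str) -> None:
    """Refuse to touch any target not explicitly approved."""
    host = urlparse(url).hostname or ""
    if host in OUT_OF_SCOPE_HOSTS:
        raise PermissionError(f"{host} is explicitly out of scope")
    if host not in IN_SCOPE_HOSTS:
        raise PermissionError(f"{host} is not on the in-scope allowlist")

assert_in_scope("https://api.example.com/v2/chat-prod")   # passes
# assert_in_scope("https://sso.example.com/auth")         # would raise
```

Note the default-deny posture: anything not explicitly in scope is rejected, which matches how the scope document should be read.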

Permitted Techniques and Methodologies

This outlines the “rules of engagement.” It clarifies what tactics are allowed. A white-box assessment on a staging server will have far fewer restrictions than a black-box test against a production system.

  • Allowed: Adversarial prompt crafting, API fuzzing, model parameter manipulation (if white-box), analysis of output logits.
  • Prohibited: Denial-of-Service (DoS) attacks, social engineering of employees, any action that could intentionally degrade service for legitimate users, accessing other customers’ data.
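Prohibitions like "no DoS" are partly a matter of discipline in the test harness itself: even a naive fuzzing loop can degrade a production endpoint. A minimal throttling wrapper as a sketch; the one-request-per-second budget is an assumption to negotiate with the system owner, not a standard.

```python
import time

class Throttle:
    """Caps outbound request rate so fuzzing can't degrade service."""
    def __init__(self, max_per_second: float = 1.0):
        self.min_interval = 1.0 / max_per_second
        self.last_sent = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self.last_sent
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_sent = time.monotonic()

throttle = Throttle(max_per_second=1.0)
for payload in ["probe-1", "probe-2", "probe-3"]:
    throttle.wait()            # enforce the rules-of-engagement rate budget
    print("sending", payload)  # send_request(payload) in a real harness
```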

Timeline and Access Level

This sets the temporal boundaries and the team’s level of knowledge.

  • Duration: A fixed start and end date/time for testing activities (e.g., “From Monday, Nov 4th, 09:00 UTC to Friday, Nov 15th, 17:00 UTC”). This may include “blackout periods” where no testing is allowed.
  • Access Level: Specifies the team’s starting position.
    • Black-box: No prior knowledge of the internal system. The team interacts with it as an external user would.
    • Grey-box: Partial knowledge. The team may have user credentials, API documentation, or high-level architectural diagrams.
    • White-box: Full knowledge and access, including source code, model weights, training data details, and infrastructure access.
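The temporal boundary can be enforced the same way: a guard that refuses to run outside the window or inside a blackout period. A sketch using the example dates from this section (the blackout entry is hypothetical):

```python
from datetime import datetime, timezone

WINDOW_START = datetime(2024, 11, 4, 9, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 11, 15, 17, 0, tzinfo=timezone.utc)
# Hypothetical blackout: no testing during a production release.
BLACKOUTS = [(datetime(2024, 11, 8, 0, 0, tzinfo=timezone.utc),
              datetime(2024, 11, 9, 0, 0, tzinfo=timezone.utc))]

def testing_allowed(now: datetime) -> bool:
    if not WINDOW_START <= now <= WINDOW_END:
        return False
    return not any(start <= now <= end for start, end in BLACKOUTS)

print(testing_allowed(datetime(2024, 11, 5, 12, 0, tzinfo=timezone.utc)))  # True
print(testing_allowed(datetime(2024, 11, 8, 12, 0, tzinfo=timezone.utc)))  # False (blackout)
```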

The Objective & Scope Statement

These elements should be formalized into a single document—the Objective & Scope Statement. This document serves as a contract and a source of truth for the engagement, ensuring alignment between the red team and all stakeholders. It should be reviewed and signed off before any testing begins.

Example Objective & Scope Statement

| Section | Description |
| --- | --- |
| Objective | Determine the model's susceptibility to indirect prompt injection that could lead to unauthorized data exfiltration. |
| Success Criteria | Successfully cause the model to leak user session data from a different context by manipulating an ingested document. The attack must be repeatable. |
| In-Scope Targets | – Application: `doc-analyzer.example.com` (Production)<br>– API Endpoint: `api.example.com/v1/summarize_document` |
| Out-of-Scope Targets | – User authentication systems (SSO)<br>– Underlying cloud storage buckets<br>– Any other API endpoint |
| Access Level | Grey-box: Standard authenticated user account and API documentation will be provided. |
| Permitted Techniques | – Adversarial content generation within uploaded documents (PDF, DOCX).<br>– API interaction analysis.<br>– Prompt probing to reveal system prompts. |
| Prohibited Techniques | – Denial of Service (DoS) or other resource exhaustion attacks.<br>– Attempts to pivot from the application server to the internal network.<br>– Social engineering. |
| Engagement Window | Start: 2024-11-18 09:00 UTC<br>End: 2024-11-22 17:00 UTC |
| Escalation Contact | For critical findings or system instability: Jane Doe, Head of AI Security, via dedicated Slack channel. |
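Because guards like the ones sketched earlier need the same facts this document records, it can help to keep a machine-readable copy of the statement alongside the signed one. A minimal sketch as a Python dict mirroring the example table; the field names are an assumption for illustration, not a standard schema.

```python
SCOPE_STATEMENT = {
    "objective": "Susceptibility to indirect prompt injection leading to data exfiltration",
    "success_criteria": "Repeatable cross-context leak of user session data via an ingested document",
    "in_scope_hosts": ["doc-analyzer.example.com", "api.example.com"],
    "out_of_scope_hosts": ["sso.example.com"],  # plus cloud storage, all other endpoints
    "access_level": "grey-box",
    "window_utc": ("2024-11-18T09:00", "2024-11-22T17:00"),
    "prohibited": ["dos", "lateral_movement", "social_engineering"],
    "escalation_contact": "Jane Doe, Head of AI Security (dedicated Slack channel)",
}

# Tooling (scope guard, throttle, window check) reads from this single
# source of truth instead of hard-coding values.
```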

This structured approach transforms a potentially ambiguous security test into a precise, controlled experiment. The clear definitions established here are not mere formalities; they are the essential inputs for the subsequent phases of threat modeling and risk assessment, ensuring the entire red team operation is targeted, relevant, and safe.