6.2.2 Red Team orchestration

2025.10.06.
AI Security Blog

Moving beyond single, manually crafted prompts is the hallmark of a systematic security assessment. Orchestration is the mechanism that transforms your red teaming efforts from isolated experiments into a coordinated, scalable campaign. With PyRIT, orchestration isn’t just about automation; it’s about intelligently managing the entire lifecycle of an attack, from generation to scoring and analysis.

The Engine of PyRIT: The RedTeamOrchestrator

At the heart of PyRIT’s operational capability is the RedTeamOrchestrator. Think of it as the conductor of an orchestra. It doesn’t play any single instrument, but it directs all the components to work in harmony to produce a result—in this case, a comprehensive security evaluation. The orchestrator connects your attack strategies to your target systems, manages the flow of data, and leverages scorers to interpret the outcomes.

Kapcsolati űrlap - EN

Do you have a question about AI Security? Reach out to us here:

To understand orchestration, you first need to grasp its core components. These are the building blocks you assemble to create a red teaming run.

Key Orchestration Components

An orchestration process in PyRIT is built upon four primary pillars. Each plays a distinct and vital role in the automated testing workflow.

Component Role in Orchestration Example
Target The AI system under test. The orchestrator needs to know where to send the generated prompts. An Azure OpenAI chat endpoint, a local model served via an API, or any other LLM-based system.
Attack Strategy The logic for generating adversarial prompts. This is the “how” of the attack, defining the technique used. A strategy to create prompts that try to jailbreak the model by asking it to role-play as an unrestricted AI.
Scorer An automated system to evaluate the target’s responses. Scorers determine if an attack was successful. A classification model that flags responses containing harmful content, or a simple string-matching scorer looking for keywords like “I cannot fulfill this request.”
Memory A persistent database that stores all interactions: prompts sent, responses received, and scores assigned. An SQLite database on your local machine that logs every conversation turn for later analysis.

Visualizing the Orchestration Workflow

The interaction between these components follows a logical, cyclical process. The orchestrator manages this flow, ensuring each step feeds into the next. This automated loop allows you to test at a scale that is impossible to achieve manually.

Attack Strategy (e.g., Jailbreak prompts) Target System (Your AI Application) Scorer (Evaluates response) Memory (Logs results) Sends Prompts Sends Response Stores Score

As the diagram shows, the orchestrator begins with an attack strategy, uses it to generate prompts for the target, and then passes the target’s response to a scorer. The entire transaction—prompt, response, and score—is then logged to memory. This cycle can be repeated for thousands of prompts across multiple strategies.

Orchestration in Practice: A Code Example

Let’s translate this theory into a practical example. The following Python snippet demonstrates how to set up and run a basic orchestration with PyRIT. This assumes you have already configured your target endpoint as discussed in the previous chapter.

# Import necessary PyRIT components
from pyrit.orchestrator import RedTeamOrchestrator
from pyrit.prompt_target import AzureOpenAIChatTarget
from pyrit.attack_strategy import GandalfStrategy
from pyrit.memory import DuckDBMemory

# 1. Set up your target AI system
# Assumes environment variables for Azure OpenAI are set
chat_target = AzureOpenAIChatTarget()

# 2. Initialize memory to store the results
memory = DuckDBMemory()

# 3. Initialize the orchestrator with your target and memory
# Set a prompt converter if needed for the target's format
orchestrator = RedTeamOrchestrator(
    prompt_target=chat_target,
    memory=memory
)

# 4. Choose and apply an attack strategy
# Gandalf is a built-in strategy for eliciting secrets
gandalf_strategy = GandalfStrategy(level=1)
orchestrator.apply_attack_strategy(strategy=gandalf_strategy)

# After running, you can query the memory to see the results
print("Orchestration complete. Check memory for results.")

In this simple script, you define the four core components: the `AzureOpenAIChatTarget` is your target, `GandalfStrategy` is your attack strategy, `DuckDBMemory` is your memory, and `RedTeamOrchestrator` ties it all together. Calling `apply_attack_strategy` kicks off the entire workflow illustrated in the diagram above.

The Strategic Advantage of Orchestration

Why is this orchestrated approach so critical for effective AI red teaming? The benefits extend far beyond simple automation.

  • Scalability: You can execute thousands of tests across dozens of strategies in the time it would take a human to perform a handful of manual tests. This is essential for discovering non-obvious, “long-tail” vulnerabilities.
  • Reproducibility: By defining your tests in code, you create a repeatable process. This is crucial for verifying bug fixes and tracking the security posture of your AI system over time.
  • Efficiency: Orchestration automates the repetitive tasks of prompt generation, sending, and initial scoring. This frees up your human red teamers to focus on higher-value activities like analyzing results, designing novel attack strategies, and understanding the root causes of failures.
  • Systematic Coverage: Instead of relying on ad-hoc, intuitive testing, orchestration allows you to systematically apply a broad portfolio of attack strategies, ensuring more comprehensive coverage of potential vulnerability classes.

Key Takeaway: Red team orchestration elevates your security testing from a series of manual probes to a structured, data-driven, and scalable engineering discipline. It is the foundation for building a robust and continuous AI security validation program.