6.5.1 Functionality Matrix

2025.10.06.
AI Security Blog

Your team has been tasked with a critical red team engagement against a new, proprietary Large Language Model (LLM) powering a customer service application. The primary concerns are data exfiltration and the potential for the model to be jailbroken into generating harmful content. You have a dozen open-source tools, a few commercial platforms, and the option to build custom scripts. Where do you even begin? Wasting time on a tool that can’t perform the specific attacks you need is not an option.

This scenario highlights a fundamental challenge in AI red teaming: selecting the right tool for the job. The landscape is crowded and fast-moving. A tool that excels at generating adversarial patches for image classifiers may be useless against an LLM. This is where a Functionality Matrix becomes an indispensable strategic asset. It moves tool selection from a gut-feeling exercise to a data-driven decision.

What is a Functionality Matrix?

A functionality matrix is a structured comparison chart that maps your candidate tools against the specific capabilities required for an engagement. At its core, it’s a table where rows represent necessary functions (like specific attack types, model compatibility, or reporting features) and columns represent the tools you are evaluating. The cells then indicate the level of support each tool offers for that function.

This isn’t just a feature checklist. It’s a method for aligning your toolkit with your threat model. By building this matrix before an operation, you force yourself to define your objectives clearly and evaluate tools based on their ability to meet those objectives, rather than on popularity or hype.
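To make the structure concrete, the rows-by-columns idea can be sketched directly as a nested dictionary. The capability names, tool names, and support levels below are illustrative placeholders, not a prescribed schema:

# Rows are required functions, columns are candidate tools, cells are support levels
functionality_matrix = {
    "prompt_injection": {"Tool A": "partial", "Tool B": "native", "Custom Scripts": "native"},
    "jailbreaking":     {"Tool A": "none",    "Tool B": "native", "Custom Scripts": "partial"},
    "reporting":        {"Tool A": "none",    "Tool B": "native", "Custom Scripts": "partial"},
}

# Reading a cell answers a concrete question: how well does Tool B cover prompt injection?
print(functionality_matrix["prompt_injection"]["Tool B"])  # -> native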

Constructing an Effective Matrix

The value of your matrix depends entirely on the relevance of its axes. A generic matrix is moderately useful; one tailored to your specific target and objectives is a powerful decision-making engine.

Step 1: Define Your Functional Requirements (The Rows)

Your rows should reflect the entire red teaming lifecycle for the target system. Don’t just list attacks; consider the entire workflow. Group them logically for clarity.

  • Attack Classes (LLM-focused): Prompt Injection (Direct, Indirect), Jailbreaking (Role-playing, Obfuscation), Data Leakage/Exfiltration, Denial of Service (Resource consumption), Hallucination Generation.
  • Attack Classes (ML General): Evasion (e.g., FGSM, PGD), Poisoning (Data or Model), Membership Inference, Model Extraction/Stealing.
  • Model & Framework Support: Compatibility with PyTorch, TensorFlow, JAX, ONNX. Support for API-based models (e.g., OpenAI, Anthropic) versus white-box access.
  • Operational Features: Automated scanning, result logging, report generation, integration with CI/CD pipelines, collaboration features for teams.
  • Defensive Evaluation: Ability to test defenses like input filters, output scanners, or adversarial training robustness.
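For planning purposes, these groups can be kept as simple structured data so nothing is dropped when the matrix is populated in Step 3. The group and requirement names below are a hypothetical sketch and should be tailored to the engagement:

# Hypothetical requirement groups for an LLM-focused engagement
functional_requirements = {
    "attack_classes_llm": ["prompt_injection", "jailbreaking", "data_leakage", "denial_of_service"],
    "attack_classes_ml": ["evasion_fgsm", "poisoning", "membership_inference", "model_extraction"],
    "model_support": ["pytorch", "tensorflow", "onnx", "api_black_box"],
    "operational": ["automated_scanning", "logging", "reporting", "ci_cd_integration"],
    "defensive_evaluation": ["input_filters", "output_scanners", "adversarial_training_robustness"],
}

# Every entry becomes one row of the matrix
matrix_rows = [req for group in functional_requirements.values() for req in group]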

Step 2: List Your Candidate Tools (The Columns)

Your columns should include all viable options. Be broad in your initial selection. This includes:

  • Major open-source libraries (e.g., ART (IBM), CleverHans, TextAttack).
  • Specialized tools (e.g., Garak for LLM vulnerability scanning).
  • Commercial platforms (e.g., HiddenLayer, Cranium, CalypsoAI).
  • In-house scripts or frameworks.

Step 3: Populate the Matrix

This is the research phase. For each tool and function, determine the level of support. A simple checkmark is often insufficient. Use a clear legend to capture nuance.

Example Functionality Matrix: LLM Red Team Tool Selection

| Functionality / Capability | Tool A (General ML) | Tool B (LLM Specific) | Custom Python Scripts |
|---|---|---|---|
| Core Attacks | | | |
| Prompt Injection | 〰️ | ✔️ | ✔️ |
| Jailbreaking Payloads | ❌ | ✔️ | 〰️ |
| Membership Inference | ✔️ | ❌ | 〰️ |
| Model Support | | | |
| White-box (PyTorch) | ✔️ | ❌ | ✔️ |
| Black-box (API) | 〰️ | ✔️ | ✔️ |
| Operational Features | | | |
| Automated Reporting | ❌ | ✔️ | 〰️ |

Legend: ✔️ Native Support | 〰️ Partial / Requires Customization | ❌ No Support
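Keeping the populated matrix as data rather than a static table makes it easy to re-render after each re-evaluation. Below is a minimal sketch; the support values mirror the legend, and the cell assignments are the illustrative ones from the example above:

# Support levels mirror the legend: native (✔️), partial (〰️), none (❌)
matrix = {
    "Prompt Injection":      {"Tool A": "partial", "Tool B": "native", "Custom": "native"},
    "Jailbreaking Payloads": {"Tool A": "none",    "Tool B": "native", "Custom": "partial"},
    "Membership Inference":  {"Tool A": "native",  "Tool B": "none",   "Custom": "partial"},
    "Black-box (API)":       {"Tool A": "partial", "Tool B": "native", "Custom": "native"},
    "Automated Reporting":   {"Tool A": "none",    "Tool B": "native", "Custom": "partial"},
}

def render(matrix, tools=("Tool A", "Tool B", "Custom")):
    # Print one row per capability with each tool's support level
    print("Capability".ljust(25) + "".join(t.ljust(10) for t in tools))
    for capability, support in matrix.items():
        print(capability.ljust(25) + "".join(support[t].ljust(10) for t in tools))

render(matrix)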

Interpreting the Matrix: Beyond the Checkmarks

The matrix provides a high-level view, but the decision requires deeper analysis. A checkmark for “Prompt Injection” doesn’t tell the whole story.

Qualitative Assessment

  • Depth vs. Breadth: Does the tool offer one-click execution of a dozen attacks, or does it provide a flexible framework for crafting one sophisticated, novel attack? Tool B in our example is deep on LLM attacks, while Tool A is broad across general ML.
  • Ease of Use: How quickly can a new team member become effective with the tool? A powerful framework that requires a week of setup might be less valuable than a simpler tool that delivers results in an hour.
  • Extensibility: Your red team will inevitably develop new techniques. How easy is it to add a custom attack module to the tool? Relying on custom scripts offers maximum flexibility but requires more development effort.
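These qualitative factors can be folded into the comparison with a simple weighted score per tool. The criteria weights and the 1-5 ratings below are invented for illustration; in practice they come from the team's own assessment of the engagement:

# Illustrative 1-5 ratings on the qualitative criteria discussed above (not real benchmarks)
qualitative_ratings = {
    "Tool A (General ML)":   {"depth_on_target": 2, "ease_of_use": 3, "extensibility": 4},
    "Tool B (LLM Specific)": {"depth_on_target": 5, "ease_of_use": 4, "extensibility": 3},
    "Custom Python Scripts": {"depth_on_target": 4, "ease_of_use": 2, "extensibility": 5},
}

# Weights reflect what matters most for this particular engagement
weights = {"depth_on_target": 0.5, "ease_of_use": 0.3, "extensibility": 0.2}

def weighted_score(ratings):
    return sum(ratings[criterion] * weight for criterion, weight in weights.items())

for tool, ratings in sorted(qualitative_ratings.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{tool}: {weighted_score(ratings):.2f}")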

Scenario-Based Decision Making

Let’s return to our initial scenario. The goal is to test a customer service LLM for data exfiltration and harmful content generation. Looking at the matrix, Tool B is the obvious primary choice. It has native support for the exact attack vectors we’re concerned about (Prompt Injection, Jailbreaking) and is designed for API-based models. While our custom scripts can also do this, Tool B likely has pre-built payloads and logging, saving significant time. Tool A, despite its strengths in other areas, is a poor fit for this specific task.

The matrix prevents you from choosing Tool A “because it’s a comprehensive library” and directs you to the most efficient tool for the mission at hand.

# A simple Python dict can represent a matrix for planning
tool_profiles = {
    "Tool B (LLM Specific)": {
        "primary_targets": ["LLM", "Generative AI"],
        "attacks": ["prompt_injection", "jailbreak", "data_leakage"],
        "access_level": ["black-box", "api"],
        "notes": "Excellent for out-of-the-box LLM scanning."
    },
    "Custom Python Scripts": {
        "primary_targets": ["any"],
        "attacks": ["custom", "highly_specific_injection"],
        "access_level": ["white-box", "black-box"],
        "notes": "Maximum flexibility, high development overhead."
    }
}

def recommend_tool(threat_model, profiles):
    # Match the threat model (e.g., ["LLM", "prompt_injection"]) against each tool profile
    for name, profile in profiles.items():
        covers_target = any(t in profile["primary_targets"] for t in threat_model)
        covers_attack = any(a in profile["attacks"] for a in threat_model)
        if covers_target and covers_attack:
            return f"{name} is a strong candidate: {profile['notes']}"
    return "Custom scripting may be required for this unique threat."

print(recommend_tool(["LLM", "prompt_injection"], tool_profiles))

Ultimately, the functionality matrix is a living document. As you discover new tools and techniques, and as your target systems evolve, you should update it. It is the foundational step in building a deliberate, efficient, and effective AI red teaming toolkit.