3.1.4 Prioritization Matrices

2025.10.06.
AI Security Blog

Your threat modeling and risk assessment efforts have produced exactly what you intended: a comprehensive list of potential vulnerabilities, attack vectors, and failure modes for your AI system. The list is long, perhaps dauntingly so. You have limited time, a finite budget, and a dedicated but not inexhaustible team. The critical question now is not “what could go wrong?” but “what should we test first?”

This is where prioritization matrices become an indispensable tool in your strategic planning toolkit. They transform a raw list of risks into an actionable, ranked roadmap for your red team engagement. A prioritization matrix is a visual decision-making tool used to rank options against a set of criteria, forcing you to make deliberate, defensible choices about where to focus your energy.

The Classic Matrix: Impact vs. Likelihood

The most common and intuitive prioritization matrix plots risks along two axes: the potential Impact of a successful exploit and the Likelihood of that exploit occurring. By mapping each identified threat onto this grid, you can quickly categorize them and determine a course of action.

  • Impact: What is the severity of the damage if this vulnerability is exploited? This can range from minor reputational harm or incorrect outputs to catastrophic data breaches, model theft, or system-wide failure.
  • Likelihood: How probable is it that an attacker will attempt and succeed with this exploit? This considers factors like the required skill, available tools, and the attractiveness of the target.

This simple 2×2 grid creates four distinct quadrants, each suggesting a different strategic response.

Impact vs. Likelihood Prioritization Matrix

|             | Low Likelihood                     | High Likelihood             |
|-------------|------------------------------------|-----------------------------|
| High Impact | MAJOR: Plan & Monitor              | CRITICAL: Test Immediately  |
| Low Impact  | NEGLIGIBLE: Accept Risk / De-scope | MINOR: Test If Time Permits |
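If you track threat scores in code, quadrant assignment is straightforward to automate. The following is a minimal Python sketch, assuming threats are scored 1-5 on each axis; the cutoff of 3 and the classify_threat function are illustrative assumptions, not part of any standard.

# Assign a threat to a quadrant of the 2x2 grid.
# Assumption: scores run 1-5, with >= 3 counting as "high".
def classify_threat(impact: int, likelihood: int, threshold: int = 3) -> str:
    high_impact = impact >= threshold
    high_likelihood = likelihood >= threshold
    if high_impact and high_likelihood:
        return "CRITICAL: test immediately"
    if high_impact:
        return "MAJOR: plan & monitor"
    if high_likelihood:
        return "MINOR: test if time permits"
    return "NEGLIGIBLE: accept risk / de-scope"

print(classify_threat(impact=5, likelihood=4))  # CRITICAL: test immediately
print(classify_threat(impact=2, likelihood=5))  # MINOR: test if time permits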

Beyond Two Dimensions: Multi-Factor Prioritization

While Impact and Likelihood are a great start, the nuances of AI systems often demand a more sophisticated approach. A highly impactful but low-likelihood attack might still be worth investigating if it’s incredibly easy to perform. Conversely, a high-likelihood attack might be deprioritized if it’s trivial to detect. You can create a more robust matrix by adding other relevant factors.

Common Factors for AI Red Teaming

  • Exploitability / Effort: How difficult is it for an attacker to develop and execute the attack? A simple prompt injection requires minimal effort, while a complex model inversion attack requires significant expertise and computational resources.
  • Detectability: How likely is it that current monitoring and defense systems (the “blue team”) would notice the attack? Evasive adversarial examples are designed to have low detectability, while a brute-force API attack is noisy and easily flagged.
  • Business Context: How does this threat align with key business objectives? A vulnerability that erodes user trust might be prioritized higher than one that merely increases operational costs, even if their technical impacts are similar.

By scoring each threat across these dimensions (e.g., on a scale of 1-5), you can calculate a final priority score. This transforms subjective discussion into a data-informed process.

Example Multi-Factor Prioritization Table
| Threat Vector | Impact (1-5) | Likelihood (1-5) | Exploitability (1 = Hard, 5 = Easy) | Priority Score (I × L × E) |
|---|---|---|---|---|
| Jailbreak via role-playing prompt | 4 (Policy bypass, harmful content) | 5 (Widely known techniques) | 5 (Requires only text input) | 100 |
| Targeted data poisoning (training set) | 5 (Systemic bias, backdoors) | 2 (Requires access to data pipeline) | 2 (Complex, resource-intensive) | 20 |
| Model inversion to recover a single training record | 3 (Privacy breach) | 1 (Theoretically possible, rarely practical) | 1 (Requires deep expertise, white-box access) | 3 |
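As a minimal sketch, the scores in the table can be reproduced in a few lines of Python. The threat names and scores come from the table above; the structure and variable names are purely illustrative.

# Score each threat 1-5 on Impact, Likelihood, and Exploitability,
# multiply, and rank. Data mirrors the table above.
threats = [
    ("Jailbreak via role-playing prompt",                 4, 5, 5),
    ("Targeted data poisoning (training set)",            5, 2, 2),
    ("Model inversion to recover a single training record", 3, 1, 1),
]

scored = [
    (name, impact * likelihood * exploitability)
    for name, impact, likelihood, exploitability in threats
]

# Rank from highest to lowest priority score
for name, score in sorted(scored, key=lambda t: t[1], reverse=True):
    print(f"{score:>3}  {name}")
# 100  Jailbreak via role-playing prompt
#  20  Targeted data poisoning (training set)
#   3  Model inversion to recover a single training record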

Structured Frameworks: The RICE Model

For even greater structure, you can adopt established prioritization frameworks like RICE, commonly used in product management but highly applicable to red teaming. RICE stands for Reach, Impact, Confidence, and Effort.

  • Reach: How many users or system components will be affected? (e.g., 100 users, 1 API endpoint, 10% of queries).
  • Impact: What is the effect on a single user/component? (Use a scale: 3 = massive, 2 = high, 1 = medium, 0.5 = low).
  • Confidence: How certain are you about your Reach and Impact estimates? (100% = high confidence, 80% = medium, 50% = low). This helps account for the speculative nature of some AI threats.
  • Effort: How much time will it take for your team to plan and execute this test? (Measured in “person-months” or “person-weeks”).

The RICE score is calculated with a simple formula, (Reach × Impact × Confidence) / Effort, that balances potential value against the cost of investigation.

# Calculating a RICE score (runnable Python)

def calculate_rice_score(reach, impact, confidence, effort):
    """Return the RICE priority score: (Reach * Impact * Confidence) / Effort."""
    # Convert confidence from a percentage (e.g., 80) to a factor (0.8)
    confidence_factor = confidence / 100.0

    # Guard against division by zero for effort
    if effort == 0:
        return 0.0

    # The core RICE formula
    return (reach * impact * confidence_factor) / effort

# Example:
# Threat: Evasive adversarial patch on a public-facing image classifier
reach = 5000      # users per day
impact = 2        # high impact (misclassification of critical items)
confidence = 80   # medium confidence in estimates
effort = 1.5      # person-weeks to develop and test

priority_score = calculate_rice_score(reach, impact, confidence, effort)
# priority_score = (5000 * 2 * 0.8) / 1.5 ≈ 5333.3
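Applied across a whole backlog, the same function turns scattered estimates into a ranked test plan. The threat names and numbers below are hypothetical, chosen only to illustrate the sorting.

# Hypothetical backlog: (name, reach, impact, confidence %, effort in person-weeks)
backlog = [
    ("Adversarial patch on image classifier",  5000, 2, 80, 1.5),
    ("Jailbreak via role-playing prompt",     20000, 1, 90, 0.5),
    ("Training-set data poisoning",            8000, 3, 50, 6.0),
]

ranked = sorted(
    ((name, calculate_rice_score(r, i, c, e)) for name, r, i, c, e in backlog),
    key=lambda t: t[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{score:>8.0f}  {name}")
#    36000  Jailbreak via role-playing prompt
#     5333  Adversarial patch on image classifier
#     2000  Training-set data poisoning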

Using a framework like RICE provides a consistent, repeatable method for prioritization. It ensures that your team’s valuable time is spent on the threats that represent the most significant and plausible risk to the AI system, setting the stage for focused and efficient testing.