28.1.4 Triage and prioritization

2025.10.06.
AI Security Blog

What is Triage? In the context of a bug bounty program, triage is the crucial first step after receiving a submission. It is the process of validating, assessing, and categorizing incoming reports to determine their severity and urgency. An effective triage process ensures that critical vulnerabilities are addressed swiftly while managing the overall flow of submissions efficiently.

A flood of submissions is a sign of a healthy bug bounty program, but without a systematic way to sort the signal from the noise, your security team can quickly become overwhelmed. Triage isn’t just an administrative task; it’s your program’s first line of defense against chaos and your first opportunity to build a positive relationship with a security researcher.

The Core Triage Workflow

A robust triage process can be broken down into three fundamental stages. While the details may vary, these steps provide a universal framework for handling any submission, from a minor output formatting issue to a critical model compromise.

Step 1: Validation and Deduplication

Before you invest time in deep analysis, you must perform a quick check on the submission’s eligibility. Ask yourself these questions:

  • Is it in scope? Does the reported issue fall within the assets and vulnerability types defined in your program policy? Submissions targeting out-of-scope models or reporting on known limitations you’ve explicitly excluded can be closed quickly.
  • Is it a duplicate? Has another researcher already reported this exact vulnerability? Most bug bounty platforms have built-in features to help identify duplicates. Responding promptly that a report is a duplicate manages researcher expectations; a minimal duplicate check is sketched after this list.
  • Is there enough information? Does the report contain the necessary details to understand and reproduce the issue? A submission without a clear proof-of-concept (PoC), prompt, or explanation of the impact may need to be sent back to the researcher for more information.
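
For the duplicate check, a minimal sketch using fuzzy text matching might look like the following. It relies only on the Python standard library's difflib; the similarity threshold and the idea of comparing raw report descriptions are simplifying assumptions, and most platforms provide their own deduplication on top of this:

from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # illustrative; tune against your own report history

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences don't mask duplicates
    return " ".join(text.lower().split())

def find_likely_duplicate(new_description: str, known_descriptions: list[str]) -> str | None:
    candidate = normalize(new_description)
    for known in known_descriptions:
        if SequenceMatcher(None, candidate, normalize(known)).ratio() >= SIMILARITY_THRESHOLD:
            return known  # likely duplicate; route for human confirmation
    return None

A match above the threshold should flag the report for human review rather than close it automatically, since superficially similar prompts can exercise different underlying flaws.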

Step 2: Reproduction

A vulnerability that cannot be reproduced is, for all practical purposes, not a vulnerability. Your team must independently verify the researcher’s findings. This is where a high-quality submission, as described in the previous chapter, becomes invaluable. A good PoC allows your team to confirm the behavior quickly. For AI systems, especially stochastic ones, reproduction can be tricky. You may need to run a prompt multiple times or ask the researcher for additional context, such as session information or the specific model version they were interacting with.
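
As a sketch of how such repeated runs might be automated, consider the harness below. Here run_prompt and is_harmful are placeholders for your own model client and harm detector, and the default trial count is arbitrary:

from typing import Callable

def reproduction_rate(
    run_prompt: Callable[[str], str],
    is_harmful: Callable[[str], bool],
    prompt: str,
    trials: int = 20,
) -> float:
    # Re-run the reported prompt and measure how often the harmful behavior recurs
    hits = sum(1 for _ in range(trials) if is_harmful(run_prompt(prompt)))
    return hits / trials

A report can then be recorded as, say, "reproducible in 7 of 20 trials" rather than forced into a binary verdict.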

Step 3: Severity Assessment

Once validated and reproduced, you must determine the vulnerability’s impact. This is arguably the most critical part of triage, as it dictates the priority of the fix and the size of the bounty payout. While traditional software uses frameworks like CVSS, AI vulnerabilities often require a more nuanced approach that considers harm beyond technical exploitation.

AI Vulnerability Severity Matrix Example

| Severity | Impact on AI System | Example | Typical Response SLA |
| --- | --- | --- | --- |
| Critical | Complete model control, sensitive data exfiltration (e.g., training data), persistent system manipulation, or large-scale harmful output generation. | A prompt injection that allows an attacker to access and leak proprietary system prompts and user data from other sessions. | Acknowledge: < 2 hours; Fix: < 24 hours |
| High | Significant model manipulation, reliable generation of highly dangerous or illegal content, user session hijacking, or denial of service. | A reliable jailbreak that consistently bypasses safety filters to produce detailed instructions for creating weapons. | Acknowledge: < 12 hours; Fix: < 7 days |
| Medium | Moderate model manipulation, inconsistent bypass of safety filters, generation of biased or moderately harmful content, or information leakage of non-sensitive data. | A prompt that can occasionally trick the model into revealing its internal version number and high-level architecture. | Acknowledge: < 24 hours; Fix: < 30 days |
| Low | Minor unexpected behavior, low-impact content generation, or purely theoretical issues with no practical exploit. | The model hallucinates a plausible but incorrect API endpoint for a fictional service. | Acknowledge: < 3 days; Fix: Discretionary |
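
The SLAs in the matrix can be encoded directly so tooling can compute deadlines automatically. A minimal sketch, using the example values from the table above rather than any universal standard:

from datetime import timedelta

# Example SLAs from the severity matrix above; a None fix window is discretionary
SLA = {
    "Critical": {"acknowledge": timedelta(hours=2), "fix": timedelta(hours=24)},
    "High": {"acknowledge": timedelta(hours=12), "fix": timedelta(days=7)},
    "Medium": {"acknowledge": timedelta(hours=24), "fix": timedelta(days=30)},
    "Low": {"acknowledge": timedelta(days=3), "fix": None},
}

# e.g. acknowledge_by = report_received_at + SLA[severity]["acknowledge"]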

Prioritization: From Severity to Action

Severity tells you how bad a vulnerability is. Prioritization tells you what to fix first. While high-severity issues are almost always high-priority, other factors come into play, such as the effort required for a fix. A simple impact/effort matrix can help you make strategic decisions, ensuring you’re not just fighting the biggest fires but also quickly extinguishing the easiest ones.

Prioritization Matrix based on Impact and Effort

| | Low Effort | High Effort |
| --- | --- | --- |
| High Impact | Quick Wins: DO FIRST | Major Projects: SCHEDULE |
| Low Impact | Fill-ins: DO WHEN POSSIBLE | Re-evaluate: AVOID / DEFER |
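
In code, the matrix reduces to a simple quadrant lookup. The sketch below assumes impact and effort have already been judged high or low by a human triager:

def prioritize(high_impact: bool, high_effort: bool) -> str:
    # Map an impact/effort pair onto the quadrants of the matrix above
    if high_impact and not high_effort:
        return "DO FIRST"          # Quick Wins
    if high_impact and high_effort:
        return "SCHEDULE"          # Major Projects
    if not high_impact and not high_effort:
        return "DO WHEN POSSIBLE"  # Fill-ins
    return "AVOID / DEFER"         # Re-evaluate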

Unique Challenges in Triaging AI Vulnerabilities

Triaging AI model vulnerabilities introduces complexities not typically found in traditional software security.

  • Subjectivity of Harm: A biased output or offensive generation is clearly a vulnerability, but its severity can be highly subjective. Triage for these issues often requires a cross-functional team including members from legal, policy, and ethics departments, not just engineering.
  • The Stochasticity Problem: Due to the probabilistic nature of many models, a harmful output may not be 100% reproducible. Your triage process must account for this. Instead of a binary “reproducible/not reproducible” state, you might use a “reproducible with X% probability” assessment, as in the sketch after this list.
  • Systemic vs. Discrete Flaws: A prompt injection vulnerability isn’t a single line of bad code; it’s an emergent property of the model’s architecture. Fixing it may involve fine-tuning, data filtering, or architectural changes rather than a simple patch. This reality must be factored into the “effort” side of your prioritization matrix.
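
One hedged way to fold such a probability-aware assessment into scoring is to scale a base impact score by the observed reproduction rate. The scaling and the 0.25 floor below are illustrative choices, not an established standard:

def effective_severity(base_impact: float, reproduction_rate: float) -> float:
    # Scale base impact by how reliably the issue reproduces; the floor keeps
    # rare-but-severe behaviors from being dismissed entirely
    return base_impact * max(reproduction_rate, 0.25)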

Automating Initial Triage

As your program scales, manual triage of every submission becomes untenable. You can use automation to handle the initial validation and categorization, freeing up your human experts for the more nuanced work of assessment and reproduction.

A simple script can parse submissions for keywords, check against a database of known duplicates, and assign a preliminary score to help route the report to the correct team.

# Initial triage scoring script (runnable Python sketch; the Report
# fields and the alert hook are illustrative)
from dataclasses import dataclass

@dataclass
class Report:
    description: str
    type: str
    has_poc: bool
    is_reproducible: bool
    priority: str = "UNASSIGNED"

def alert_on_call_engineer(report: Report) -> None:
    # Stub: wire this into your paging or ticketing system
    print(f"Paging on-call for: {report.description[:60]}")

def calculate_initial_priority(report: Report) -> str:
    score = 0

    # Boost score for keywords indicating high impact; keywords are kept
    # lowercase so they match the lowercased description
    high_impact_keywords = ["pii", "data exfiltration", "rce", "system prompt"]
    description = report.description.lower()
    if any(keyword in description for keyword in high_impact_keywords):
        score += 50

    # Add points for common AI vulnerability types
    if report.type == "Prompt Injection":
        score += 20
    elif report.type == "Model Evasion":
        score += 15

    # Add points if a clear, reproducible PoC is provided
    if report.has_poc and report.is_reproducible:
        score += 25

    # Map the cumulative score to a priority band
    if score > 70:
        report.priority = "CRITICAL"
        alert_on_call_engineer(report)
    elif score > 40:
        report.priority = "HIGH"
    else:
        report.priority = "MEDIUM"

    return report.priority
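
For instance, running the script on a hypothetical report (the field values below are made up for illustration):

report = Report(
    description="PoC demonstrates data exfiltration of the system prompt",
    type="Prompt Injection",
    has_poc=True,
    is_reproducible=True,
)
print(calculate_initial_priority(report))  # CRITICAL (50 + 20 + 25 = 95)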

Ultimately, triage is a human process augmented by tools. It requires technical acumen to understand the exploit, empathy to communicate effectively with researchers, and strategic thinking to prioritize fixes that best protect your users and your system. A well-oiled triage machine is the heart of a successful and scalable bug bounty program.