28.1.2 Reward structure design

2025.10.06.
AI Security Blog

A reward structure is more than a price list for vulnerabilities; it’s a strategic communication tool. It signals to the security research community which risks you prioritize, what assets are most critical, and the level of creativity you’re willing to reward. A poorly designed structure attracts low-effort reports, while a thoughtful one channels world-class talent toward hardening your most complex AI systems.

Guiding Principles for AI Bounty Rewards

Before defining specific payout amounts, you must establish the principles that govern your reward philosophy. For AI systems, these principles differ significantly from those for traditional software due to the novel and often probabilistic nature of the vulnerabilities.

  • Align with Business Impact: The reward must be proportional to the potential harm. A prompt injection that leaks the system prompt is a concern, but one that extracts proprietary training data or bypasses critical safety controls for regulated industries is a crisis. Your reward structure must reflect this difference clearly.
  • Incentivize Novelty: The field of adversarial AI is evolving rapidly. Your program should explicitly reward novel attack techniques over repeated submissions of known issues. A higher payout for a new class of jailbreak or a previously undocumented data poisoning method encourages groundbreaking research that keeps you ahead of attackers.
  • Ensure Clarity and Predictability: Researchers invest significant time and resources. If your reward ranges are too wide or the criteria too subjective, it creates uncertainty and discourages participation. Provide clear definitions for severity levels and, where possible, concrete examples of what constitutes a “Critical” vs. a “High” impact finding.
  • Foster a Collaborative Spirit: Your reward structure should not feel purely transactional. Consider bonuses for exceptional report quality, detailed root cause analysis, or suggested mitigations. This transforms a simple bug submission into a collaborative security engagement.

Defining Vulnerability Severity for AI Models

Translating traditional severity models like CVSS to AI is challenging. You need a bespoke framework that captures the unique failure modes of machine learning systems. We recommend a matrix approach based on two primary axes: Impact and Exploitability.

AI Vulnerability Severity Matrix

| Business Impact ↓ / Exploitability → | Easy Exploit | Hard Exploit |
|---|---|---|
| Low Impact | MEDIUM | LOW |
| High Impact | HIGH | CRITICAL (Bonus for Novelty) |

Key Factors for Severity Assessment

  • Impact:
    • Data Exfiltration: Can the model be manipulated to reveal training data, PII, or proprietary code/prompts?
    • System Integrity: Can the vulnerability lead to persistent model manipulation, poisoning, or unauthorized fine-tuning?
    • Harmful Content Generation: Does the exploit bypass safety filters to generate illegal, dangerous, or reputation-damaging content?
    • Resource Consumption: Can the vulnerability trigger a denial-of-service (DoS) attack by consuming excessive computational resources?
    • Model Theft: Does the vulnerability allow an attacker to extract model weights or reconstruct the model architecture?
  • Exploitability:
    • Access Required: Does it require privileged API access or is it exploitable by any public user?
    • Resources Needed: Does the attack require significant computational power, a large dataset, or specialized hardware?
    • Reproducibility: Is the outcome deterministic or probabilistic? Can the attack be reliably reproduced?
    • Expertise Level: Does the exploit require deep, domain-specific knowledge of machine learning, or can it be executed with simple scripts?
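The two-axis matrix above can be sketched as a simple lookup table. This is a minimal illustration, not a standard API; the function and label names are hypothetical.

```python
# Illustrative sketch of the severity matrix: an (impact, exploitability) pair
# maps to a severity label. Labels follow the matrix in this section.
SEVERITY_MATRIX = {
    ("low", "easy"): "MEDIUM",
    ("high", "easy"): "HIGH",
    ("low", "hard"): "LOW",
    ("high", "hard"): "CRITICAL",  # per the matrix above; novelty bonus applies
}

def classify(impact: str, exploitability: str) -> str:
    """Return the severity label for an (impact, exploitability) pair."""
    return SEVERITY_MATRIX[(impact.lower(), exploitability.lower())]

print(classify("high", "easy"))   # HIGH
print(classify("high", "hard"))   # CRITICAL
```

In practice each axis would be scored from the factors listed above (access required, reproducibility, data sensitivity, and so on) before being bucketed into low/high and easy/hard.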

Structuring Reward Tiers

Once you have a clear severity framework, you can map it to a reward structure. A tiered, range-based system offers the best balance of predictability and flexibility.

| Severity Level | Typical AI Vulnerabilities | Example Reward Range | Key Considerations |
|---|---|---|---|
| Critical | Remote code execution via model interaction; full extraction of proprietary model weights; persistent manipulation of model behavior for all users | $15,000 – $30,000+ | Reserve for vulnerabilities with catastrophic business impact. Often involves a novel technique. Bonuses should apply here. |
| High | Consistent bypass of critical safety filters (e.g., generating illegal content); exfiltration of sensitive training data subsets; high-confidence prompt extraction | $5,000 – $14,999 | Represents a direct and immediate threat to users or the business. The exploit is reliable and accessible to a motivated attacker. |
| Medium | Unreliable safety filter bypasses; model hallucination leading to misinformation; resource-intensive denial-of-service attacks | $1,000 – $4,999 | Valid vulnerabilities with mitigating factors, such as high exploit complexity or limited impact. |
| Low | Minor prompt leaking (non-sensitive); generating slightly off-brand or biased content; identifying minor data quality issues | $100 – $999 | Acknowledges valid but low-impact findings. Good for encouraging community engagement and identifying edge cases. |
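The tier ranges above, plus the novelty bonus suggested for Critical findings, can be sketched as a small payout calculator. All names and the bonus mechanics are illustrative assumptions, not a prescribed formula.

```python
# Hypothetical mapping of severity tiers to the example reward ranges above.
# The "+" on the Critical cap is modeled here as a fixed upper bound for simplicity.
REWARD_RANGES = {
    "CRITICAL": (15_000, 30_000),
    "HIGH": (5_000, 14_999),
    "MEDIUM": (1_000, 4_999),
    "LOW": (100, 999),
}

def base_reward(severity: str, position: float = 0.5) -> int:
    """Pick a payout within the tier's range; position 0.0 = floor, 1.0 = cap."""
    lo, hi = REWARD_RANGES[severity]
    return int(lo + (hi - lo) * position)

def total_reward(severity: str, position: float = 0.5,
                 novelty_bonus: float = 0.0) -> int:
    """Apply an optional novelty bonus (e.g. 0.2 = +20%) on top of the base payout."""
    return int(base_reward(severity, position) * (1 + novelty_bonus))

print(total_reward("HIGH", position=1.0))                 # 14999
print(total_reward("CRITICAL", 0.0, novelty_bonus=0.2))   # 18000
```

Keeping the ranges in one table like this also makes it easy to publish them verbatim in the program brief, which supports the clarity-and-predictability principle above.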

Beyond Monetary Rewards

Top-tier security researchers are motivated by more than just money. Integrating non-monetary rewards can significantly boost engagement and loyalty.

  • Public Recognition: A “Hall of Fame” or leaderboard publicly credits researchers for their contributions. This is highly valued in the community.
  • Exclusive Access: Offer trusted, high-performing researchers early access to new models, features, or private programs. This provides them with fresh, challenging targets and makes them feel like valued partners.
  • SWAG and Invitations: High-quality, exclusive merchandise or invitations to private company events can build a strong sense of community.
  • Collaboration Opportunities: For truly groundbreaking findings, consider offering to co-author a research paper or present at a conference with the researcher. This elevates their professional standing and provides immense value.