21.1.2. Disclosure ethics

2025.10.06.
AI Security Blog

You’ve found it: a critical vulnerability in a widely deployed AI system. The elation of discovery quickly gives way to a heavy question: what now? The path from identifying a flaw to its resolution is fraught with ethical dilemmas. Disclosing your findings is not a simple act of reporting a bug; it’s a strategic decision that balances the immediate risk of arming adversaries against the long-term benefit of enabling defense. This is the core of disclosure ethics.

The Spectrum of Disclosure Models

In traditional cybersecurity, several models have emerged to handle vulnerability disclosure. These provide a useful starting point for AI red teaming, though as we’ll see, they require significant adaptation.

Private (or “Responsible”) Disclosure

You report the vulnerability exclusively to the affected vendor, providing them a reasonable amount of time to develop and deploy a fix before any public announcement. This is the most common and often preferred method.

  • Pro: Minimizes the risk of the vulnerability being exploited by attackers before a patch is available. Fosters a collaborative relationship with vendors.
  • Con: Relies on the vendor’s goodwill and competence. An unresponsive vendor can leave users indefinitely exposed. The timeline for a “reasonable” fix can be contentious.

Coordinated Disclosure

This is an evolution of private disclosure. It involves a third-party coordinator (like a national CERT) that helps manage communication between the researcher and the vendor. The coordinator helps negotiate timelines and ensures all stakeholders are informed before a public release.

  • Pro: Adds structure and accountability to the process. Ideal for vulnerabilities affecting multiple vendors or complex supply chains.
  • Con: Can add administrative overhead. Finding a suitable and trusted coordinator for novel AI vulnerabilities can be challenging.

Full Disclosure

The details of the vulnerability are made public immediately, without prior notification to the vendor. This approach is often seen as aggressive but is sometimes used to force an unresponsive vendor to act.

  • Pro: Maximum transparency. Empowers users to create their own mitigations and pressures the vendor to act quickly.
  • Con: Creates a “zero-day” situation where attackers and defenders receive the information simultaneously, potentially leading to widespread exploitation.

No Disclosure

The finder keeps the vulnerability secret. This is generally discouraged, as it leaves the system’s users vulnerable and assumes no one else will find the same flaw.

Why AI Disclosure is Different

Applying traditional disclosure models directly to AI systems is problematic. The nature of the technology introduces unique complexities that change the ethical calculus.

  1. The “Patch” is Not a Patch: Fixing a traditional software bug often involves changing a few lines of code. Fixing a fundamental flaw in an AI model might require full-scale data curation, architectural changes, and complete retraining—a process that can take months and enormous computational resources. Sometimes, a vulnerability (like a model’s inherent bias) may not be “patchable” at all, only mitigable.
  2. The Vulnerability is often an Idea: An exploit in AI is frequently a technique or a concept (e.g., a new style of prompt injection) rather than a specific piece of code. Once you disclose the technique, it’s out there forever and can be adapted to countless other models. The “patch” for one system becomes an attack vector for many others.
  3. Ambiguous Boundaries of Failure: Is it a “vulnerability” if a large language model can be prompted to generate harmful content? Or is that a misuse of its intended function? Unlike a buffer overflow, which is an unambiguous bug, many AI failures exist in a gray area, making it difficult to define what warrants a formal disclosure process.
  4. Difficulty of Verification: Reproducing a specific model failure can be hard. It may depend on subtle factors like inference parameters, random seeds, or the exact state of a conversational context. A vendor may struggle to validate a report if they cannot replicate the issue precisely.
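Point 4 suggests a practical habit: when you report a model failure, capture everything that influences reproduction in one record. The sketch below is a minimal illustration in Python; the function and field names are hypothetical, not a standard reporting schema.

```python
# Hypothetical sketch: bundle a failing prompt with the inference context
# so the vendor can attempt an exact reproduction. All names here are
# illustrative assumptions, not a standard disclosure format.
import json

def build_repro_report(model_id, model_version, prompt, output,
                       temperature, top_p, seed, conversation=None):
    """Collect every factor known to influence a model failure."""
    return {
        "model_id": model_id,
        "model_version": model_version,      # exact checkpoint or API version
        "prompt": prompt,
        "observed_output": output,
        "inference_params": {
            "temperature": temperature,      # nonzero values make runs stochastic
            "top_p": top_p,
            "seed": seed,                    # include if the API exposes one
        },
        # Conversational state often matters: include all prior turns verbatim.
        "conversation_context": conversation or [],
    }

report = build_repro_report(
    model_id="example-llm", model_version="2025-01-01",
    prompt="<failing prompt>", output="<problematic completion>",
    temperature=0.0, top_p=1.0, seed=1234,
)
print(json.dumps(report, indent=2))
```

Even with such a record, stochastic decoding means the vendor may need many attempts to reproduce the failure; noting a rough success rate (e.g., "3 of 10 runs") strengthens the report.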

A Framework for Ethical Disclosure Decisions

There is no universal answer, but you can navigate the dilemma with a structured decision-making process. Before disclosing, consider the following factors to guide your strategy.

  1. Identify & verify the vulnerability.
  2. Assess impact (severity & scope).
  3. Analyze exploitability & patch feasibility.
  4. Check the vendor's policy (is there a VDP or bug bounty?).

Two common outcomes: a high-impact, patchable flaw points toward coordinated/responsible disclosure; low patch feasibility combined with high public interest points toward limited disclosure or a public awareness campaign.

This decision flow simplifies a complex process. The key is to weigh your duty to protect users by enabling a fix against the risk of enabling attackers if the fix is slow, difficult, or impossible.

Navigating the Disclosure Matrix

Another way to frame your decision is by using a matrix that maps the severity of the vulnerability against the feasibility of fixing it. This helps you choose a disclosure strategy that fits the specific situation.

High Severity

  • High patch feasibility — Strategy: Standard Coordinated Disclosure. Action: Report to the vendor with a firm but reasonable timeline (e.g., 90 days).
  • Medium patch feasibility — Strategy: Extended Coordinated Disclosure. Action: Work closely with the vendor, potentially involving a CERT. Longer timeline, focus on mitigations.
  • Low / no patch feasibility — Strategy: Limited Disclosure / Policy Advocacy. Action: Notify the vendor and regulators. Public disclosure may focus on the *risk* and *type* of flaw, not the exact exploit.

Medium Severity

  • High patch feasibility — Strategy: Responsible Disclosure. Action: Report to the vendor via their VDP/bug bounty and follow their standard process.
  • Medium patch feasibility — Strategy: Cautious Responsible Disclosure. Action: Report to the vendor, but be prepared for a long fix cycle. Assess public interest over time.
  • Low / no patch feasibility — Strategy: Academic / Community Disclosure. Action: Publish findings in a research paper or share them with a trusted research consortium after notifying the vendor.

Low Severity

  • High patch feasibility — Strategy: Vendor Notification. Action: Inform the vendor as a best practice; immediate public disclosure is less risky if they are unresponsive.
  • Medium patch feasibility — Strategy: Vendor Notification. Action: Report and let the vendor prioritize. Low urgency for public disclosure.
  • Low / no patch feasibility — Strategy: Document and Observe. Action: Inform the vendor. The flaw is likely an accepted limitation, and disclosure has minimal value.
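The matrix above can be sketched as a simple lookup table. This is a minimal illustration in Python; the severity/feasibility labels and the function name are assumptions for the sketch, not a standard taxonomy.

```python
# A minimal sketch of the disclosure matrix as a lookup table.
# The (severity, patch_feasibility) keys and strategy strings mirror the
# matrix in the text; the labels themselves are illustrative assumptions.

DISCLOSURE_MATRIX = {
    ("high", "high"):     "Standard Coordinated Disclosure",
    ("high", "medium"):   "Extended Coordinated Disclosure",
    ("high", "low"):      "Limited Disclosure / Policy Advocacy",
    ("medium", "high"):   "Responsible Disclosure",
    ("medium", "medium"): "Cautious Responsible Disclosure",
    ("medium", "low"):    "Academic / Community Disclosure",
    ("low", "high"):      "Vendor Notification",
    ("low", "medium"):    "Vendor Notification",
    ("low", "low"):       "Document and Observe",
}

def choose_strategy(severity: str, patch_feasibility: str) -> str:
    """Return the suggested disclosure strategy for a finding."""
    return DISCLOSURE_MATRIX[(severity.lower(), patch_feasibility.lower())]

# Example: a severe flaw in a model that cannot realistically be retrained.
print(choose_strategy("High", "Low"))
```

The table form makes one property of the matrix explicit: severity alone never determines the strategy; the same finding moves between coordinated disclosure and policy advocacy purely on patch feasibility.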

Ultimately, ethical disclosure for AI systems is an exercise in judgment. There are no easy answers, only competing responsibilities: to the vendor, to the users, and to the wider public. Your goal as a red teamer is not just to find flaws but to be a responsible steward of that knowledge. The guiding principle should always be harm reduction—choosing the path most likely to decrease overall risk, even when that path is not the simplest or most direct.