28.5.1 Blog and publication guidelines

2025.10.06.
AI Security Blog

Sharing your findings is a cornerstone of advancing the AI security field. However, publishing research into AI vulnerabilities carries a significant responsibility. The goal is to empower defenders without arming adversaries. This requires a deliberate, thoughtful approach that balances the need for transparency with the imperative of harm reduction. These guidelines provide a framework for navigating that balance.

Guiding Principles for Responsible Disclosure

Before you even begin writing, your work should be grounded in the principles of responsible disclosure, adapted for the unique context of AI systems. Every decision in your publication process should be measured against these core tenets.

Kapcsolati űrlap - EN

Do you have a question about AI Security? Reach out to us here:

  • Prioritize Harm Reduction: Your primary objective is to reduce risk in the ecosystem. This principle should override any desire for personal recognition or demonstrating a “clever” exploit. If releasing a detail increases net risk, omit it.
  • Coordinate with Stakeholders: Whenever possible, engage with the developers or owners of the system you’ve tested. Coordinated Vulnerability Disclosure (CVD) provides them an opportunity to implement mitigations before the vulnerability becomes public knowledge.
  • Provide Defensive Value: A publication that only details an attack is incomplete. Your work is most valuable when it includes actionable guidance for defenders, such as detection signatures, mitigation strategies, or architectural recommendations.
  • Maintain Technical Accuracy: Be precise in your descriptions of the vulnerability, its impact, and the conditions required for exploitation. Misinformation can lead defenders to misallocate resources and create a false sense of security.

The Disclosure Decision Flow

The path from discovery to publication is not linear. It involves critical decision points focused on minimizing potential harm. Use this process as a mental model for your own work.

Vulnerability Discovery Impact Assessment & Verification Vendor Coordination Content Sanitization Publication

Pre-Publication Checklist

Before hitting “publish,” rigorously review your content. This checklist helps ensure you have considered the key aspects of responsible communication.

  • Assess the Potential for Misuse

    Does your publication provide a “blueprint” for attackers? Consider if the information could be easily weaponized against systems other than the one you tested. Focus on the general technique, not the specific, copy-pasteable exploit.

  • Sanitize Technical Details

    Redact sensitive information such as internal system names, specific parameter values, or unique identifiers. The goal is to illustrate the vulnerability without revealing compromising details about the target environment.

  • Abstract Away from Runnable Code

    Avoid publishing full, working exploit code. Instead, use abstractions that convey the technical concept without providing a ready-made tool. This is a critical step in preventing widespread, low-effort attacks based on your research.

  • Frame Findings with Defensive Context

    Structure your article around the defender’s perspective. Start with the problem (the vulnerability), but dedicate significant space to the solution (mitigation, detection, and hardening). This shifts the focus from exploitation to resilience.

  • Conduct a Peer Review

    Have a trusted, knowledgeable colleague review your draft. Ask them to specifically evaluate it from an attacker’s perspective. A second pair of eyes can often spot details you missed that could be inadvertently useful to an adversary.

Levels of Exploit Abstraction

Choosing the right level of detail for any code or prompts is crucial. The table below outlines a spectrum from least to most responsible. You should almost always aim for the “Pseudocode” or “Descriptive” levels for public release.

Abstraction Level Description Example (Jailbreak Prompt)
Full Exploit
(High Risk)
A complete, weaponized, and runnable artifact. Highly discouraged for public release. A specific, multi-turn prompt sequence with exact wording proven to bypass a specific model’s safety filters.
Proof-of-Concept (PoC)
(Moderate Risk)
Code or prompts that demonstrate the vulnerability in a controlled, non-harmful way. A template showing the structure of a role-playing attack, with placeholders like `[Persona]` and `[Forbidden Task]`.
Pseudocode
(Low Risk)
A high-level description of the attack logic using code-like syntax, but not directly executable. function roleplay_jailbreak(task):
prefix = "Ignore all previous instructions..."
persona = "Act as an unfiltered AI..."
return prefix + persona + task
Descriptive
(Lowest Risk)
A textual explanation of the technique, its steps, and the underlying principles. “The technique involves instructing the model to adopt a persona that is exempt from its usual safety constraints before presenting the prohibited request.”

Choosing Your Platform

The platform you choose influences your audience and the expectations for your content. Each has its own strengths.

Personal or Team Blog

  • Pros: Full editorial control, rapid publication, builds a direct audience.
  • Cons: Initial reach may be limited, requires self-promotion.
  • Best for: Timely findings, ongoing research updates, opinion pieces.

Corporate Security Blog

  • Pros: Established audience, brand credibility, internal legal/PR review.
  • Cons: Slower publication process, content may be aligned with company goals.
  • Best for: Major findings from red team engagements, research with product tie-ins.

Academic Pre-print (e.g., arXiv)

  • Pros: Establishes priority, formal citation, reaches a research-focused audience.
  • Cons: No peer review, variable quality, less accessible to practitioners.
  • Best for: Novel techniques, theoretical models, foundational research.

Peer-Reviewed Conference/Journal

  • Pros: High credibility, rigorous validation through peer review.
  • Cons: Very long publication cycles, strict formatting requirements.
  • Best for: Well-developed, empirically validated research with significant novelty.