24.3.3 Visualization guidelines

2025.10.06.
AI Security Blog

Data without interpretation is noise. In an AI red teaming report, your findings are only as powerful as your ability to communicate them. A well-crafted visualization transforms complex test results into a clear, undeniable demonstration of risk. A poor one buries critical vulnerabilities under a layer of confusion. Your goal is not to create pretty charts; it is to forge compelling evidence.

This section provides guidelines for creating visualizations that are clear, honest, and impactful, ensuring your stakeholders—from engineers to executives—grasp the full implications of your findings.

Core Principles of Effective Visualization

Before selecting a chart type, internalize these foundational principles. They apply to every graph, diagram, and table you create.

  • Clarity Over Complexity: A simple bar chart that everyone understands is vastly superior to a complex, multi-axis plot that only you can decipher. Your objective is to reduce cognitive load for the reader, not showcase your data visualization skills.
  • Context is Non-Negotiable: A chart without a clear title, labeled axes (with units!), and a concise caption is an unfinished thought. Assume the reader has no prior context. Explain what is being measured and what the key takeaway is.
  • Audience-Centric Design: An executive summary might require a high-level bar chart showing “Attack Success Rate by Category.” The technical appendix, however, would benefit from a detailed scatter plot correlating perturbation epsilon with model confidence drop. Tailor the complexity to the audience.
  • Graphical Integrity: Your credibility is on the line. Never use misleading techniques like truncating a Y-axis to exaggerate a small change. Represent the data honestly, even if it makes the finding seem less dramatic. The technical truth is your strongest asset.

Choosing the Right Visualization for the Finding

The story your data tells should dictate the type of visualization you use. Here are common scenarios in AI red teaming and the appropriate visual tools.

Comparing Outcomes: Bar Charts

Use bar charts to compare discrete categories. They are excellent for showing things like attack success rates, performance degradation across different models, or the number of vulnerabilities found per component.

[Bar chart: "Jailbreak Success Rate by Technique" — success rate (0–100%) on the Y-axis for the Roleplay, Base64, DAN 11.0, and Code Gen techniques.]

A simple bar chart clearly comparing the effectiveness of different jailbreaking techniques.
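A minimal sketch of how such a chart could be produced with matplotlib. The technique names match the figure above, but the success-rate values and the output filename are illustrative placeholders, not measured results.

```python
import matplotlib.pyplot as plt

# Illustrative placeholder data -- substitute your measured success rates.
techniques = ["Roleplay", "Base64", "DAN 11.0", "Code Gen"]
success_rates = [0.62, 0.35, 0.48, 0.27]  # fraction of attempts that bypassed guardrails

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(techniques, success_rates, color="#4c72b0")

# Context: clear title, labeled axes with units, zero-based magnitude axis.
ax.set_title("Jailbreak Success Rate by Technique")
ax.set_xlabel("Technique")
ax.set_ylabel("Success rate (fraction of attempts)")
ax.set_ylim(0, 1)

fig.tight_layout()
fig.savefig("jailbreak_success_by_technique.png", dpi=200)
```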

Showing Relationships: Scatter & Line Plots

When you need to show the relationship between two continuous variables, use a scatter plot or a line plot. This is perfect for illustrating how an attack’s success (or the model’s error) changes as a variable like perturbation magnitude, query count, or token length increases.

Example Use Case: Plotting model confidence score (Y-axis) against the L2 norm of an adversarial perturbation (X-axis). A sharp downward trend visually proves the model’s fragility.
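A sketch of that use case, using synthetic data in place of real measurements; the downward trend and the output filename are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data standing in for real measurements: confidence tends to
# drop as the L2 norm of the adversarial perturbation grows.
rng = np.random.default_rng(0)
l2_norms = np.sort(rng.uniform(0.0, 3.0, 80))
confidence = np.clip(0.95 - 0.25 * l2_norms + rng.normal(0, 0.04, 80), 0, 1)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(l2_norms, confidence, s=18, alpha=0.7)

ax.set_title("Model Confidence vs. Adversarial Perturbation Size")
ax.set_xlabel("L2 norm of perturbation")
ax.set_ylabel("Model confidence in true class")
ax.set_ylim(0, 1)

fig.tight_layout()
fig.savefig("confidence_vs_perturbation.png", dpi=200)
```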

Visualizing Process and Flow: Flowcharts & Sankey Diagrams

Complex, multi-step attacks are difficult to explain with text alone. A flowchart is an indispensable tool for mapping out the stages of a sophisticated prompt injection, data poisoning pipeline, or model evasion sequence.

[Flowchart: User submits benign query → System appends hidden, malicious instruction → LLM processes full prompt]

A simple flowchart illustrating a hidden instruction injection attack chain.
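One way to keep such diagrams reproducible is to generate them from code. A minimal sketch using the graphviz Python package (which assumes the Graphviz system binaries are installed); the node names and output filename are illustrative.

```python
from graphviz import Digraph  # requires the graphviz package plus Graphviz binaries

# Minimal sketch of the injection chain shown above, laid out left to right.
dot = Digraph("injection_chain", graph_attr={"rankdir": "LR"})
dot.node("user", "User submits\nbenign query", shape="box")
dot.node("system", "System appends hidden,\nmalicious instruction", shape="box")
dot.node("llm", "LLM processes\nfull prompt", shape="box")
dot.edge("user", "system")
dot.edge("system", "llm")

dot.render("injection_chain", format="png", cleanup=True)
```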

Demonstrating Misclassification: Confusion Matrices

A confusion matrix is essential for showing how a model’s classification behavior changes under attack. By presenting a “before” and “after” matrix, you can pinpoint exactly which categories become confused. Use a heatmap for intuitive visual encoding—darker colors indicate higher concentrations of predictions.

After Evasion Attack: Confusion Matrix (rows = actual class, columns = predicted class)

                    Predicted: Benign   Predicted: Spam   Predicted: Phishing
  Actual: Benign           1500                50                  12
  Actual: Spam              115               850                  35
  Actual: Phishing          250                40                 510

The highlighted off-diagonal cells (e.g., 250 Phishing emails misclassified as Benign) immediately draw attention to the attack’s impact.
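A sketch of rendering this matrix as an annotated heatmap with matplotlib. The counts are taken from the table above; the color threshold and output filename are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Benign", "Spam", "Phishing"]
# Post-attack counts from the table above (rows = actual, columns = predicted).
matrix = np.array([
    [1500,  50,  12],
    [ 115, 850,  35],
    [ 250,  40, 510],
])

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(matrix, cmap="Blues")

ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel("Predicted class")
ax.set_ylabel("Actual class")
ax.set_title("Confusion Matrix After Evasion Attack")

# Annotate each cell so readers do not have to decode colors alone.
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, str(matrix[i, j]), ha="center", va="center",
                color="white" if matrix[i, j] > 700 else "black")

fig.colorbar(im, ax=ax, label="Number of samples")
fig.tight_layout()
fig.savefig("confusion_matrix_after_attack.png", dpi=200)
```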

Common Pitfalls and How to Avoid Them

An adversarial mindset applies to your own reporting. Think about how your visualizations could be misinterpreted or unintentionally misleading.

  1. The Exaggerated Y-Axis: Starting a bar chart’s Y-axis at a value other than zero is a classic way to make small differences look monumental. Always start your magnitude axes at zero unless you have a compelling reason not to (e.g., plotting stock prices), and if you do, call it out explicitly.
  2. The “Rainbow Puke” Palette: Using too many colors, or colors without a logical meaning, creates visual noise. Use color purposefully. For example, use shades of a single color to show intensity, or use two contrasting colors (e.g., blue vs. orange) to compare two categories. Always check your charts with a colorblindness simulator.
  3. The Overloaded Chart: Resist the urge to put everything on one graph. A single chart trying to show four different metrics with two different Y-axes is unreadable. It is better to have two or three simple, clear charts than one “master” chart that no one can understand.
  4. The Unannotated Finding: Don’t make your reader hunt for the “aha!” moment. If a specific data point on a scatter plot represents the successful breach, circle it, add an arrow, and write a small text box explaining its significance right on the chart. Guide your reader’s eye to the most critical information (a sketch of this annotation pattern follows this list).
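A minimal sketch combining two of the remedies above: a zero-based Y-axis and an on-chart annotation pointing at the critical data point. The attack trajectory, the annotated point, and the output filename are all illustrative placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: model confidence declining as a query-based attack progresses.
rng = np.random.default_rng(1)
queries = np.arange(0, 500, 10)
confidence = np.clip(0.9 - 0.0012 * queries + rng.normal(0, 0.02, queries.size), 0, 1)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(queries, confidence, marker="o", markersize=3)

# Pitfall 1: keep the magnitude axis anchored at zero.
ax.set_ylim(0, 1)

# Pitfall 4: guide the reader's eye to the most critical point.
ax.annotate("Illustrative breach point",
            xy=(queries[-1], confidence[-1]), xytext=(200, 0.6),
            arrowprops=dict(arrowstyle="->", color="crimson"),
            color="crimson")

ax.set_title("Model Confidence vs. Attack Query Count")
ax.set_xlabel("Number of queries")
ax.set_ylabel("Model confidence in true class")

fig.tight_layout()
fig.savefig("annotated_breach_point.png", dpi=200)
```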