Visualizing Red Team Results
Creating effective visualizations for AI red team reports: ASR charts, attack taxonomy heatmaps, defense coverage matrices, and Python visualization code.
A well-chosen visualization communicates findings faster and more memorably than paragraphs of text. This page covers the chart types most useful for AI red team reports, when to use each, and Python code you can adapt to generate them.
Visualization Selection Guide
| Data Type | Best Visualization | Audience | Example |
|---|---|---|---|
| Success rates across techniques | Horizontal bar chart | Technical, executive | ASR by attack category |
| Coverage across attack surfaces | Heatmap | Technical, management | Attack taxonomy coverage |
| Finding severity distribution | Donut or stacked bar chart | Executive | Critical/High/Medium/Low breakdown |
| Defense effectiveness | Matrix / grid | Technical | Defense vs. attack type matrix |
| Trend over time | Line chart | Management, executive | ASR across quarterly assessments |
| Attack chain flow | Sankey diagram | Technical | Multi-step attack progression |
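The guide lists a line chart for trends over time, which the sections below do not cover. A minimal sketch of that chart follows. The quarterly values are placeholders for illustration, not measured results, and the `Agg` backend line is an assumption that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Headless backend (assumption); remove for interactive use
import matplotlib.pyplot as plt

# Placeholder data for illustration only, not real assessment results
quarters = ["Q1", "Q2", "Q3", "Q4"]
overall_asr = [48, 41, 33, 27]
positions = list(range(len(quarters)))

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(positions, overall_asr, marker="o", linewidth=2, color="#0072B2")

# Label each point so readers do not have to estimate from the axis
for x, y in zip(positions, overall_asr):
    ax.annotate(f"{y}%", (x, y), textcoords="offset points",
                xytext=(0, 8), ha="center", fontsize=10)

ax.set_xticks(positions)
ax.set_xticklabels(quarters)
ax.set_ylim(0, 100)
ax.set_xlabel("Assessment")
ax.set_ylabel("Attack Success Rate (%)")
ax.set_title("Overall ASR Across Quarterly Assessments")
fig.tight_layout()
```

Save the figure with `fig.savefig("asr_trend.png", dpi=150, bbox_inches="tight")`, as in the other examples. Fixing the y-axis at 0 to 100 keeps the slope honest when comparing charts from different assessments.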
Attack Success Rate (ASR) Charts
Attack Success Rate (ASR), the percentage of attack attempts that achieve their objective, is the most fundamental metric in AI red teaming.
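Before charting ASR, it helps to compute it consistently from raw trial counts. The sketch below assumes ASR is the share of attack attempts that succeed and adds a Wilson score interval, which is one common way to show how uncertain a rate is when the number of attempts is small. The function names and trial counts are hypothetical.

```python
import math


def attack_success_rate(successes: int, attempts: int) -> float:
    """Return ASR as the percentage of attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return 100.0 * successes / attempts


def wilson_interval(successes: int, attempts: int, z: float = 1.96):
    """Return the Wilson score interval in percent (z=1.96 is about 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1 + z**2 / attempts
    centre = (p + z**2 / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts
                         + z**2 / (4 * attempts**2)) / denom
    return 100 * max(0.0, centre - half), 100 * min(1.0, centre + half)


# Hypothetical trial counts: (successful attempts, total attempts)
trials = {"Role-play Jailbreak": (14, 20), "Tool Abuse": (2, 20)}
for name, (succ, total) in trials.items():
    asr = attack_success_rate(succ, total)
    low, high = wilson_interval(succ, total)
    print(f"{name}: {asr:.0f}% (95% CI {low:.0f}-{high:.0f}%)")
```

Reporting the interval alongside the point estimate makes it clear when two categories with different ASR values are not meaningfully different given the sample size.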
Horizontal Bar Chart -- ASR by Category
```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

# Data
categories = [
    "Direct Injection",
    "Indirect Injection",
    "Role-play Jailbreak",
    "Encoding Bypass",
    "System Prompt Extraction",
    "Tool Abuse",
    "Multi-turn Manipulation",
    "Safety Bypass",
]
asr_values = [45, 20, 70, 15, 60, 10, 35, 55]

# Color by severity threshold
colors = []
for v in asr_values:
    if v >= 50:
        colors.append("#dc3545")  # Red - critical
    elif v >= 30:
        colors.append("#fd7e14")  # Orange - warning
    else:
        colors.append("#28a745")  # Green - acceptable

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.barh(categories, asr_values, color=colors, height=0.6)

# Add value labels
for bar, val in zip(bars, asr_values):
    ax.text(bar.get_width() + 1.5, bar.get_y() + bar.get_height() / 2,
            f"{val}%", va="center", fontsize=11, fontweight="bold")

# Threshold lines
ax.axvline(x=50, color="#dc3545", linestyle="--", alpha=0.5, label="Critical threshold")
ax.axvline(x=30, color="#fd7e14", linestyle="--", alpha=0.5, label="Warning threshold")

ax.set_xlabel("Attack Success Rate (%)", fontsize=12)
ax.set_title("Attack Success Rate by Category", fontsize=14, fontweight="bold")
ax.set_xlim(0, 100)
ax.xaxis.set_major_formatter(mticker.PercentFormatter())
ax.legend(loc="lower right")
ax.invert_yaxis()

plt.tight_layout()
plt.savefig("asr_by_category.png", dpi=150, bbox_inches="tight")
plt.show()
```
Grouped Bar Chart -- Model Comparison
```python
import matplotlib.pyplot as plt
import numpy as np

# The same layout works for comparing two models; this example compares
# one model before and after remediation.
categories = ["Injection", "Jailbreak", "Extraction", "Tool Abuse", "Safety"]
asr_before = [45, 70, 60, 10, 55]
asr_after = [25, 40, 30, 5, 20]

x = np.arange(len(categories))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))
bars1 = ax.bar(x - width / 2, asr_before, width, label="Before Remediation",
               color="#dc3545", alpha=0.8)
bars2 = ax.bar(x + width / 2, asr_after, width, label="After Remediation",
               color="#28a745", alpha=0.8)

ax.set_ylabel("Attack Success Rate (%)")
ax.set_title("Remediation Effectiveness by Attack Category")
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
ax.set_ylim(0, 100)

# Add value labels
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        # Format as a whole number so labels never render as "45.0%"
        ax.text(bar.get_x() + bar.get_width() / 2, height + 1,
                f"{height:.0f}%", ha="center", va="bottom", fontsize=9)

plt.tight_layout()
plt.savefig("remediation_comparison.png", dpi=150, bbox_inches="tight")
plt.show()
```
Attack Taxonomy Heatmap
A heatmap shows testing coverage and results across two dimensions (attack type vs. target component, for example).
```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Rows: attack types, Columns: target components
attack_types = [
    "Direct Injection", "Indirect Injection", "Role-play Jailbreak",
    "Encoding Bypass", "Prompt Extraction", "Tool Manipulation",
]
components = ["Chat API", "RAG Pipeline", "Agent Tools", "Safety Filter"]

# Values: ASR percentages (-1 = not tested)
data = np.array([
    [45, 20, -1, 30],
    [15, 55, 40, 10],
    [70, -1, 25, 50],
    [15, 5, -1, 20],
    [60, 30, 10, -1],
    [-1, -1, 35, 5],
])

# Mask untested cells
mask = data == -1
display_data = np.where(mask, 0, data)

fig, ax = plt.subplots(figsize=(10, 7))
sns.heatmap(
    display_data, annot=True, fmt="d", mask=mask,
    xticklabels=components, yticklabels=attack_types,
    cmap="RdYlGn_r", vmin=0, vmax=100,
    linewidths=1, linecolor="white",
    cbar_kws={"label": "Attack Success Rate (%)"},
    ax=ax,
)

# Mark untested cells
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        if mask[i, j]:
            ax.text(j + 0.5, i + 0.5, "N/T", ha="center", va="center",
                    fontsize=10, color="gray", fontstyle="italic")

ax.set_title("Attack Taxonomy Coverage Heatmap", fontsize=14, fontweight="bold")
ax.set_ylabel("Attack Type")
ax.set_xlabel("Target Component")

plt.tight_layout()
plt.savefig("attack_heatmap.png", dpi=150, bbox_inches="tight")
plt.show()
```
Defense Coverage Matrix
Shows which defenses are effective against which attack types:
| Attack Type | Input Filter | Output Filter | Safety Classifier | Rate Limiting | Prompt Hardening |
|---|---|---|---|---|---|
| Direct Injection | Partial | Effective | Effective | No effect | Partial |
| Indirect Injection | No effect | Partial | Partial | No effect | No effect |
| Jailbreak | Partial | Partial | Effective | Partial | No effect |
| Prompt Extraction | No effect | Effective | No effect | No effect | Effective |
| Tool Abuse | No effect | No effect | No effect | Effective | Partial |
Legend: Effective = blocks >80% | Partial = blocks 30-80% | No effect = blocks <30%
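The same matrix can be rendered as a color-coded grid for a report. The following is a sketch rather than a definitive implementation: it copies the ratings from the table above, maps them to three ordinal levels, and draws them with `imshow`. The color choices (taken from the Okabe-Ito palette) and the `Agg` backend line are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # Headless backend (assumption); remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

attacks = ["Direct Injection", "Indirect Injection", "Jailbreak",
           "Prompt Extraction", "Tool Abuse"]
defenses = ["Input Filter", "Output Filter", "Safety Classifier",
            "Rate Limiting", "Prompt Hardening"]

# Ratings copied from the table above (rows: attacks, columns: defenses)
ratings = [
    ["Partial", "Effective", "Effective", "No effect", "Partial"],
    ["No effect", "Partial", "Partial", "No effect", "No effect"],
    ["Partial", "Partial", "Effective", "Partial", "No effect"],
    ["No effect", "Effective", "No effect", "No effect", "Effective"],
    ["No effect", "No effect", "No effect", "Effective", "Partial"],
]

# Map each rating to an ordinal level: 0 = No effect, 1 = Partial, 2 = Effective
levels = ["No effect", "Partial", "Effective"]
grid = np.array([[levels.index(cell) for cell in row] for row in ratings])

# One color per level (assumed choice): vermillion, yellow, bluish green
cmap = ListedColormap(["#D55E00", "#F0E442", "#009E73"])

fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(grid, cmap=cmap, vmin=0, vmax=2, aspect="auto")
ax.set_xticks(np.arange(len(defenses)))
ax.set_xticklabels(defenses, rotation=20, ha="right")
ax.set_yticks(np.arange(len(attacks)))
ax.set_yticklabels(attacks)

# Write the rating in each cell so meaning does not depend on color alone
for i in range(grid.shape[0]):
    for j in range(grid.shape[1]):
        ax.text(j, i, ratings[i][j], ha="center", va="center", fontsize=10)

ax.set_title("Defense Coverage Matrix", fontsize=14, fontweight="bold")
fig.tight_layout()
```

Save the figure with `fig.savefig("defense_matrix.png", dpi=150, bbox_inches="tight")`. Rows that are mostly "No effect", such as Indirect Injection here, stand out immediately as coverage gaps.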
Severity Distribution
A donut chart showing finding severity distribution is a quick executive-level summary:
```python
import matplotlib.pyplot as plt

labels = ["Critical", "High", "Medium", "Low"]
sizes = [2, 4, 6, 3]
colors = ["#dc3545", "#fd7e14", "#ffc107", "#28a745"]
explode = (0.05, 0, 0, 0)

fig, ax = plt.subplots(figsize=(8, 8))
wedges, texts, autotexts = ax.pie(
    sizes, explode=explode, labels=labels, colors=colors,
    autopct=lambda pct: f"{pct:.0f}%\n({int(round(pct / 100 * sum(sizes)))})",
    shadow=False, startangle=90, pctdistance=0.75,
    textprops={"fontsize": 12},
)

# Draw center circle for donut effect
centre_circle = plt.Circle((0, 0), 0.50, fc="white")
ax.add_artist(centre_circle)
ax.text(0, 0, f"{sum(sizes)}\nFindings", ha="center", va="center",
        fontsize=16, fontweight="bold")

ax.set_title("Finding Severity Distribution", fontsize=14, fontweight="bold")

plt.tight_layout()
plt.savefig("severity_distribution.png", dpi=150, bbox_inches="tight")
plt.show()
```
Visualization Best Practices
| Practice | Why |
|---|---|
| Use colorblind-safe palettes | ~8% of men have color vision deficiency |
| Include data labels on all bars | Readers should not have to estimate from axes |
| Export at 150+ DPI | Prevents pixelation in reports and presentations |
| Title every chart descriptively | "ASR by Attack Category" not "Figure 1" |
| Explain what "good" and "bad" look like | Add threshold lines or color coding |
| Keep it simple | One message per chart -- split complex data into multiple visuals |
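The examples on this page color bars red, orange, and green, which can be hard to tell apart for readers with red-green color vision deficiency. One possible alternative, sketched below, maps the same 50% and 30% thresholds to colors from the Okabe-Ito palette. The function name and the specific color assignments are assumptions; check the result with a color vision simulator before publishing.

```python
# Okabe-Ito palette entries (assumed assignment to severity bands)
CRITICAL = "#D55E00"    # vermillion
WARNING = "#E69F00"     # orange
ACCEPTABLE = "#0072B2"  # blue


def asr_color(asr: float, critical: float = 50, warning: float = 30) -> str:
    """Return a hex color for an ASR percentage using the page's thresholds."""
    if asr >= critical:
        return CRITICAL
    if asr >= warning:
        return WARNING
    return ACCEPTABLE


asr_values = [45, 20, 70, 15, 60, 10, 35, 55]
colors = [asr_color(v) for v in asr_values]
print(colors)
```

The resulting `colors` list can be passed to `ax.barh(..., color=colors)` in place of the threshold loop in the ASR bar chart. Keeping value labels on every bar means the chart still reads correctly even when printed in grayscale.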
Related Topics
- Writing Executive Summaries -- charts that support executive communication
- Report Templates & Examples -- where visualizations fit in the report
- Metrics, KPIs & Demonstrating ROI -- program-level metrics and dashboards