AI Red Team Report Writing
Writing AI red team reports: executive summaries, finding templates, AI-adapted risk ratings, remediation recommendations, and common mistakes to avoid.
The report is the only tangible deliverable of a red team engagement. A technically brilliant assessment with a poorly written report has zero organizational impact. The report must communicate findings to executives who control budgets, engineers who implement fixes, and compliance teams who track risk -- each requiring different levels of detail.
Audience-Driven Structure
A single report must serve multiple audiences. Structure it so each can extract what they need:
| Audience | Reads | Needs | Format |
|---|---|---|---|
| Executive leadership (CTO, CISO) | Executive summary, risk assessment | Business impact, investment decisions | 1-2 pages, no code, dollar amounts |
| Engineering leadership | Summary, finding overviews, recommendations | Remediation priorities, architecture changes | 3-5 pages, high-level technical |
| Security engineers | Full findings with technical details | Exact exploit steps, root cause, fix validation | Detailed technical, code, screenshots |
| Compliance / Legal | Risk assessment, compliance mapping | Regulatory implications, audit evidence | Risk framework language, regulatory refs |
Report Structure
Executive Summary (1-2 pages)
Engagement overview, critical findings summary, overall risk rating, top 3 recommendations, estimated remediation investment.
Engagement Details (2-3 pages)
Scope, objectives, methodology, timeline, team composition.
Findings (bulk of report)
Organized by severity or attack chain. Each follows a standard template with evidence and reproduction steps.
Attack Narratives (3-5 pages)
End-to-end attack chains, real-world threat mapping, execution timeline.
Risk Assessment (2-3 pages)
Overall posture, industry benchmarks, trend analysis (for repeat engagements).
Recommendations (3-5 pages)
Prioritized remediation plan with quick wins and strategic improvements.
Appendices
Detailed evidence, tool outputs, full reproduction steps, methodology details.
Executive Summary
Writing Principles
- Lead with impact: "Customer data is exposed" not "Prompt injection in the RAG pipeline"
- Quantify: Dollar amounts, user counts, time to exploit
- Avoid jargon: No "XSS," "SSRF," or "embedding space collision" without plain-language explanation
- Be direct: "The system is at critical risk" not "The assessment revealed certain areas of concern"
- Include the positive: Note what defenses worked well for credibility
Executive Summary Template
## Engagement Overview
[Organization] engaged [Red Team] to conduct a [type] assessment
of [system] from [date] to [date], targeting [scope] with the
objective of [goal].
## Key Findings
| Severity | Count | Top Example |
|----------|-------|-------------|
| Critical | N | [1-line description] |
| High | N | [1-line description] |
## Critical Risk Summary
[2-3 sentences: most significant risk in business terms]
## Overall Risk Rating: [CRITICAL/HIGH/MEDIUM/LOW]
[1-2 sentences justification]
## Top Recommendations
1. [Action] - [impact]
2. [Action] - [impact]
3. [Action] - [impact]
## Estimated Remediation Investment
| Priority | Effort | Timeline | Cost |
|----------|----------|-----------|----------|
| Critical | X days | Immediate | $amount |
| High | X weeks | 30 days | $amount |
Finding Documentation
Finding Template
Each finding follows a consistent structure:
| Section | Content |
|---|---|
| Header | Finding ID, title, severity, status, category, OWASP LLM Top 10 mapping |
| Summary | 2-3 sentences: what, what an attacker can do, business impact |
| Technical Details | Root cause analysis (WHY it exists), attack vector description |
| Reproduction Steps | Exact numbered steps with specific inputs |
| Evidence | Timestamped, SHA-256 hashed screenshots and captures |
| Impact Analysis | Confidentiality, Integrity, Availability + Business Impact |
| Affected Components | Component names with versions |
| Recommendations | Immediate (0-7d), Short-term (7-30d), Long-term (30-90d) |
| References | Standards, CVEs, internal docs |
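The template above can be captured as a structured record so findings stay consistent across a report. This is a minimal sketch with illustrative field names (the `Finding` class and its attributes are assumptions, not a prescribed schema); it also shows the SHA-256 evidence hashing the Evidence row calls for:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One report finding following the template above (field names are illustrative)."""
    finding_id: str          # e.g. "AIRT-001"
    title: str
    severity: str            # Critical / High / Medium / Low
    category: str
    owasp_llm_mapping: str   # e.g. "LLM01" from the OWASP LLM Top 10
    summary: str             # what, what an attacker can do, business impact
    reproduction_steps: list[str] = field(default_factory=list)
    evidence_hashes: dict[str, str] = field(default_factory=dict)

    def attach_evidence(self, name: str, data: bytes) -> str:
        """Record a SHA-256 digest so evidence integrity can be verified later."""
        digest = hashlib.sha256(data).hexdigest()
        self.evidence_hashes[name] = digest
        return digest
```

Storing the hash alongside each evidence file lets reviewers confirm screenshots and captures were not altered after collection.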
AI-Adapted Risk Rating
Standard CVSS does not adequately capture AI-specific risks. Supplement with three factors:
Rating Factors
| Factor | Score Range | What to Assess |
|---|---|---|
| Exploitability | 0-10 | Attack complexity, access requirements, reproducibility, automation potential |
| Impact | 0-10 | Scope (single user vs. all users), data sensitivity, safety implications, persistence |
| Detectability | 0-10 (inverted: higher = harder to detect) | Logging coverage, anomaly visibility, attribution difficulty, detection latency |
Composite Score
Score = (Exploitability + Impact + Detectability) / 3
9.0-10.0: Critical
7.0-8.9: High
4.0-6.9: Medium
0.0-3.9: Low
Rating Examples
| Finding | Exploitability | Impact | Detectability | Score | Rating |
|---|---|---|---|---|---|
| Multi-turn jailbreak | 7 | 4 | 5 | 5.3 | Medium |
| RAG poisoning + data exfil | 8 | 9 | 8 | 8.3 | High |
| Pickle RCE via HuggingFace typosquat | 6 | 10 | 7 | 7.7 | High |
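The composite score and thresholds above can be sketched in a few lines (function names are illustrative):

```python
def composite_score(exploitability: float, impact: float, detectability: float) -> float:
    """Average the three AI-adapted factors, each scored 0-10.
    Detectability is inverted: higher means harder to detect."""
    return round((exploitability + impact + detectability) / 3, 1)

def rating(score: float) -> str:
    """Map a composite score onto the rating bands above."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"
```

For example, the multi-turn jailbreak row works out to `composite_score(7, 4, 5)` = 5.3, which `rating` maps to Medium, matching the table.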
Remediation Recommendations
Prioritization Template
| Phase | Timeline | Finding | Action | Owner | Effort |
|---|---|---|---|---|---|
| Emergency | This week | AIRT-001 | Deploy output filtering WAF rule | Security Ops | 4 hours |
| Emergency | This week | AIRT-003 | Disable trust_remote_code in model loader | ML Eng | 2 hours |
| Critical | 30 days | AIRT-001 | Implement RAG content scanning | ML Platform | 5 days |
| Critical | 30 days | AIRT-002 | Add auth to inference server endpoints | Infrastructure | 3 days |
| Strategic | 90 days | ALL | Deploy AI-specific WAF | Security | 3 weeks |
| Strategic | 90 days | ALL | Model provenance verification pipeline | ML Platform | 2 weeks |
Common Report Writing Mistakes
Mistake 1: Technical Narcissism
Writing to impress other security researchers instead of driving organizational action.
Before: "We achieved a novel multi-turn adversarial prompt injection leveraging the model's in-context learning capabilities to construct a semantic jailbreak that circumvents the RLHF-trained refusal boundary through progressive context manipulation."
After: "We bypassed the AI safety controls through a series of messages that gradually shifted the conversation context. The attack requires no specialized tools and can be performed by any user. The result is that the AI assistant provides restricted information that could [specific business impact]."
Mistake 2: Findings Without Impact
Before: "The model is susceptible to jailbreaking through DAN-style prompts."
After: "The model's safety controls can be bypassed in approximately 3 messages, allowing it to generate [specific content]. For a healthcare application serving N patients daily, this creates liability under [regulation] and could result in patient harm."
Mistake 3: Missing the Attack Chain
Reported in isolation:
- Finding 1: System prompt leakage (Medium)
- Finding 2: API key in system prompt (High)
- Finding 3: Order API lacks authorization (High)
Reported as a chain: "Findings 1, 2, and 3 combine to form a critical attack path: an unauthenticated user extracts the system prompt (F1), which contains the order API key (F2), enabling access to all customer orders via the unprotected API (F3). Individual findings are Medium-High, but the chain is Critical."
Report Quality Checklist
Content Quality
- Every finding has clear reproduction steps
- Evidence is timestamped and integrity-verified (hashes)
- Business impact stated for every finding
- Recommendations are specific and actionable
- No technical jargon in the executive summary
- Attack narratives tell a coherent story
Accuracy
- All reproduction steps re-verified before submission
- Severity ratings are consistent across findings
- Scope boundaries accurately documented
- Timeline matches evidence timestamps
Completeness
- All in-scope systems addressed (even if no findings)
- Positive observations included (what worked well)
- Environmental limitations documented
- Out-of-scope observations noted for future work
Security
- Report encrypted at rest and in transit
- Distribution list documented and restricted
- No actual customer/user data in the report
- Evidence files stored per ROE requirements
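Parts of the content-quality checklist can be enforced mechanically before submission. This is a sketch under the assumption that findings are available as dictionaries; the field names mirror the finding template but are illustrative:

```python
# Fields every finding must populate before the report ships
# (names are illustrative, matching the finding template above).
REQUIRED_FIELDS = ["reproduction_steps", "business_impact", "recommendations", "evidence"]

def check_finding(finding: dict) -> list[str]:
    """Return a list of content-quality gaps for one finding; empty means it passes."""
    return [f"missing {key}" for key in REQUIRED_FIELDS if not finding.get(key)]
```

Running such a check over every finding catches empty reproduction steps or absent business-impact statements before a reviewer does; it cannot, of course, judge whether the stated impact is accurate.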
Related Topics
- Full Engagement -- The end-to-end engagement methodology that produces reports
- AI-Specific Threat Modeling -- Threat models that inform risk ratings and finding categorization
- CART Pipelines -- Continuous testing produces findings that follow the same report templates
- AI Application Security -- Application-level findings frequently documented in reports
References
- OWASP Testing Guide — Report structure and methodology documentation
- MITRE ATLAS — Standardized AI threat classification for reports
- NIST AI Risk Management Framework (AI RMF 1.0, 2023) — Risk framework for AI findings classification