AI Red Team Report Writing
Writing AI red team reports: executive summaries, finding templates, AI-adapted risk ratings, remediation recommendations, and common mistakes to avoid.
The report is the only tangible deliverable of a red team engagement. A technically brilliant assessment with a poorly written report has zero organizational impact. The report must communicate findings to executives who control budgets, engineers who implement fixes, and compliance teams who track risk -- each requiring different levels of detail.
Audience-Driven Structure
A single report must serve multiple audiences. Structure it so each can extract what they need:
| Audience | Reads | Needs | Format |
|---|---|---|---|
| Executive leadership (CTO, CISO) | Executive summary, risk assessment | Business impact, investment decisions | 1-2 pages, no code, dollar amounts |
| Engineering leadership | Summary, finding overviews, recommendations | Remediation priorities, architecture changes | 3-5 pages, high-level technical |
| Security engineers | Full findings with technical details | Exact exploit steps, root cause, fix validation | Detailed technical, code, screenshots |
| Compliance / Legal | Risk assessment, compliance mapping | Regulatory implications, audit evidence | Risk framework language, regulatory refs |
Report Structure
Executive Summary (1-2 pages)
Engagement overview, critical findings summary, overall risk rating, top 3 recommendations, estimated remediation investment.
Engagement Details (2-3 pages)
Scope, objectives, methodology, timeline, team composition.
Findings (bulk of report)
Organized by severity or attack chain. Each follows a standard template with evidence and reproduction steps.
Attack Narratives (3-5 pages)
End-to-end attack chains, real-world threat mapping, execution timeline.
Risk Assessment (2-3 pages)
Overall posture, industry benchmarks, trend analysis (for repeat engagements).
Recommendations (3-5 pages)
Prioritized remediation plan with quick wins and strategic improvements.
Appendices
Detailed evidence, tool outputs, full reproduction steps, methodology details.
Executive Summary
Writing Principles
- Lead with impact: "Customer data is exposed" not "Prompt injection in the RAG pipeline"
- Quantify: Dollar amounts, user counts, time to exploit
- Avoid jargon: No "XSS," "SSRF," or "embedding space collision" without plain-language explanation
- Be direct: "The system is at critical risk" not "The assessment revealed certain areas of concern"
- Include the positive: Note what defenses worked well for credibility
Executive Summary Template
## Engagement Overview
[Organization] engaged [Red Team] to conduct a [type] assessment
of [system] from [date] to [date], targeting [scope] with the
objective of [goal].
## Key Findings
| Severity | Count | Top Example |
|----------|-------|-------------|
| Critical | N | [1-line description] |
| High | N | [1-line description] |
## Critical Risk Summary
[2-3 sentences: most significant risk in business terms]
## Overall Risk Rating: [CRITICAL/HIGH/MEDIUM/LOW]
[1-2 sentences justification]
## Top Recommendations
1. [Action] - [impact]
2. [Action] - [impact]
3. [Action] - [impact]
## Estimated Remediation Investment
| Priority | Effort | Timeline | Cost |
|----------|----------|-----------|----------|
| Critical | X days | Immediate | $amount |
| High | X weeks | 30 days | $amount |
Finding Documentation
Finding Template
Each finding follows a consistent structure:
| Section | Content |
|---|---|
| Header | Finding ID, title, severity, status, category, OWASP LLM Top 10 mapping |
| Summary | 2-3 sentences: what, what an attacker can do, business impact |
| Technical Details | Root cause analysis (WHY it exists), attack vector description |
| Reproduction Steps | Exact numbered steps with specific inputs |
| Evidence | Timestamped, SHA-256 hashed screenshots and captures |
| Impact Analysis | Confidentiality, Integrity, Availability + Business Impact |
| Affected Components | Component names with versions |
| Recommendations | Immediate (0-7d), Short-term (7-30d), Long-term (30-90d) |
| References | Standards, CVEs, internal docs |
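The template above can be captured as a structured record so findings stay consistent across a report. This is a minimal sketch with illustrative field names (the `Finding` class and its attributes are assumptions, not a prescribed schema); it also shows the SHA-256 evidence hashing the Evidence row calls for:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One report finding following the template above (field names are illustrative)."""
    finding_id: str          # e.g. "AIRT-001"
    title: str
    severity: str            # Critical / High / Medium / Low
    category: str
    owasp_llm_mapping: str   # e.g. "LLM01" from the OWASP LLM Top 10
    summary: str             # what, what an attacker can do, business impact
    reproduction_steps: list[str] = field(default_factory=list)
    evidence_hashes: dict[str, str] = field(default_factory=dict)

    def attach_evidence(self, name: str, data: bytes) -> str:
        """Record a SHA-256 digest so evidence integrity can be verified later."""
        digest = hashlib.sha256(data).hexdigest()
        self.evidence_hashes[name] = digest
        return digest
```

Storing the hash alongside each evidence file lets reviewers confirm screenshots and captures were not altered after collection.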
AI-Adapted Risk Rating
Standard CVSS does not adequately capture AI-specific risks. Supplement with three factors:
Rating Factors
| Factor | Score Range | What to Assess |
|---|---|---|
| Exploitability | 0-10 | Attack complexity, access requirements, reproducibility, automation potential |
| Impact | 0-10 | Scope (single user vs. all users), data sensitivity, safety implications, persistence |
| Detectability | 0-10 (inverted: higher = harder to detect) | Logging coverage, anomaly visibility, attribution difficulty, detection latency |
Composite Score
Score = (Exploitability + Impact + Detectability) / 3
9.0-10.0: Critical
7.0-8.9: High
4.0-6.9: Medium
0.0-3.9: Low
Rating Examples
| Finding | Exploitability | Impact | Detectability | Score | Rating |
|---|---|---|---|---|---|
| Multi-turn jailbreak | 7 | 4 | 5 | 5.3 | Medium |
| RAG poisoning + data exfil | 8 | 9 | 8 | 8.3 | High |
| Pickle RCE via HuggingFace typosquat | 6 | 10 | 7 | 7.7 | High |
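The composite score and thresholds above can be sketched in a few lines (function names are illustrative):

```python
def composite_score(exploitability: float, impact: float, detectability: float) -> float:
    """Average the three AI-adapted factors, each scored 0-10.
    Detectability is inverted: higher means harder to detect."""
    return round((exploitability + impact + detectability) / 3, 1)

def rating(score: float) -> str:
    """Map a composite score onto the rating bands above."""
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"
```

For example, the multi-turn jailbreak row works out to `composite_score(7, 4, 5)` = 5.3, which `rating` maps to Medium, matching the table.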
Remediation Recommendations
Prioritization Template
| Phase | Timeline | Finding | Action | Owner | Effort |
|---|---|---|---|---|---|
| Emergency | This week | AIRT-001 | Deploy output filtering WAF rule | Security Ops | 4 hours |
| Emergency | This week | AIRT-003 | Disable trust_remote_code in model loader | ML Eng | 2 hours |
| Critical | 30 days | AIRT-001 | Implement RAG content scanning | ML Platform | 5 days |
| Critical | 30 days | AIRT-002 | Add auth to inference server endpoints | Infrastructure | 3 days |
| Strategic | 90 days | ALL | Deploy AI-specific WAF | Security | 3 weeks |
| Strategic | 90 days | ALL | Model provenance verification pipeline | ML Platform | 2 weeks |
Common Report Writing Mistakes
Mistake 1: Technical Narcissism
Writing to impress other security researchers instead of driving organizational action.
Before: "We achieved a novel multi-turn adversarial prompt injection leveraging the model's in-context learning capabilities to construct a semantic jailbreak that circumvents the RLHF-trained refusal boundary through progressive context manipulation."
After: "We bypassed the AI safety controls through a series of messages that gradually shifted the conversation context. The attack requires no specialized tools and can be performed by any user. The result is that the AI assistant provides restricted information that could [specific business impact]."
Mistake 2: Findings Without Impact
Before: "The model is susceptible to jailbreaking through DAN-style prompts."
After: "The model's safety controls can be bypassed in approximately 3 messages, allowing it to generate [specific content]. For a healthcare application serving N patients daily, this creates liability under [regulation] and could result in patient harm."
Mistake 3: Missing the Attack Chain
Reported in isolation:
- Finding 1: System prompt leakage (Medium)
- Finding 2: API key in system prompt (High)
- Finding 3: Order API lacks authorization (High)
Reported as a chain: "Findings 1, 2, and 3 combine to form a critical attack path: an unauthenticated user extracts the system prompt (F1), which contains the order API key (F2), enabling access to all customer orders via the unprotected API (F3). Individual findings are Medium-High, but the chain is Critical."
Report Quality Checklist
Content Quality
- Every finding has clear reproduction steps
- Evidence is timestamped and integrity-verified (hashes)
- Business impact stated for every finding
- Recommendations are specific and actionable
- No technical jargon in the executive summary
- Attack narratives tell a coherent story
Accuracy
- All reproduction steps re-verified before submission
- Severity ratings are consistent across findings
- Scope boundaries accurately documented
- Timeline matches evidence timestamps
Completeness
- All in-scope systems addressed (even if no findings)
- Positive observations included (what worked well)
- Environmental limitations documented
- Out-of-scope observations noted for future work
Security
- Report encrypted at rest and in transit
- Distribution list documented and restricted
- No actual customer/user data in the report
- Evidence files stored per ROE requirements
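Parts of the content-quality checklist can be enforced mechanically before submission. This is a sketch under the assumption that findings are available as dictionaries; the field names mirror the finding template but are illustrative:

```python
# Fields every finding must populate before the report ships
# (names are illustrative, matching the finding template above).
REQUIRED_FIELDS = ["reproduction_steps", "business_impact", "recommendations", "evidence"]

def check_finding(finding: dict) -> list[str]:
    """Return a list of content-quality gaps for one finding; empty means it passes."""
    return [f"missing {key}" for key in REQUIRED_FIELDS if not finding.get(key)]
```

Running such a check over every finding catches empty reproduction steps or absent business-impact statements before a reviewer does; it cannot, of course, judge whether the stated impact is accurate.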
Related Topics
- Full Engagement -- The end-to-end engagement methodology that produces reports
- AI-Specific Threat Modeling -- Threat models that inform risk ratings and finding categorization
- CART Pipelines -- Continuous testing produces findings that follow the same report templates
- AI Application Security -- Application-level findings frequently documented in reports
References
- OWASP Testing Guide — Report structure and methodology documentation
- MITRE ATLAS — Standardized AI threat classification for reports
- NIST AI Risk Management Framework (AI RMF 1.0, 2023) — Risk framework for AI findings classification