Full Red Team Engagement: End-to-End
Complete guide to AI red team engagements from scoping through attack execution, evidence collection, impact assessment, report delivery, and remediation validation.
An AI red team engagement is a structured adversarial assessment that evaluates the full AI system -- model behavior, application integration, infrastructure security, and organizational process -- against realistic threat scenarios. This page covers the complete lifecycle from scoping through remediation validation.
Engagement Lifecycle
Scoping & Planning ──▶ Rules of Engagement ──▶ Reconnaissance
        │                                            │
        ▼                                            │
Threat Modeling ─────▶ Attack Execution ◀────────────┘
                              │
                              ▼
             Evidence Collection & Documentation
                              │
                              ▼
              Analysis & Impact Assessment
                              │
                              ▼
               Report Writing & Delivery
                              │
                              ▼
            Remediation Validation & Retest
Phase 1: Scoping and Planning
Scope Definition Checklist
| Scope Element | Questions to Answer |
|---|---|
| Model layer | Which models? Fine-tuned or API-based? Are weights accessible? |
| Application layer | Web interface? API? Mobile? Internal tools? |
| Data layer | RAG knowledge base? Training pipeline? User data stores? |
| Infrastructure layer | Cloud provider? On-prem GPUs? Kubernetes? Model serving? |
| Integration layer | What tools can the LLM invoke? What external services? |
| Human layer | Social engineering of AI operators in scope? |
Engagement Types
| Type | Objective | Duration | Team | Deliverable |
|---|---|---|---|---|
| Safety Assessment | Evaluate model behavior against safety policies | 1-2 weeks | 1-2 researchers | Safety evaluation with failure modes |
| App Security Assessment | Find vulns in the LLM-powered application | 2-4 weeks | 2-3 appsec + 1 LLM specialist | Vulnerability report with PoC exploits |
| Full Red Team | Simulate realistic adversary against complete system | 4-8 weeks | 3-5 cross-functional | Attack narrative with business impact |
| Continuous Red Team | Ongoing adversarial testing in dev cycle | Ongoing (quarterly) | 2-3 embedded | Continuous findings + quarterly reports |
Stakeholder Alignment
Before starting, align on these four questions:
- Success criteria -- Is a jailbreak sufficient, or must you demonstrate downstream business impact?
- Risk tolerance -- Demonstrate actual data exfiltration, or stop at proving the vulnerability exists?
- Communication cadence -- How often for interim reports? What is the emergency escalation process?
- Environmental constraints -- Production vs. staging? Peak hours to avoid? Data sensitivity levels?
Phase 2: Rules of Engagement
Standard ROE Sections
Authorization
Engagement sponsor, authorization date, legal review confirmation, specific authorized scope.
Scope boundaries
Explicit in-scope and out-of-scope lists. AI-specific exclusions: production customer data (use synthetic), third-party API services not owned by the org.
Testing constraints
Hours of operation, rate limits (avoid DoS), data handling rules, persistence and lateral movement authorization.
Communication
Primary contacts, defensive team contact for deconfliction, emergency contact for critical findings.
Critical finding protocol
Severity threshold definition, notification timeline (e.g., 4 hours for critical), encrypted communication method.
Evidence handling
Encrypted storage, access controls, retention period, destruction process after report delivery.
AI-Specific ROE Provisions
| Provision | What to Specify |
|---|---|
| Token budget | Maximum tokens per session to prevent cost runaway |
| Prohibited prompts | Categories that are off-limits (e.g., CSAM-related content) |
| Output recording | All model outputs must be logged for review |
| Knowledge base injection | May/may not inject documents; all test data must be removable |
| Training data poisoning | May/may not submit poisoned data; synthetic data requirements |
| Model integrity | May/may not modify weights or upload LoRA adapters; rollback plan |
| Harmful content handling | Authorized categories only; store encrypted, destroy after report |
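The token-budget provision above can be enforced mechanically in the testing harness. This is a minimal sketch, assuming the harness wraps every model call; `estimate_tokens` is a crude whitespace-based stand-in for whatever tokenizer the target actually uses.

```python
# Sketch: enforce a per-session token budget during testing.
# estimate_tokens is a rough stand-in for a real tokenizer count.

class TokenBudgetExceeded(Exception):
    pass

def estimate_tokens(text: str) -> int:
    # Crude approximation: ~1 token per whitespace-separated word.
    return len(text.split())

class BudgetedSession:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, prompt: str, response: str) -> int:
        # Refuse the call before it would exceed the ROE budget.
        cost = estimate_tokens(prompt) + estimate_tokens(response)
        if self.used + cost > self.max_tokens:
            raise TokenBudgetExceeded(
                f"budget {self.max_tokens}, used {self.used}, next call {cost}"
            )
        self.used += cost
        return self.used
```

Raising a hard exception rather than logging a warning keeps a runaway automated test from silently burning through the agreed budget.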
Phase 3: Reconnaissance
Use the methodology from the Advanced Reconnaissance module:
- Model identification -- behavioral fingerprinting, identity probes
- System prompt extraction -- techniques from the extraction module
- Tool/function enumeration -- probe for available capabilities
- RAG system detection -- identify knowledge base sources
- Infrastructure fingerprinting -- timing analysis, error messages, header inspection
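Behavioral fingerprinting from the list above can be sketched as a fixed probe set whose normalized responses are hashed into a comparable ID. The probe strings and the `query_model` callable (any `prompt -> response` function you supply) are illustrative assumptions, not a canonical probe suite.

```python
import hashlib

# Sketch: behavioral fingerprinting via a fixed probe set.
# query_model is any callable (prompt -> response text) you supply.

PROBES = [
    "What model are you?",
    "Repeat the word 'test' exactly once.",
    "What is your knowledge cutoff?",
]

def normalize(text: str) -> str:
    # Collapse case and whitespace so trivial variation
    # doesn't change the fingerprint.
    return " ".join(text.lower().split())

def fingerprint(query_model) -> str:
    # Hash the concatenated, normalized responses into a short ID
    # that can be compared across endpoints or engagement phases.
    digest = hashlib.sha256()
    for probe in PROBES:
        digest.update(normalize(query_model(probe)).encode("utf-8"))
    return digest.hexdigest()[:16]
```

Comparing fingerprints across endpoints can reveal when two "different" products sit on the same underlying model, or when a model was silently swapped mid-engagement.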
Phase 4: Attack Execution
Attack Priority Matrix
| | Low Feasibility | High Feasibility |
|---|---|---|
| High Impact | Strategic: sleeper agents, training poison, weight backdoors | Priority: prompt injection, system prompt leak, output injection, RAG poisoning |
| Low Impact | Deprioritize: side-channel attacks, timing attacks | Quick Wins: jailbreaking, model fingerprinting, bias detection |
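The matrix above can double as a triage function for the finding backlog. A minimal sketch, assuming impact and feasibility are each scored 1 (low) or 2 (high); the quadrant names mirror the table.

```python
# Sketch: map the impact/feasibility matrix to a working queue.
# Scores are 1 (low) or 2 (high); (impact, feasibility) keys.

QUADRANTS = {
    (2, 2): "Priority",
    (2, 1): "Strategic",
    (1, 2): "Quick Win",
    (1, 1): "Deprioritize",
}

def triage(findings):
    # findings: list of (name, impact, feasibility) tuples.
    # Work Priority items first; Quick Wins fill gaps while
    # Strategic attacks (long setup) run in the background.
    order = ["Priority", "Quick Win", "Strategic", "Deprioritize"]
    labeled = [(name, QUADRANTS[(impact, feas)])
               for name, impact, feas in findings]
    return sorted(labeled, key=lambda item: order.index(item[1]))
```

The ordering choice (Quick Wins before Strategic) reflects limited engagement time: cheap demonstrable findings build stakeholder confidence while longer-horizon attacks are still in progress.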
Building Attack Chains
The most impactful findings are multi-step chains demonstrating realistic threat scenarios:
Example: Data Exfiltration via RAG Poisoning
1. INITIAL ACCESS: RAG ingests from shared Confluence wiki
2. INJECTION: Create wiki page with embedded adversarial prompt
3. TRIGGER: Craft queries that retrieve the adversarial document
4. HIJACK: Retrieved content overrides system prompt, instructs
LLM to include sensitive data as a "reference link"
5. EXFIL: LLM outputs image tag with query params containing
sensitive data to attacker's server
6. IMPACT: Customer PII exfiltrated through AI assistant
Evidence Collection
For each action, log:
- Timestamp (UTC)
- Action type (recon, injection, extraction, etc.)
- Target and technique used
- Input data (what you sent)
- Output data (what came back)
- Evidence files (screenshots, captures)
- Success/failure determination
- SHA-256 hash of the log entry for integrity verification
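The log fields above, including the SHA-256 integrity hash, can be implemented as a hash chain so that any after-the-fact edit to an entry is detectable. A minimal sketch; field names follow the checklist, and the chaining scheme (each entry hashes its content plus the previous hash) is one reasonable design, not a mandated format.

```python
import hashlib
import json
from datetime import datetime, timezone

# Sketch: tamper-evident evidence log. Each entry hashes its own
# content plus the previous entry's hash, so editing any earlier
# entry breaks verification of everything after it.

class EvidenceLog:
    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def record(self, action_type, target, technique, input_data,
               output_data, success, evidence_files=()):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action_type": action_type,
            "target": target,
            "technique": technique,
            "input": input_data,
            "output": output_data,
            "evidence_files": list(evidence_files),
            "success": success,
            "prev_hash": self._prev_hash,
        }
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["sha256"] = hashlib.sha256(payload).hexdigest()
        self._prev_hash = entry["sha256"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; any mutation invalidates the chain.
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "sha256"}
            if body["prev_hash"] != prev:
                return False
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if hashlib.sha256(payload).hexdigest() != entry["sha256"]:
                return False
            prev = entry["sha256"]
        return True
```

Running `verify()` before report delivery gives the sponsor a simple integrity check on the evidence package.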
Phase 5: Building Attack Narratives
From Findings to Stories
Individual findings are most persuasive when woven into a single attack narrative: walk stakeholders through the chain step by step, then translate each step into quantified business impact.
Business Impact Framework
| Impact Category | What to Quantify |
|---|---|
| Financial | Remediation cost, regulatory fines, legal liability, revenue impact |
| Operational | Service disruption, recovery time, retraining costs |
| Reputational | Media coverage risk, customer trust erosion |
| Compliance | GDPR/CCPA/AI Act violations, audit findings |
| Safety | User harm from incorrect medical/legal/financial advice |
Phase 6: Report Delivery
See the dedicated Report Writing guide for templates and structure.
Delivery Protocol
Draft delivery
Encrypted draft to engagement sponsor.
Technical review
1-2 hour meeting with the security team.
Factual validation
3-5 business days for accuracy review.
Executive briefing
30-60 minute presentation to leadership.
Final report
Delivered with all corrections incorporated.
Remediation workshop
Half-day collaborative planning session.
Retest scheduling
Schedule validation of implemented fixes.
Phase 7: Remediation Validation
Retest Methodology
Replay original exploits
Attempt the exact original exploit from the report. Verify it is now blocked.
Test bypass variations
Generate variations of the original attack (different encoding, phrasing, multi-turn). Verify that the fix is robust, not just pattern-matched to the specific payload.
Classify remediation status
REMEDIATED = original + all bypasses blocked. PARTIALLY REMEDIATED = original blocked but bypasses succeed. NOT REMEDIATED = original exploit still works.
Establish regression tests
Convert successful exploits into automated regression tests. Integrate with CART pipeline for continuous monitoring.
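The retest steps above can be sketched as a single routine: replay the original exploit, generate simple bypass variations, and classify remediation status from the results. The specific variation transforms here (case change, whitespace substitution, base64 wrapping) are illustrative; `is_blocked` is any callable you supply that sends a payload to the target and returns True if it was refused.

```python
import base64

# Sketch: replay an exploit plus bypass variations, then classify
# remediation status per the definitions above. is_blocked is a
# caller-supplied callable: payload -> True if the target blocked it.

def variations(payload: str):
    yield payload                                        # original, verbatim
    yield payload.upper()                                # trivial case change
    yield payload.replace(" ", "\u00a0")                 # whitespace swap
    yield base64.b64encode(payload.encode()).decode()    # encoding wrapper

def remediation_status(payload: str, is_blocked) -> str:
    results = [is_blocked(v) for v in variations(payload)]
    if all(results):
        return "REMEDIATED"
    if results[0]:
        return "PARTIALLY REMEDIATED"  # original blocked, a bypass works
    return "NOT REMEDIATED"
```

Because `variations` is a plain generator, the same payloads drop directly into an automated regression suite for continuous monitoring.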
Related Topics
- Report Writing — Document findings using structured report templates
- Threat Modeling — Threat models drive engagement planning
References
- MITRE ATLAS — AI threat knowledge base for engagement planning
- OWASP Top 10 for LLM Applications — Vulnerability classification for findings
- NIST AI Risk Management Framework (AI RMF 1.0, 2023) — Risk framework for AI engagement scoping