Methodology Walkthroughs
Step-by-step walkthroughs for each phase of an AI red team engagement: kickoff, reconnaissance, attack execution, and report writing.
An AI red team engagement is not a hacking free-for-all. It is a structured process with defined phases, clear deliverables, and professional standards. Following a consistent methodology ensures comprehensive coverage, reproducible results, and reports that drive meaningful security improvements.
The Four Phases
Every AI red team engagement follows four phases, regardless of scope or target:
Kickoff → Reconnaissance → Attack Execution → Reporting
| Phase | Duration | Key Output |
|---|---|---|
| Engagement Kickoff | 1-3 days | Signed rules of engagement, scope document, environment access |
| Reconnaissance | 2-5 days | System architecture map, capability inventory, attack surface analysis |
| Attack Execution | 5-15 days | Documented findings with evidence, severity ratings, reproduction steps |
| Report Writing | 3-5 days | Executive summary, technical findings, remediation recommendations |
Engagement Types
The methodology adapts to different engagement types:
Prompt-level assessment -- Focus on prompt injection, jailbreaking, and content policy bypass. Primarily uses promptfoo and garak. Typical duration: 1-2 weeks.
Full-stack AI security assessment -- Covers prompts, API security, infrastructure, data pipeline, and access controls. Uses all tools in the toolkit. Typical duration: 3-4 weeks.
Agentic system assessment -- Focuses on function calling, tool use, multi-agent interactions, and autonomous behavior. Requires deep understanding of the agent architecture. Typical duration: 2-3 weeks.
Training pipeline assessment -- Evaluates data ingestion, fine-tuning processes, model artifacts, and deployment pipelines. Requires environment access beyond the model API. Typical duration: 3-4 weeks.
Continuous red teaming -- Ongoing assessment integrated into the development lifecycle. CI/CD integration, periodic manual testing, and monitoring. Duration: ongoing.
Adapting to Client Maturity
How you execute the methodology depends on the client's security maturity:
| Client Maturity | Approach | Focus |
|---|---|---|
| No AI security program | Education-heavy, focus on fundamentals | Basic injection resistance, system prompt protection, content policy |
| Basic guardrails deployed | Test guardrail effectiveness | Bypass techniques, edge cases, configuration errors |
| Mature AI security program | Advanced techniques, novel attacks | Multi-turn attacks, cross-modal exploitation, training pipeline |
Walkthrough Index
- Engagement Kickoff -- Initial meetings, scope definition, rules of engagement, environment setup
- Reconnaissance Workflow -- System prompt extraction, model identification, capability mapping, attack surface analysis
- Attack Execution -- Selecting techniques, building attack chains, real-time documentation, evidence management
- Report Writing -- Structure, executive summary, technical findings, risk ratings, remediation, peer review