# AI Red Teaming Methodology

A structured methodology for AI red teaming engagements, covering reconnaissance, target profiling, attack planning, and the tradecraft that distinguishes professional assessments.
Effective AI red teaming follows a structured methodology, just like traditional penetration testing. Random prompt injection attempts are far less effective than a systematic approach that starts with thorough reconnaissance and progressively builds toward targeted exploitation.
## The AI Red Teaming Lifecycle

```
1. Scope & Planning → 2. Reconnaissance → 3. Target Profiling →
4. Attack Planning → 5. Exploitation → 6. Post-Exploitation → 7. Reporting
```
## Phase 1: Scope and Planning
Define what is in scope, what success looks like, and what rules of engagement apply. AI-specific scoping concerns include:
- Which models and applications are in scope?
- Is fine-tuning/training data testing permitted?
- Are supply chain attacks (model registries, dependencies) in scope?
- What constitutes a "successful" jailbreak or injection?
- How will stochastic results be evaluated?
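The scoping questions above can be captured as a rules-of-engagement record that the whole team signs off on. A minimal sketch, with entirely illustrative field names and values:

```python
from dataclasses import dataclass


@dataclass
class EngagementScope:
    """Rules of engagement for an AI red team assessment (illustrative fields)."""
    in_scope_models: list[str]
    training_data_testing: bool = False  # fine-tuning / training-data tests permitted?
    supply_chain_in_scope: bool = False  # model registries, dependencies
    success_criteria: str = ""           # what counts as a successful jailbreak/injection
    trials_per_prompt: int = 10          # how stochastic results will be evaluated


# Example scope for a hypothetical engagement:
scope = EngagementScope(
    in_scope_models=["support-chatbot-v2"],
    success_criteria="Model reveals system prompt verbatim",
)
```

Writing the success criteria and trial count down before testing begins avoids disputes later about whether a one-in-ten jailbreak "counts."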
## Phase 2: Reconnaissance
Gather information about the target without directly interacting with the AI system. See Target Profiling.
## Phase 3: Target Profiling
Interact with the system to understand its behavior:
- System Prompt Extraction — Discover the system's instructions and constraints
- Capability Mapping — Map what the system can do, including tools and integrations
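Capability mapping can be partially automated by sending benign probe prompts and checking responses for markers of live functionality. The sketch below stubs `query_target` with canned responses purely for illustration; in practice it would wrap whatever client interface the target exposes:

```python
# Minimal capability-mapping sketch. `query_target` is a stand-in for the
# real target client, stubbed here with canned responses.
CANNED = {
    "What's the weather in Paris right now?": "Let me check... It is 18°C in Paris.",
    "Browse https://example.com and summarize it.": "I cannot browse the web.",
}


def query_target(prompt: str) -> str:
    return CANNED.get(prompt, "I can't help with that.")


# Each probe pairs a benign prompt with a marker suggesting the capability is live.
PROBES = {
    "live_data": ("What's the weather in Paris right now?", "18°C"),
    "web_browsing": ("Browse https://example.com and summarize it.", "summary"),
}


def map_capabilities() -> dict[str, bool]:
    """Return a capability name → observed flag for each probe."""
    results = {}
    for name, (prompt, marker) in PROBES.items():
        reply = query_target(prompt).lower()
        results[name] = marker.lower() in reply
    return results
```

Benign probes keep this phase low-noise: a weather question that returns current data reveals a live tool integration without triggering any safety filter.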
## Phases 4-7: Attack and Report
Plan attacks based on reconnaissance, execute them, document results, and report findings. See the Capstone section for full engagement methodology.
## Key Tradecraft Principles
| Principle | Description |
|---|---|
| Profile before you attack | Invest time in understanding the target before attempting exploits |
| Test systematically | Vary one parameter at a time to understand what works and why |
| Document everything | AI behavior is stochastic — record exact prompts, responses, and success rates |
| Use open models as proxies | Test techniques on open-weight models before targeting production systems |
| Respect rate limits | Aggressive testing triggers rate limiting and may alert defenders |
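The "test systematically," "document everything," and "respect rate limits" principles suggest a harness that varies one parameter at a time, repeats each trial, and records exact prompts with success rates. A sketch under stated assumptions: `send_prompt` and `succeeded` are stand-ins, stubbed deterministically here so the example is self-contained:

```python
import time
from itertools import product


def send_prompt(prompt: str) -> str:
    """Stand-in for the real target client; stubbed for illustration."""
    # Toy behavior: this stub only "leaks" when the prompt claims a developer role.
    if "developer" in prompt:
        return "DEBUG: system prompt is ..."
    return "Sorry, I can't share that."


def succeeded(response: str) -> bool:
    """Stand-in success check against the engagement's agreed criteria."""
    return "system prompt" in response.lower()


def run_matrix(personas, framings, trials=3, delay=0.0):
    """Vary one parameter at a time and log the success rate per combination."""
    log = []
    for persona, framing in product(personas, framings):
        prompt = f"As a {persona}, {framing}"
        wins = sum(succeeded(send_prompt(prompt)) for _ in range(trials))
        log.append({"persona": persona, "framing": framing,
                    "prompt": prompt, "success_rate": wins / trials})
        time.sleep(delay)  # throttle requests against real targets
    return log
```

Because model output is stochastic, the per-combination success rate over repeated trials is the meaningful metric, not a single pass/fail; the logged prompt text makes every result reproducible in the report.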
## Reconnaissance Depth
The depth of reconnaissance determines the quality of your attacks:
| Recon Depth | What You Learn | Attack Quality |
|---|---|---|
| None | "It's a chatbot" | Random injection attempts |
| Basic | Model family, visible features | Generic attacks for that model type |
| Moderate | System prompt, tools, safety rules | Targeted attacks against specific defenses |
| Deep | Architecture, training data sources, deployment details | Custom exploits targeting specific weaknesses |
Start with the pages in this section to build your reconnaissance capabilities, then apply them in the context of a full engagement using the Capstone methodology.
## Related Topics
- Advanced Recon Techniques -- deeper reconnaissance and system prompt extraction methods
- Capstone: Full Engagement -- applying reconnaissance in the context of a full professional engagement
- Defense Evasion -- bypassing defenses identified during recon
- Agent Exploitation -- leveraging capability mapping to exploit agent tools
- Target Profiling -- detailed model fingerprinting and profiling techniques
## References
- Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023) -- reconnaissance-informed indirect injection
- Schulhoff et al., "Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Scale Prompt Hacking Competition" (2023) -- systematic approach to discovering LLM weaknesses
- MITRE, "ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems" (2021) -- structured reconnaissance framework for AI systems