Engagement Planning and Scoping
How to plan and scope an AI red teaming engagement, including defining objectives, rules of engagement, success criteria, and methodology selection.
Proper planning and scoping are the difference between a productive assessment and an unfocused exercise. AI red teaming engagements require specific scoping considerations — including rules of engagement and stochastic measurement standards — that do not exist in traditional penetration testing.
Defining Objectives
Before scoping the technical assessment, clarify the business objectives:
| Objective Type | Example | Engagement Focus |
|---|---|---|
| Compliance | "Validate OWASP Top 10 for LLMs coverage" | Systematic checklist-based testing |
| Pre-deployment | "Is this chatbot safe to launch?" | Broad attack surface assessment |
| Incident response | "We found a jailbreak in production" | Focused reproduction and variant testing |
| Continuous | "Ongoing security monitoring" | CART pipeline development |
| Competitive | "How do we compare to peers?" | Benchmarking against industry standards |
Scoping Checklist
In-Scope Systems
## Scope Definition
### Target Systems
- [ ] Primary LLM application (chat interface, API)
- [ ] Model(s) used (identify specific models and versions)
- [ ] System prompt and application logic
- [ ] RAG pipeline and knowledge base
- [ ] Tool integrations (MCP servers, function calls)
- [ ] Supporting infrastructure (API gateway, auth, CDN)
- [ ] Fine-tuning pipeline and training data
- [ ] Model supply chain (sources, registries, dependencies)
### Attack Categories
- [ ] Prompt injection (direct and indirect)
- [ ] Jailbreaking and safety bypass
- [ ] Data extraction (training data, system prompt, user data)
- [ ] Tool abuse and agent exploitation
- [ ] RAG poisoning
- [ ] Infrastructure exploitation
- [ ] Supply chain assessment
- [ ] Denial of service / cost exhaustion

Rules of Engagement
AI-specific rules of engagement must address:
## Rules of Engagement
### Permitted
- Manual prompt injection testing via the standard user interface
- Automated testing at [X] requests per minute
- System prompt extraction attempts
- Jailbreak testing across [specified] content categories
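The agreed request rate from the permitted list above is easiest to honor when the test harness enforces it itself. A minimal sketch, assuming a fixed-interval limiter; the `requests_per_minute` value stands in for the engagement's agreed `[X]` and is not from this document:

```python
import time

class RateLimiter:
    """Enforce a maximum number of requests per minute for automated testing."""

    def __init__(self, requests_per_minute):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self.last_sent = 0.0

    def wait(self):
        """Block until the next request is allowed under the agreed limit."""
        now = time.monotonic()
        elapsed = now - self.last_sent
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_sent = time.monotonic()

# Usage: call limiter.wait() before each automated request is sent.
limiter = RateLimiter(requests_per_minute=30)  # substitute the agreed [X]
```

Enforcing the limit in the harness, rather than relying on tester discipline, also produces an auditable record that the rules of engagement were followed.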
### Not Permitted
- Testing against production systems during peak hours
- Actual data exfiltration (test with canary data only)
- Attacks that could corrupt production training data
- Social engineering of operations staff
- Supply chain attacks against third-party model registries
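The "canary data only" rule above is commonly implemented by seeding the target environment with unique marker strings and scanning model outputs for them, so exfiltration can be demonstrated without touching real records. A sketch under those assumptions; the canary format and helper names are illustrative, not from this document:

```python
import secrets

def make_canary(label):
    """Generate a unique, searchable marker to stand in for real sensitive data."""
    return f"CANARY-{label}-{secrets.token_hex(8)}"

def find_canary_leaks(outputs, canaries):
    """Return (canary, output) pairs where a seeded marker appeared in a response."""
    return [(c, out) for out in outputs for c in canaries if c in out]

# Usage: seed canaries into test documents or accounts, then scan responses.
canaries = [make_canary("ssn"), make_canary("apikey")]
leaks = find_canary_leaks(["benign answer", f"record: {canaries[0]}"], canaries)
# A non-empty `leaks` list proves the exfiltration path without exposing real data.
```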
### Handling of Sensitive Outputs
- Any PII discovered in model outputs must be reported immediately
- Jailbreak outputs must not be stored in plaintext
- All test data must be deleted within [X] days of engagement end

Success Criteria
Define what constitutes a finding and how severity is assessed:
Severity Framework
| Severity | Criteria | Example |
|---|---|---|
| Critical | >50% success rate + high impact | System prompt with credentials extracted reliably |
| High | >20% success rate + medium-high impact | Jailbreak bypasses safety on restricted categories |
| Medium | >5% success rate + medium impact | Indirect injection via RAG with limited impact |
| Low | <5% success rate or low impact | Safety bypass requiring unrealistic conditions |
| Informational | No direct exploit but security-relevant | Model fingerprinting reveals version information |
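Severity assignment from the table above can be automated once a success rate has been measured over repeated trials. A sketch combining the table's rate thresholds with a Wilson 95% confidence interval for the point estimate; the impact labels and function names are illustrative assumptions:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a binomial success rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

def assign_severity(successes, trials, impact):
    """Map a measured success rate plus impact onto the severity table's bands."""
    rate = successes / trials
    if rate > 0.50 and impact == "high":
        return "Critical"
    if rate > 0.20 and impact in ("medium", "high"):
        return "High"
    if rate > 0.05 and impact == "medium":
        return "Medium"
    return "Low"

# Example: 14 successful jailbreaks in 20 trials against a high-impact target.
low, high = wilson_interval(14, 20)
print(f"rate=0.70, 95% CI=({low:.2f}, {high:.2f}) -> {assign_severity(14, 20, 'high')}")
```

Reporting the interval alongside the point estimate keeps a 14/20 result from being over-read as exactly 70%.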
Stochastic Measurement Standards
## Measurement Standards
- Minimum 20 trials per finding for success rate calculation
- 95% confidence interval must be reported alongside point estimate
- Environmental conditions (temperature, model version, session state) must be documented for each finding
- Findings must be reproducible by a second tester using documented payloads

Methodology Selection
Choose the testing approach based on objectives and constraints:
| Approach | Best For | Coverage | Depth |
|---|---|---|---|
| Checklist-based | Compliance, baseline assessment | Broad | Shallow |
| Threat-model-driven | Specific risk scenarios | Focused | Deep |
| Exploratory | Novel attack discovery | Unpredictable | Variable |
| Automated (CART) | Continuous monitoring, regression | Broad | Medium |
| Hybrid (recommended) | Most engagements | Broad + focused | Deep where needed |
The recommended approach for most engagements is hybrid: use automated tools (Automation Frameworks) for broad coverage and checklist validation, then invest manual effort in the areas with the highest risk or the most interesting attack surface.
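One way to operationalize the hybrid hand-off is to run the automated pass first, then queue any attack category whose measured success rate crosses a threshold for manual deep-dives. A sketch; the category names and the 5% threshold are illustrative assumptions, not prescribed by this document:

```python
def triage_for_manual_testing(automated_results, threshold=0.05):
    """Pick attack categories from an automated broad pass that merit manual depth.

    automated_results maps category -> (successes, trials) from the automated run.
    """
    queue = []
    for category, (successes, trials) in automated_results.items():
        rate = successes / trials
        if rate >= threshold:
            queue.append((category, rate))
    # Highest observed success rate gets manual attention first.
    return sorted(queue, key=lambda item: item[1], reverse=True)

results = {
    "prompt_injection": (8, 50),
    "jailbreak": (1, 50),
    "rag_poisoning": (12, 50),
}
print(triage_for_manual_testing(results))
```

This keeps broad coverage cheap while concentrating expensive manual effort where the automated signal is strongest.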
Related Topics
- Full Engagement Methodology -- overview of the complete engagement lifecycle
- Execution & Reporting -- executing against the scope defined here
- Recon & Tradecraft -- the reconnaissance phase that follows scoping
- Automation Frameworks -- CART methodology selection for continuous testing
- Threat Modeling -- structured threat analysis that informs scoping decisions
References
- NIST, "AI Risk Management Framework" (2023) -- federal framework for structuring AI risk assessments
- OWASP, "LLM AI Security & Governance Checklist" (2025) -- scoping checklist for LLM security assessments
- Casper et al., "Black-Box Access is Insufficient for Rigorous AI Audits" (2024) -- scoping considerations for access levels in AI assessments