Engagement Planning and Scoping
How to plan and scope an AI red teaming engagement, including defining objectives, rules of engagement, success criteria, and methodology selection.
Proper planning and scoping are the difference between a productive assessment and an unfocused exercise. AI red teaming engagements require specific scoping considerations — including rules of engagement and stochastic measurement standards — that do not exist in traditional penetration testing.
Defining Objectives
Before scoping the technical assessment, clarify the business objectives:
| Objective Type | Example | Engagement Focus |
|---|---|---|
| Compliance | "Validate OWASP Top 10 for LLMs coverage" | Systematic checklist-based testing |
| Pre-deployment | "Is this chatbot safe to launch?" | Broad attack surface assessment |
| Incident response | "We found a jailbreak in production" | Focused reproduction and variant testing |
| Continuous | "Ongoing security monitoring" | CART pipeline development |
| Competitive | "How do we compare to peers?" | Benchmarking against industry standards |
Scoping Checklist
In-Scope Systems
## Scope Definition
### Target Systems
- [ ] Primary LLM application (chat interface, API)
- [ ] Model(s) used (identify specific models and versions)
- [ ] System prompt and application logic
- [ ] RAG pipeline and knowledge base
- [ ] Tool integrations (MCP servers, function calls)
- [ ] Supporting infrastructure (API gateway, auth, CDN)
- [ ] Fine-tuning pipeline and training data
- [ ] Model supply chain (sources, registries, dependencies)
### Attack Categories
- [ ] Prompt injection (direct and indirect)
- [ ] Jailbreaking and safety bypass
- [ ] Data extraction (training data, system prompt, user data)
- [ ] Tool abuse and agent exploitation
- [ ] RAG poisoning
- [ ] Infrastructure exploitation
- [ ] Supply chain assessment
- [ ] Denial of service / cost exhaustion

Rules of Engagement
AI-specific rules of engagement must address:
## Rules of Engagement
### Permitted
- Manual prompt injection testing via the standard user interface
- Automated testing at [X] requests per minute
- System prompt extraction attempts
- Jailbreak testing across [specified] content categories
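The agreed request rate from the permitted list above is easiest to honor when the test harness enforces it itself. A minimal sketch, assuming a fixed-interval limiter; the `requests_per_minute` value stands in for the engagement's agreed `[X]` and is not from this document:

```python
import time

class RateLimiter:
    """Enforce a maximum number of requests per minute for automated testing."""

    def __init__(self, requests_per_minute):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self.last_sent = 0.0

    def wait(self):
        """Block until the next request is allowed under the agreed limit."""
        now = time.monotonic()
        elapsed = now - self.last_sent
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_sent = time.monotonic()

# Usage: call limiter.wait() before each automated request is sent.
limiter = RateLimiter(requests_per_minute=30)  # substitute the agreed [X]
```

Enforcing the limit in the harness, rather than relying on tester discipline, also produces an auditable record that the rules of engagement were followed.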
### Not Permitted
- Testing against production systems during peak hours
- Actual data exfiltration (test with canary data only)
- Attacks that could corrupt production training data
- Social engineering of operations staff
- Supply chain attacks against third-party model registries
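The "canary data only" rule above is commonly implemented by seeding the target environment with unique marker strings and scanning model outputs for them, so exfiltration can be demonstrated without touching real records. A sketch under those assumptions; the canary format and helper names are illustrative, not from this document:

```python
import secrets

def make_canary(label):
    """Generate a unique, searchable marker to stand in for real sensitive data."""
    return f"CANARY-{label}-{secrets.token_hex(8)}"

def find_canary_leaks(outputs, canaries):
    """Return (canary, output) pairs where a seeded marker appeared in a response."""
    return [(c, out) for out in outputs for c in canaries if c in out]

# Usage: seed canaries into test documents or accounts, then scan responses.
canaries = [make_canary("ssn"), make_canary("apikey")]
leaks = find_canary_leaks(["benign answer", f"record: {canaries[0]}"], canaries)
# A non-empty `leaks` list proves the exfiltration path without exposing real data.
```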
### Handling of Sensitive Outputs
- Any PII discovered in model outputs must be reported immediately
- Jailbreak outputs must not be stored in plaintext
- All test data must be deleted within [X] days of engagement end

Success Criteria
Define what constitutes a finding and how severity is assessed:
Severity Framework
| Severity | Criteria | Example |
|---|---|---|
| Critical | >50% success rate + high impact | System prompt with credentials extracted reliably |
| High | >20% success rate + medium-high impact | Jailbreak bypasses safety on restricted categories |
| Medium | >5% success rate + medium impact | Indirect injection via RAG with limited impact |
| Low | <5% success rate or low impact | Safety bypass requiring unrealistic conditions |
| Informational | No direct exploit but security-relevant | Model fingerprinting reveals version information |
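Severity assignment from the table above can be automated once a success rate has been measured over repeated trials. A sketch combining the table's rate thresholds with a Wilson 95% confidence interval for the point estimate; the impact labels and function names are illustrative assumptions:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a binomial success rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

def assign_severity(successes, trials, impact):
    """Map a measured success rate plus impact onto the severity table's bands."""
    rate = successes / trials
    if rate > 0.50 and impact == "high":
        return "Critical"
    if rate > 0.20 and impact in ("medium", "high"):
        return "High"
    if rate > 0.05 and impact == "medium":
        return "Medium"
    return "Low"

# Example: 14 successful jailbreaks in 20 trials against a high-impact target.
low, high = wilson_interval(14, 20)
print(f"rate=0.70, 95% CI=({low:.2f}, {high:.2f}) -> {assign_severity(14, 20, 'high')}")
```

Reporting the interval alongside the point estimate keeps a 14/20 result from being over-read as exactly 70%.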
Stochastic Measurement Standards
## Measurement Standards
- Minimum 20 trials per finding for success rate calculation
- 95% confidence interval must be reported alongside point estimate
- Environmental conditions (temperature, model version, session state) must be documented for each finding
- Findings must be reproducible by a second tester using documented payloads

Methodology Selection
Choose the testing approach based on objectives and constraints:
| Approach | Best For | Coverage | Depth |
|---|---|---|---|
| Checklist-based | Compliance, baseline assessment | Broad | Shallow |
| Threat-model-driven | Specific risk scenarios | Focused | Deep |
| Exploratory | Novel attack discovery | Unpredictable | Variable |
| Automated (CART) | Continuous monitoring, regression | Broad | Medium |
| Hybrid (recommended) | Most engagements | Broad + focused | Deep where needed |
The recommended approach for most engagements is hybrid: use automated tools (Automation Frameworks) for broad coverage and checklist validation, then invest manual effort in the areas with the highest risk or the most interesting attack surface.
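One way to operationalize the hybrid hand-off is to run the automated pass first, then queue any attack category whose measured success rate crosses a threshold for manual deep-dives. A sketch; the category names and the 5% threshold are illustrative assumptions, not prescribed by this document:

```python
def triage_for_manual_testing(automated_results, threshold=0.05):
    """Pick attack categories from an automated broad pass that merit manual depth.

    automated_results maps category -> (successes, trials) from the automated run.
    """
    queue = []
    for category, (successes, trials) in automated_results.items():
        rate = successes / trials
        if rate >= threshold:
            queue.append((category, rate))
    # Highest observed success rate gets manual attention first.
    return sorted(queue, key=lambda item: item[1], reverse=True)

results = {
    "prompt_injection": (8, 50),
    "jailbreak": (1, 50),
    "rag_poisoning": (12, 50),
}
print(triage_for_manual_testing(results))
```

This keeps broad coverage cheap while concentrating expensive manual effort where the automated signal is strongest.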
Related Topics
- Full Engagement Methodology -- overview of the complete engagement lifecycle
- Execution & Reporting -- executing against the scope defined here
- Recon & Tradecraft -- the reconnaissance phase that follows scoping
- Automation Frameworks -- CART methodology selection for continuous testing
- Threat Modeling -- structured threat analysis that informs scoping decisions
References
- NIST, "AI Risk Management Framework" (2023) -- federal framework for structuring AI risk assessments
- OWASP, "LLM AI Security & Governance Checklist" (2025) -- scoping checklist for LLM security assessments
- Casper et al., "Black-Box Access is Insufficient for Rigorous AI Audits" (2024) -- scoping considerations for access levels in AI assessments