Scoping & Rules of Engagement
Defining scope, rules of engagement, authorization boundaries, and success criteria for AI red team engagements, with templates and checklists for common engagement types.
Scoping and rules of engagement (ROE) define what an AI red team is authorized to test, how they may test it, and what constraints apply. Poor scoping leads to either incomplete testing (too narrow) or operational incidents (too broad). For AI systems, scoping must address concerns that do not exist in traditional penetration testing: the stochastic nature of model behavior, the risk of generating genuinely harmful content during testing, rate limits on inference APIs, and the distinction between model-level and application-level testing.
Scope Definition
Scope Components
An AI red team scope must specify each of the following:
| Component | What to Define | Example |
|---|---|---|
| Target system | The specific AI application, model, and version under test | "Customer-facing chatbot on support.example.com using GPT-4o through Azure OpenAI" |
| Attack vectors | Which injection types, jailbreak categories, and exploitation techniques are in scope | "Direct injection, indirect injection via RAG, tool abuse; gradient-based attacks out of scope" |
| Harm categories | Which types of harmful output the team should attempt to elicit | "CSAM, violence, self-harm, PII leakage, bias — per OWASP LLM Top 10" |
| Infrastructure | Whether underlying infrastructure (APIs, databases, cloud services) is in scope | "Application-layer only; cloud infrastructure out of scope" |
| Data sensitivity | Whether real user data, PII, or production data may be accessed or generated | "Synthetic test data only; no real customer data in prompts" |
| User roles | Which user roles and access levels will be tested | "Anonymous user, authenticated standard user, admin user" |
| Integrations | Which tool integrations, plugins, and external services are in scope | "Knowledge base RAG, ticket creation API; email integration out of scope" |
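The scope components above can be captured as a machine-readable record that tooling consults before each test run, so that out-of-scope vectors are rejected automatically rather than by convention. The schema below is a sketch, not a standard; all field names and vector labels are illustrative.

```python
from dataclasses import dataclass

@dataclass
class EngagementScope:
    """Illustrative scope record mirroring the components table above."""
    target_system: str
    attack_vectors_in: list      # authorized vectors
    attack_vectors_out: list     # explicitly excluded vectors
    harm_categories: list
    infrastructure_in_scope: bool
    real_user_data_allowed: bool
    user_roles: list
    integrations_in: list
    integrations_out: list

    def is_vector_authorized(self, vector):
        # A vector must be explicitly listed in scope and not excluded.
        return (vector in self.attack_vectors_in
                and vector not in self.attack_vectors_out)

scope = EngagementScope(
    target_system="support.example.com chatbot (GPT-4o via Azure OpenAI)",
    attack_vectors_in=["direct_injection", "indirect_injection", "tool_abuse"],
    attack_vectors_out=["gradient_based"],
    harm_categories=["pii_leakage", "violence", "self_harm", "bias"],
    infrastructure_in_scope=False,
    real_user_data_allowed=False,
    user_roles=["anonymous", "standard", "admin"],
    integrations_in=["kb_rag", "ticket_api"],
    integrations_out=["email"],
)
```

A test harness would call `scope.is_vector_authorized("gradient_based")` before launching a technique and skip anything that returns `False`.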
Scope Levels
AI red team engagements typically fall into one of three scope levels:
Level 1: Model Assessment. Testing the model's safety alignment and behavioral boundaries. Focuses on prompt injection, jailbreaking, and harmful content generation. Does not test application-layer controls.
Level 2: Application Assessment. Testing the complete application including the model, system prompt, input/output filters, tool integrations, and user interface. This is the most common engagement type.
Level 3: Full-Stack Assessment. Testing the entire AI system stack from infrastructure to model to application. Includes API security, authentication, infrastructure configuration, training pipeline, and deployment security in addition to model-level testing.
Rules of Engagement
AI-Specific ROE Considerations
Beyond standard penetration testing ROE, AI red teaming requires additional rules:
Rate limiting and cost controls. API inference calls cost money. Define:
- Maximum API calls per hour/day
- Maximum tokens consumed per engagement
- Who pays for inference costs (client or testing firm)
- Whether throttling or batching is required
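These caps are easiest to enforce if every inference call passes through a budget check. The wrapper below is a minimal sketch: the class name, method names, and threshold values are assumptions, not part of any provider SDK.

```python
import time

class BudgetedClient:
    """Enforces two of the ROE caps above: a sliding-window hourly
    call limit and a total token budget for the engagement.
    Default thresholds are illustrative."""

    def __init__(self, max_calls_per_hour=500, max_total_tokens=2_000_000):
        self.max_calls_per_hour = max_calls_per_hour
        self.max_total_tokens = max_total_tokens
        self.call_times = []   # timestamps of calls in the current window
        self.tokens_used = 0

    def check_budget(self, estimated_tokens, now=None):
        """Return True if one more call fits within both caps."""
        now = time.time() if now is None else now
        # Drop calls older than one hour from the sliding window.
        self.call_times = [t for t in self.call_times if now - t < 3600]
        if len(self.call_times) >= self.max_calls_per_hour:
            return False   # hourly call cap reached
        if self.tokens_used + estimated_tokens > self.max_total_tokens:
            return False   # engagement-wide token cap reached
        return True

    def record(self, tokens, now=None):
        """Log a completed call against both budgets."""
        self.call_times.append(time.time() if now is None else now)
        self.tokens_used += tokens
```

In practice the testing harness calls `check_budget()` before each request, pauses when it returns `False`, and alerts the engagement lead on a sustained cost spike per the escalation procedures below.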
Harmful content handling. Testing for harmful content generation creates a dilemma: the red team must demonstrate the harm but also handle the generated content responsibly. Define:
- How harmful outputs are captured and stored
- Who has access to evidence containing harmful content
- Retention and destruction schedules for harmful content
- Whether harmful content can be included in reports (redacted or full)
- Specific content categories that should not be generated even during testing
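One common way to satisfy these rules is to keep the raw harmful output only in access-controlled evidence storage, while the report carries a cryptographic digest plus a redacted preview. The record schema below is an assumption for illustration; only the use of a SHA-256 digest to tie the report to the stored original is the substantive point.

```python
import hashlib
import datetime

def capture_harmful_output(raw_output, category, case_id):
    """Build a report-safe evidence record for a harmful model output.
    The raw text itself goes to access-controlled storage (not shown);
    the digest lets authorized reviewers verify the stored original."""
    digest = hashlib.sha256(raw_output.encode("utf-8")).hexdigest()
    return {
        "case_id": case_id,            # hypothetical test-case identifier
        "category": category,
        "sha256": digest,
        "preview": raw_output[:40] + " [REDACTED]",
        "captured_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
```

Retention and destruction schedules then apply to the stored originals; the digests in the report remain valid evidence even after the raw content is destroyed.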
Model interaction boundaries. Define:
- Whether the team may attempt to modify model behavior (fine-tuning, RLHF manipulation)
- Whether adversarial training data or poisoning attempts are in scope
- Whether weight extraction or model stealing attempts are authorized
- Maximum conversation length and number of concurrent sessions
Production vs. staging. Define:
- Whether testing occurs on production or staging systems
- Whether real users may be affected during testing
- Time windows for testing (business hours, off-hours)
- Rollback procedures if testing causes unexpected behavior changes
Authorization Chain
AI red team authorization requires sign-off from more stakeholders than a traditional penetration test:
| Stakeholder | What They Authorize | Why They Are Needed |
|---|---|---|
| System owner | Overall testing authorization | Standard pen test authorization |
| AI/ML team lead | Model-level testing, safety boundary probing | Understands model capabilities and risks |
| Legal counsel | Harmful content generation, data handling | Liability for generating/storing harmful content |
| Privacy officer | PII handling, data flows during testing | User data may flow through AI systems under test |
| API provider (if applicable) | Testing against their API | Provider ToS may restrict adversarial testing |
| CISO / Security lead | Infrastructure scope, escalation procedures | Security governance |
Escalation Procedures
Define clear escalation triggers:
| Trigger | Action | Who to Notify |
|---|---|---|
| Critical vulnerability (data exfiltration, code execution via tools) | Immediate halt and notification | Engagement lead, system owner, CISO |
| Harmful content generated that poses real-world risk | Stop testing in that category, secure evidence | Engagement lead, legal counsel |
| Unexpected model behavior (persistent personality changes, recursive tool calls) | Pause and assess | AI/ML team lead |
| Rate limit exceeded or unexpected cost spike | Pause testing | System owner, engagement lead |
| Access to real user data during testing | Stop immediately, document exposure | Privacy officer, system owner, legal |
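The table above maps cleanly onto a trigger-to-action lookup that tooling or runbooks can share. The sketch below encodes it as plain data; the trigger keys, action labels, and role names are illustrative, not a standard taxonomy.

```python
# Illustrative encoding of the escalation table above.
ESCALATIONS = {
    "critical_vulnerability": {
        "action": "halt_all", "notify": ["engagement_lead", "system_owner", "ciso"]},
    "harmful_content_real_risk": {
        "action": "stop_category", "notify": ["engagement_lead", "legal_counsel"]},
    "unexpected_model_behavior": {
        "action": "pause_assess", "notify": ["ml_team_lead"]},
    "cost_spike": {
        "action": "pause", "notify": ["system_owner", "engagement_lead"]},
    "real_user_data_accessed": {
        "action": "stop_document", "notify": ["privacy_officer", "system_owner", "legal_counsel"]},
}

def escalate(trigger):
    """Look up the agreed action and notification list for a trigger."""
    rule = ESCALATIONS.get(trigger)
    if rule is None:
        # Unknown events escalate to the lead by default rather than being dropped.
        return {"action": "pause_assess", "notify": ["engagement_lead"]}
    return rule
```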
Success Criteria
Defining "Success" for AI Red Teaming
Unlike traditional pen testing where success is often binary (did you compromise the target?), AI red teaming operates on a spectrum. Define success criteria along multiple dimensions:
Coverage metrics:
- Percentage of OWASP LLM Top 10 categories tested
- Number of attack vectors tested per scope component
- Number of unique techniques attempted per vector
Finding metrics:
- Number and severity of findings
- Bypass rates for each technique category
- Percentage of safety boundaries successfully breached
Quality metrics:
- All findings reproducible with documented evidence
- Remediation recommendations provided for each finding
- Severity ratings calibrated to business impact
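Coverage and bypass rates are simple ratios, but computing them per technique keeps the final report honest about which categories were actually exercised. A minimal sketch, assuming trial logs are available as `(technique, succeeded)` pairs:

```python
def bypass_rate(attempts):
    """Per-technique bypass rate from (technique, succeeded) trial pairs."""
    totals, wins = {}, {}
    for technique, succeeded in attempts:
        totals[technique] = totals.get(technique, 0) + 1
        wins[technique] = wins.get(technique, 0) + (1 if succeeded else 0)
    return {t: wins[t] / totals[t] for t in totals}

def coverage(tested, required):
    """Fraction of required categories (e.g. the OWASP LLM Top 10)
    with at least one test attempted."""
    return len(set(tested) & set(required)) / len(required)
```

For example, `coverage(["LLM01", "LLM02"], ["LLM01", "LLM02", "LLM03", "LLM04"])` reports 50% coverage of the required categories.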
Severity Framework
Agree on severity classification before testing begins:
| Severity | Criteria | Example |
|---|---|---|
| Critical | Reliable exploitation with high impact; data exfiltration, code execution, or safety bypass affecting all users | System prompt injection that reliably causes the chatbot to execute arbitrary tool calls, exfiltrating customer data |
| High | Moderate reliability with significant impact, or high reliability with moderate impact | Jailbreak technique that bypasses content filters 60% of the time, generating harmful instructions |
| Medium | Low reliability with significant impact, or moderate reliability with limited impact | Encoding bypass that works 20% of the time to extract the system prompt |
| Low | Low reliability and limited impact; informational findings | Model reveals its provider and version when asked directly |
| Informational | No direct exploitation but indicates defense weakness | System prompt does not include anti-override language |
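The severity table is a reliability-by-impact matrix, which can be applied mechanically once a finding's measured success rate and agreed impact level are known. The numeric thresholds below (reliable at 80%+, low below 10%) are illustrative assumptions chosen to match the table's examples, not fixed industry values.

```python
def classify_severity(success_rate, impact):
    """Map a finding's success rate and impact ('high', 'moderate',
    'limited') onto the severity matrix above. Thresholds are illustrative."""
    if success_rate >= 0.8:
        rel = "reliable"
    elif success_rate >= 0.1:
        rel = "moderate"
    else:
        rel = "low"

    if rel == "reliable" and impact == "high":
        return "Critical"
    if (rel == "moderate" and impact == "high") or \
       (rel == "reliable" and impact == "moderate"):
        return "High"
    if (rel == "low" and impact == "high") or \
       (rel == "moderate" and impact in ("moderate", "limited")):
        return "Medium"
    return "Low"
```

Under these assumptions, the 60%-reliable filter bypass from the table classifies as High, and the 20%-reliable system-prompt extraction as Medium, matching the examples.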
Engagement Documentation
Test Plan Template
A test plan should include:
- Engagement overview: Target system, scope level, timeline, team composition
- Scope details: In-scope components, out-of-scope exclusions, authorized attack vectors
- Rules of engagement: Rate limits, harmful content handling, escalation procedures
- Test matrix: Specific test cases organized by attack vector and target component
- Success criteria: Coverage, finding, and quality metrics
- Evidence requirements: What must be captured for each finding
- Reporting deliverables: Report format, audience, delivery timeline
Pre-Engagement Checklist
Before testing begins, verify:
- Authorization document signed by all required stakeholders
- API provider terms of service reviewed and compliant
- Test accounts and credentials provisioned
- Staging environment available (if applicable)
- Rate limits and cost caps configured
- Evidence storage with appropriate access controls provisioned
- Escalation contacts verified and reachable
- Communication channel established with system owner
- Harmful content handling procedures documented and acknowledged
- Team briefed on scope boundaries and prohibited actions
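A checklist like this is most useful as a hard gate: testing starts only when every item is verified. The item keys below are shorthand for the checklist above and are illustrative.

```python
# Illustrative go/no-go gate over the pre-engagement checklist above.
CHECKLIST = [
    "authorization_signed",
    "provider_tos_reviewed",
    "test_accounts_provisioned",
    "rate_limits_configured",
    "evidence_storage_ready",
    "escalation_contacts_verified",
    "comms_channel_established",
    "harmful_content_procedures_acknowledged",
    "team_briefed",
]

def ready_to_test(completed):
    """Return (go, missing_items); testing begins only when go is True."""
    missing = [item for item in CHECKLIST if item not in completed]
    return (len(missing) == 0, missing)
```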
Related Topics
- Red Team Methodology - The overall engagement lifecycle
- AI-Specific Threat Modeling - Threat models that inform scope
- Evidence Collection - Standards for capturing and preserving findings
- Continuous Red Teaming - How scoping adapts for ongoing programs
References
- NIST (2023). AI Risk Management Framework (AI RMF 1.0)
- OWASP (2025). OWASP Top 10 for LLM Applications
- PTES (2024). Penetration Testing Execution Standard - Pre-Engagement
- CREST (2024). CREST Penetration Testing Guide