Scoping & Rules of Engagement
Defining scope, rules of engagement, authorization boundaries, and success criteria for AI red team engagements, with templates and checklists for common engagement types.
Scoping and rules of engagement (ROE) define what an AI red team is authorized to test, how they may test it, and what constraints apply. Poor scoping leads either to incomplete testing (too narrow) or to operational incidents (too broad). For AI systems, scoping must address concerns that do not exist in traditional penetration testing: the stochastic nature of model behavior, the risk of generating genuinely harmful content during testing, rate limits on inference APIs, and the distinction between model-level and application-level testing.
Scope Definition
Scope Components
An AI red team scope must specify each of the following:
| Component | What to Define | Example |
|---|---|---|
| Target system | The specific AI application, model, and version under test | "Customer-facing chatbot on support.example.com using GPT-4o through Azure OpenAI" |
| Attack vectors | Which injection types, jailbreak categories, and exploitation techniques are in scope | "Direct injection, indirect injection via RAG, tool abuse; gradient-based attacks out of scope" |
| Harm categories | Which types of harmful output the team should attempt to elicit | "CSAM, violence, self-harm, PII leakage, bias — per OWASP LLM Top 10" |
| Infrastructure | Whether underlying infrastructure (APIs, databases, cloud services) is in scope | "Application-layer only; cloud infrastructure out of scope" |
| Data sensitivity | Whether real user data, PII, or production data may be accessed or generated | "Synthetic test data only; no real customer data in prompts" |
| User roles | Which user roles and access levels will be tested | "Anonymous user, authenticated standard user, admin user" |
| Integrations | Which tool integrations, plugins, and external services are in scope | "Knowledge-base RAG, ticket creation API; email integration out of scope" |
Scope Levels
AI red team engagements typically fall into one of three scope levels:
Level 1: Model Assessment. Tests the model's safety alignment and behavioral boundaries. Focuses on prompt injection, jailbreaks, and harmful content generation. Does not test application-layer controls.
Level 2: Application Assessment. Tests the complete application, including the model, system prompt, input/output filters, tool integrations, and user interface. This is the most common engagement type.
Level 3: Full-Stack Assessment. Tests the entire AI system stack from infrastructure to model to application. Includes API security, authentication, infrastructure configuration, training pipeline, and deployment security in addition to model-level testing.
Rules of Engagement
AI-Specific ROE Considerations
Beyond standard penetration testing ROE, AI red teaming requires additional rules:
Rate limiting and cost controls. API inference calls cost money. Define:
- Maximum API calls per hour/day
- Maximum tokens consumed per engagement
- Who pays for inference costs (client or testing firm)
- Whether throttling or batching is required
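These limits are easiest to keep when enforced in the test harness itself. A minimal sketch in Python, assuming illustrative default limits and a flat per-token price; a real engagement would take these values from the signed ROE:

```python
import time

class BudgetGuard:
    """Enforce agreed per-engagement call and token budgets before each
    inference request. The default limits and per-token price are
    placeholders, not values from any real ROE."""

    def __init__(self, max_calls_per_hour=500, max_total_tokens=2_000_000,
                 usd_per_1k_tokens=0.01):
        self.max_calls_per_hour = max_calls_per_hour
        self.max_total_tokens = max_total_tokens
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self._call_times = []   # monotonic timestamps of calls in the last hour
        self.tokens_used = 0

    def check(self, estimated_tokens):
        """Raise RuntimeError if the next call would breach a limit."""
        now = time.monotonic()
        self._call_times = [t for t in self._call_times if now - t < 3600]
        if len(self._call_times) >= self.max_calls_per_hour:
            raise RuntimeError("hourly call limit reached -- pause testing")
        if self.tokens_used + estimated_tokens > self.max_total_tokens:
            raise RuntimeError("engagement token budget exhausted")

    def record(self, tokens_consumed):
        """Call after each successful request to update the running totals."""
        self._call_times.append(time.monotonic())
        self.tokens_used += tokens_consumed

    def spend_usd(self):
        """Estimated inference spend so far, for cost attribution."""
        return self.tokens_used / 1000 * self.usd_per_1k_tokens
```

Calling `check()` before every request and `record()` after it gives a hard stop at the agreed ceiling, and the running spend figure supports the client-versus-firm cost attribution discussion.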
Harmful content handling. Testing for harmful content generation creates a dilemma: the red team must demonstrate the harm but also handle the generated content responsibly. Define:
- How harmful outputs are captured and stored
- Who has access to evidence containing harmful content
- Retention and destruction schedules for harmful content
- Whether harmful content can be included in reports (redacted or full)
- Specific content categories that should not be generated even during testing
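One way to satisfy the capture, access-control, and retention rules at once is to store a hash plus a short redacted excerpt rather than the full output. A sketch, with assumed field names and an assumed 90-day retention default:

```python
import hashlib
from datetime import datetime, timedelta, timezone

def capture_harmful_output(text, category, retention_days=90):
    """Record a harmful model output as evidence without storing it in full:
    a hash proves integrity, a redacted excerpt aids triage. Field names and
    the 90-day retention default are illustrative, not a standard."""
    now = datetime.now(timezone.utc)
    return {
        # Hash lets reviewers verify the evidence later without exposing content
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "category": category,
        "excerpt": text[:40] + " [REDACTED]",
        "captured_at": now.isoformat(),
        # Destruction date derived from the agreed retention schedule
        "destroy_after": (now + timedelta(days=retention_days)).isoformat(),
    }
```

The full text, if it must be retained at all, would live in a separately access-controlled store keyed by the hash.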
Model interaction boundaries. Define:
- Whether the team may attempt to modify model behavior (fine-tuning, RLHF manipulation)
- Whether adversarial training data or poisoning attempts are in scope
- Whether weight extraction or model stealing attempts are authorized
- Maximum conversation length and number of concurrent sessions
Production vs. staging. Define:
- Whether testing occurs on production or staging systems
- Whether real users may be affected during testing
- Time windows for testing (business hours, off-hours)
- Rollback procedures if testing causes unexpected behavior changes
Authorization Chain
AI red team authorization requires sign-off from additional stakeholders compared to traditional penetration testing:
| Stakeholder | What They Authorize | Why They Are Needed |
|---|---|---|
| System owner | Overall testing authorization | Standard penetration test authorization |
| AI/ML team lead | Model-level testing, safety boundary probing | Understands model capabilities and risks |
| Legal counsel | Harmful content generation, data handling | Liability for generating/storing harmful content |
| Privacy officer | PII handling, data flows during testing | User data may flow through AI systems under test |
| API provider (if applicable) | Testing against their API | Provider ToS may restrict adversarial testing |
| CISO / security lead | Infrastructure scope, escalation procedures | Security governance |
Escalation Procedures
Define clear escalation triggers:
| Trigger | Action | Who to Notify |
|---|---|---|
| Critical vulnerability (data exfiltration, code execution via tools) | Immediate halt and notification | Engagement lead, system owner, CISO |
| Harmful content generated that poses real-world risk | Stop testing in that category, secure evidence | Engagement lead, legal counsel |
| Unexpected model behavior (persistent personality changes, recursive tool calls) | Pause and assess | AI/ML team lead |
| Rate limit exceeded or unexpected cost spike | Pause testing | System owner, engagement lead |
| Access to real user data during testing | Stop immediately, document exposure | Privacy officer, system owner, legal counsel |
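Encoding the escalation matrix as data lets the test harness alert the right stakeholders automatically when a trigger fires. A sketch; the trigger keys and role names are placeholders for the real roster agreed in the ROE:

```python
# Escalation matrix as data: trigger -> (action, roles to notify).
# Trigger keys and role names are placeholders, not a standard vocabulary.
ESCALATIONS = {
    "critical_vulnerability": (
        "halt testing immediately",
        ["engagement_lead", "system_owner", "ciso"],
    ),
    "harmful_content_real_world_risk": (
        "stop testing in that category, secure evidence",
        ["engagement_lead", "legal_counsel"],
    ),
    "unexpected_model_behavior": (
        "pause and assess",
        ["ai_ml_team_lead"],
    ),
    "cost_spike_or_rate_limit": (
        "pause testing",
        ["system_owner", "engagement_lead"],
    ),
    "real_user_data_accessed": (
        "stop immediately, document exposure",
        ["privacy_officer", "system_owner", "legal_counsel"],
    ),
}

def escalate(trigger):
    """Return (action, notify list); unknown triggers fall back to the
    most conservative response rather than being silently ignored."""
    return ESCALATIONS.get(
        trigger, ("halt testing immediately", ["engagement_lead"])
    )
```

The conservative default for unrecognized triggers matters: an incident the matrix did not anticipate should halt testing, not proceed unnoticed.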
Success Criteria
Defining "Success" for AI 紅隊演練
Unlike traditional pen 測試 where success is often binary (did you compromise the target?), AI 紅隊演練 operates on a spectrum. Define success criteria along multiple dimensions:
Coverage metrics:
- Percentage of OWASP LLM Top 10 categories tested
- Number of attack vectors tested per scope component
- Number of unique techniques attempted per vector
Finding metrics:
- Number and severity of findings
- Bypass rates per technique category
- Percentage of safety boundaries successfully breached
Quality metrics:
- All findings reproducible with documented evidence
- Remediation recommendations provided per finding
- Severity ratings calibrated to business impact
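Coverage targets like these are simple to track mechanically. A sketch against a ten-category taxonomy such as the OWASP LLM Top 10; the abbreviated labels are placeholders:

```python
def coverage_report(tested, taxonomy):
    """Report what fraction of a category taxonomy was exercised and
    which categories remain untested."""
    covered = set(tested) & set(taxonomy)
    return {
        "covered_pct": round(100 * len(covered) / len(taxonomy), 1),
        "missing": sorted(set(taxonomy) - covered),
    }

# Example: a 10-category taxonomy with 8 categories exercised
taxonomy = [f"LLM{i:02d}" for i in range(1, 11)]
report = coverage_report(taxonomy[:8], taxonomy)
# report["covered_pct"] == 80.0; report["missing"] == ["LLM09", "LLM10"]
```

The "missing" list feeds directly into the residual-risk section of the report: untested categories are unknowns, not passes.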
Severity Framework
Agree on severity classification before testing begins:
| Severity | Criteria | Example |
|---|---|---|
| Critical | Reliable exploitation with high impact; data exfiltration, code execution, or safety bypass affecting all users | System prompt injection that reliably causes the chatbot to execute arbitrary tool calls, exfiltrating customer data |
| High | Moderate reliability with significant impact, or high reliability with moderate impact | Jailbreak technique that bypasses content filters 60% of the time, generating harmful instructions |
| Medium | Low reliability with significant impact, or moderate reliability with limited impact | Encoding bypass that works 20% of the time to extract the system prompt |
| Low | Low reliability and limited impact; informational findings | Model reveals its provider and version when asked directly |
| Informational | No direct exploitation but indicates defensive weakness | System prompt does not include anti-override language |
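The reliability-by-impact logic in this table can be encoded so every finding is rated the same way by every tester. A sketch; the three-level scales and the placement of edge cells (e.g. mapping high reliability with low impact to Low) are simplifying assumptions:

```python
# Ordinal scales for the two rating dimensions (illustrative granularity).
_LEVELS = {"low": 0, "moderate": 1, "high": 2}

def classify_severity(reliability, impact):
    """Map exploit reliability and impact to a severity band following the
    framework above; cells the table leaves implicit default to Low."""
    r, i = _LEVELS[reliability], _LEVELS[impact]
    if (r, i) == (2, 2):
        return "Critical"   # reliable exploitation with high impact
    if (r, i) in {(1, 2), (2, 1)}:
        return "High"       # one dimension high, the other moderate
    if (r, i) in {(0, 2), (1, 1)}:
        return "Medium"     # significant impact at low reliability, or both moderate
    return "Low"            # low reliability and/or limited impact
```

So the 60%-bypass jailbreak from the table (moderate reliability, high impact) rates High, while the 20% system prompt extraction (low reliability, significant impact) rates Medium.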
Engagement Documentation
Test Plan Template
A test plan should include:
- Engagement overview: Target system, scope level, timeline, team composition
- Scope details: In-scope components, out-of-scope exclusions, authorized attack vectors
- Rules of engagement: Rate limits, harmful content handling, escalation procedures
- Test matrix: Specific test cases organized by attack vector and target component
- Success criteria: Coverage, finding, and quality metrics
- Evidence requirements: What must be captured per finding
- Reporting deliverables: Report format, audience, delivery timeline
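A test matrix row can then be captured as structured data, so each case is reproducible and auditable. An illustrative entry; all field names and values are hypothetical:

```python
# One row of the test matrix: a concrete test case tied to an attack
# vector, a target component, and its evidence requirements.
TEST_CASE = {
    "id": "TC-012",
    "vector": "indirect_injection_via_rag",
    "component": "knowledge_base_retrieval",
    "procedure": ("Plant an instruction-bearing document in the test "
                  "knowledge base, then ask a benign question that retrieves it."),
    "success_signal": "Model follows the planted instruction instead of the user's request.",
    "evidence_required": ["full conversation transcript", "retrieved document ID"],
}
```

Keeping the matrix in a machine-readable form makes it straightforward to compute the coverage metrics defined under success criteria.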
Pre-Engagement Checklist
Before testing begins, verify:
- Authorization document signed by all required stakeholders
- API provider terms of service reviewed and compliant
- Test accounts and credentials provisioned
- Staging environment available (if applicable)
- Rate limits and cost caps configured
- Evidence storage with appropriate access controls provisioned
- Escalation contacts verified and reachable
- Communication channel established with system owner
- Harmful content handling procedures documented and acknowledged
- Team briefed on scope boundaries and prohibited actions
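Encoding the checklist as data makes the go/no-go decision explicit rather than informal. A sketch using shorthand keys for the items above:

```python
# Pre-engagement checklist as data; keys are shorthand for the full items.
PRE_ENGAGEMENT = [
    "authorization_signed",
    "provider_tos_reviewed",
    "test_accounts_provisioned",
    "staging_available",
    "rate_limits_configured",
    "evidence_storage_ready",
    "escalation_contacts_verified",
    "comms_channel_established",
    "harmful_content_procedures_acknowledged",
    "team_briefed",
]

def go_no_go(completed):
    """Return (ready, missing items); testing must not start until ready."""
    done = set(completed)
    missing = [item for item in PRE_ENGAGEMENT if item not in done]
    return (not missing, missing)
```

Any missing item blocks the start of testing, and the returned list tells the engagement lead exactly what remains.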
Try It Yourself
Related Topics
- Red Team Methodology - The overall engagement lifecycle
- AI-Specific Threat Modeling - Threat models that inform scope
- Evidence Collection - Standards for capturing and preserving findings
- Continuous Red Teaming - How scoping adapts for ongoing programs
References
- NIST (2024). AI Risk Management Framework (AI RMF 1.0)
- OWASP (2025). OWASP Top 10 for LLM Applications
- PTES (2024). Penetration Testing Execution Standard - Pre-Engagement
- CREST (2024). CREST Penetration 測試 Guide
Why does AI red team scoping require authorization from the API provider in addition to the system owner?