Scoping & Rules of Engagement
Defining scope, rules of engagement, authorization boundaries, and success criteria for AI red team engagements, with templates and checklists for common engagement types.
Scoping and rules of engagement (ROE) define what an AI red team is authorized to test, how they may test it, and what constraints apply. Poor scoping leads either to incomplete testing (too narrow) or to operational incidents (too broad). For AI systems, scoping must address concerns that do not exist in traditional penetration testing: the stochastic nature of model behavior, the risk of generating genuinely harmful content during testing, rate limits on inference APIs, and the distinction between model-level and application-level testing.
Scope Definition
Scope Components
An AI red team scope must specify each of the following:
| Component | What to Define | Example |
|---|---|---|
| Target system | The specific AI application, model, and version under test | "Customer-facing chatbot on support.example.com using GPT-4o through Azure OpenAI" |
| Attack vectors | Which injection types, jailbreak categories, and exploitation techniques are in scope | "Direct injection, indirect injection via RAG, tool abuse; gradient-based attacks out of scope" |
| Harm categories | Which types of harmful output the team should attempt to elicit | "CSAM, violence, self-harm, PII leakage, bias — per OWASP LLM Top 10" |
| Infrastructure | Whether underlying infrastructure (APIs, databases, cloud services) is in scope | "Application-layer only; cloud infrastructure out of scope" |
| Data sensitivity | Whether real user data, PII, or production data may be accessed or generated | "Synthetic test data only; no real customer data in prompts" |
| User roles | Which user roles and access levels will be tested | "Anonymous user, authenticated standard user, admin user" |
| Integrations | Which tool integrations, plugins, and external services are in scope | "Knowledge-base RAG, ticket creation API; email integration out of scope" |
Scope Levels
AI red team engagements typically fall into one of three scope levels:
Level 1: Model Assessment. Tests the model's safety alignment and behavioral boundaries. Focuses on prompt injection, jailbreaks, and harmful content generation. Does not test application-layer controls.
Level 2: Application Assessment. Tests the complete application, including the model, system prompt, input/output filters, tool integrations, and user interface. This is the most common engagement type.
Level 3: Full-Stack Assessment. Tests the entire AI system stack from infrastructure to model to application. Includes API security, authentication, infrastructure configuration, training pipeline, and deployment security in addition to model-level testing.
Rules of Engagement
AI-Specific ROE Considerations
Beyond standard penetration testing ROE, AI red teaming requires additional rules:
Rate limiting and cost controls. API inference calls cost money. Define:
- Maximum API calls per hour/day
- Maximum tokens consumed per engagement
- Who pays for inference costs (client or testing firm)
- Whether throttling or batching is required
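These limits are easiest to keep when enforced in the test harness itself. A minimal sketch in Python, assuming illustrative default limits and a flat per-token price; a real engagement would take these values from the signed ROE:

```python
import time

class BudgetGuard:
    """Enforce agreed per-engagement call and token budgets before each
    inference request. The default limits and per-token price are
    placeholders, not values from any real ROE."""

    def __init__(self, max_calls_per_hour=500, max_total_tokens=2_000_000,
                 usd_per_1k_tokens=0.01):
        self.max_calls_per_hour = max_calls_per_hour
        self.max_total_tokens = max_total_tokens
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self._call_times = []   # monotonic timestamps of calls in the last hour
        self.tokens_used = 0

    def check(self, estimated_tokens):
        """Raise RuntimeError if the next call would breach a limit."""
        now = time.monotonic()
        self._call_times = [t for t in self._call_times if now - t < 3600]
        if len(self._call_times) >= self.max_calls_per_hour:
            raise RuntimeError("hourly call limit reached -- pause testing")
        if self.tokens_used + estimated_tokens > self.max_total_tokens:
            raise RuntimeError("engagement token budget exhausted")

    def record(self, tokens_consumed):
        """Call after each successful request to update the running totals."""
        self._call_times.append(time.monotonic())
        self.tokens_used += tokens_consumed

    def spend_usd(self):
        """Estimated inference spend so far, for cost attribution."""
        return self.tokens_used / 1000 * self.usd_per_1k_tokens
```

Calling `check()` before every request and `record()` after it gives a hard stop at the agreed ceiling, and the running spend figure supports the client-versus-firm cost attribution discussion.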
Harmful content handling. Testing for harmful content generation creates a dilemma: the red team must demonstrate the harm but also handle the generated content responsibly. Define:
- How harmful outputs are captured and stored
- Who has access to evidence containing harmful content
- Retention and destruction schedules for harmful content
- Whether harmful content can be included in reports (redacted or full)
- Specific content categories that should not be generated even during testing
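One way to satisfy the capture, access-control, and retention rules at once is to store a hash plus a short redacted excerpt rather than the full output. A sketch, with assumed field names and an assumed 90-day retention default:

```python
import hashlib
from datetime import datetime, timedelta, timezone

def capture_harmful_output(text, category, retention_days=90):
    """Record a harmful model output as evidence without storing it in full:
    a hash proves integrity, a redacted excerpt aids triage. Field names and
    the 90-day retention default are illustrative, not a standard."""
    now = datetime.now(timezone.utc)
    return {
        # Hash lets reviewers verify the evidence later without exposing content
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "category": category,
        "excerpt": text[:40] + " [REDACTED]",
        "captured_at": now.isoformat(),
        # Destruction date derived from the agreed retention schedule
        "destroy_after": (now + timedelta(days=retention_days)).isoformat(),
    }
```

The full text, if it must be retained at all, would live in a separately access-controlled store keyed by the hash.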
Model interaction boundaries. Define:
- Whether the team may attempt to modify model behavior (fine-tuning, RLHF manipulation)
- Whether adversarial training data or poisoning attempts are in scope
- Whether weight extraction or model stealing attempts are authorized
- Maximum conversation length and number of concurrent sessions
Production vs. staging. Define:
- Whether testing occurs on production or staging systems
- Whether real users may be affected during testing
- Time windows for testing (business hours, off-hours)
- Rollback procedures if testing causes unexpected behavior changes
Authorization Chain
AI red team authorization requires sign-off from additional stakeholders compared to traditional penetration testing:
| Stakeholder | What They Authorize | Why They Are Needed |
|---|---|---|
| System owner | Overall testing authorization | Standard penetration test authorization |
| AI/ML team lead | Model-level testing, safety boundary probing | Understands model capabilities and risks |
| Legal counsel | Harmful content generation, data handling | Liability for generating/storing harmful content |
| Privacy officer | PII handling, data flows during testing | User data may flow through AI systems under test |
| API provider (if applicable) | Testing against their API | Provider ToS may restrict adversarial testing |
| CISO / security lead | Infrastructure scope, escalation procedures | Security governance |
Escalation Procedures
Define clear escalation triggers:
| Trigger | Action | Who to Notify |
|---|---|---|
| Critical vulnerability (data exfiltration, code execution via tools) | Immediate halt and notification | Engagement lead, system owner, CISO |
| Harmful content generated that poses real-world risk | Stop testing in that category, secure evidence | Engagement lead, legal counsel |
| Unexpected model behavior (persistent personality changes, recursive tool calls) | Pause and assess | AI/ML team lead |
| Rate limit exceeded or unexpected cost spike | Pause testing | System owner, engagement lead |
| Access to real user data during testing | Stop immediately, document exposure | Privacy officer, system owner, legal counsel |
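Encoding the escalation matrix as data lets the test harness alert the right stakeholders automatically when a trigger fires. A sketch; the trigger keys and role names are placeholders for the real roster agreed in the ROE:

```python
# Escalation matrix as data: trigger -> (action, roles to notify).
# Trigger keys and role names are placeholders, not a standard vocabulary.
ESCALATIONS = {
    "critical_vulnerability": (
        "halt testing immediately",
        ["engagement_lead", "system_owner", "ciso"],
    ),
    "harmful_content_real_world_risk": (
        "stop testing in that category, secure evidence",
        ["engagement_lead", "legal_counsel"],
    ),
    "unexpected_model_behavior": (
        "pause and assess",
        ["ai_ml_team_lead"],
    ),
    "cost_spike_or_rate_limit": (
        "pause testing",
        ["system_owner", "engagement_lead"],
    ),
    "real_user_data_accessed": (
        "stop immediately, document exposure",
        ["privacy_officer", "system_owner", "legal_counsel"],
    ),
}

def escalate(trigger):
    """Return (action, notify list); unknown triggers fall back to the
    most conservative response rather than being silently ignored."""
    return ESCALATIONS.get(
        trigger, ("halt testing immediately", ["engagement_lead"])
    )
```

The conservative default for unrecognized triggers matters: an incident the matrix did not anticipate should halt testing, not proceed unnoticed.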
Success Criteria
Defining "Success" for AI 紅隊演練
Unlike traditional pen 測試 where success is often binary (did you compromise the target?), AI 紅隊演練 operates on a spectrum. Define success criteria along multiple dimensions:
Coverage metrics:
- Percentage of OWASP LLM Top 10 categories tested
- Number of attack vectors tested per scope component
- Number of unique techniques attempted per vector
Finding metrics:
- Number and severity of findings
- Bypass rates per technique category
- Percentage of safety boundaries successfully breached
Quality metrics:
- All findings reproducible with documented evidence
- Remediation recommendations provided per finding
- Severity ratings calibrated to business impact
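Coverage targets like these are simple to track mechanically. A sketch against a ten-category taxonomy such as the OWASP LLM Top 10; the abbreviated labels are placeholders:

```python
def coverage_report(tested, taxonomy):
    """Report what fraction of a category taxonomy was exercised and
    which categories remain untested."""
    covered = set(tested) & set(taxonomy)
    return {
        "covered_pct": round(100 * len(covered) / len(taxonomy), 1),
        "missing": sorted(set(taxonomy) - covered),
    }

# Example: a 10-category taxonomy with 8 categories exercised
taxonomy = [f"LLM{i:02d}" for i in range(1, 11)]
report = coverage_report(taxonomy[:8], taxonomy)
# report["covered_pct"] == 80.0; report["missing"] == ["LLM09", "LLM10"]
```

The "missing" list feeds directly into the residual-risk section of the report: untested categories are unknowns, not passes.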
Severity Framework
Agree on severity classification before testing begins:
| Severity | Criteria | Example |
|---|---|---|
| Critical | Reliable exploitation with high impact; data exfiltration, code execution, or safety bypass affecting all users | System prompt injection that reliably causes the chatbot to execute arbitrary tool calls, exfiltrating customer data |
| High | Moderate reliability with significant impact, or high reliability with moderate impact | Jailbreak technique that bypasses content filters 60% of the time, generating harmful instructions |
| Medium | Low reliability with significant impact, or moderate reliability with limited impact | Encoding bypass that works 20% of the time to extract the system prompt |
| Low | Low reliability and limited impact; informational findings | Model reveals its provider and version when asked directly |
| Informational | No direct exploitation but indicates defensive weakness | System prompt does not include anti-override language |
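The reliability-by-impact logic in this table can be encoded so every finding is rated the same way by every tester. A sketch; the three-level scales and the placement of edge cells (e.g. mapping high reliability with low impact to Low) are simplifying assumptions:

```python
# Ordinal scales for the two rating dimensions (illustrative granularity).
_LEVELS = {"low": 0, "moderate": 1, "high": 2}

def classify_severity(reliability, impact):
    """Map exploit reliability and impact to a severity band following the
    framework above; cells the table leaves implicit default to Low."""
    r, i = _LEVELS[reliability], _LEVELS[impact]
    if (r, i) == (2, 2):
        return "Critical"   # reliable exploitation with high impact
    if (r, i) in {(1, 2), (2, 1)}:
        return "High"       # one dimension high, the other moderate
    if (r, i) in {(0, 2), (1, 1)}:
        return "Medium"     # significant impact at low reliability, or both moderate
    return "Low"            # low reliability and/or limited impact
```

So the 60%-bypass jailbreak from the table (moderate reliability, high impact) rates High, while the 20% system prompt extraction (low reliability, significant impact) rates Medium.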
Engagement Documentation
Test Plan Template
A test plan should include:
- Engagement overview: Target system, scope level, timeline, team composition
- Scope details: In-scope components, out-of-scope exclusions, authorized attack vectors
- Rules of engagement: Rate limits, harmful content handling, escalation procedures
- Test matrix: Specific test cases organized by attack vector and target component
- Success criteria: Coverage, finding, and quality metrics
- Evidence requirements: What must be captured per finding
- Reporting deliverables: Report format, audience, delivery timeline
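A test matrix row can then be captured as structured data, so each case is reproducible and auditable. An illustrative entry; all field names and values are hypothetical:

```python
# One row of the test matrix: a concrete test case tied to an attack
# vector, a target component, and its evidence requirements.
TEST_CASE = {
    "id": "TC-012",
    "vector": "indirect_injection_via_rag",
    "component": "knowledge_base_retrieval",
    "procedure": ("Plant an instruction-bearing document in the test "
                  "knowledge base, then ask a benign question that retrieves it."),
    "success_signal": "Model follows the planted instruction instead of the user's request.",
    "evidence_required": ["full conversation transcript", "retrieved document ID"],
}
```

Keeping the matrix in a machine-readable form makes it straightforward to compute the coverage metrics defined under success criteria.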
Pre-Engagement Checklist
Before testing begins, verify:
- Authorization document signed by all required stakeholders
- API provider terms of service reviewed and compliant
- Test accounts and credentials provisioned
- Staging environment available (if applicable)
- Rate limits and cost caps configured
- Evidence storage with appropriate access controls provisioned
- Escalation contacts verified and reachable
- Communication channel established with system owner
- Harmful content handling procedures documented and acknowledged
- Team briefed on scope boundaries and prohibited actions
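Encoding the checklist as data makes the go/no-go decision explicit rather than informal. A sketch using shorthand keys for the items above:

```python
# Pre-engagement checklist as data; keys are shorthand for the full items.
PRE_ENGAGEMENT = [
    "authorization_signed",
    "provider_tos_reviewed",
    "test_accounts_provisioned",
    "staging_available",
    "rate_limits_configured",
    "evidence_storage_ready",
    "escalation_contacts_verified",
    "comms_channel_established",
    "harmful_content_procedures_acknowledged",
    "team_briefed",
]

def go_no_go(completed):
    """Return (ready, missing items); testing must not start until ready."""
    done = set(completed)
    missing = [item for item in PRE_ENGAGEMENT if item not in done]
    return (not missing, missing)
```

Any missing item blocks the start of testing, and the returned list tells the engagement lead exactly what remains.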
Try It Yourself
Related Topics
- Red Team Methodology - The overall engagement lifecycle
- AI-Specific Threat Modeling - Threat models that inform scope
- Evidence Collection - Standards for capturing and preserving findings
- Continuous Red Teaming - How scoping adapts for ongoing programs
References
- NIST (2024). AI Risk Management Framework (AI RMF 1.0)
- OWASP (2025). OWASP Top 10 for LLM Applications
- PTES (2024). Penetration Testing Execution Standard - Pre-Engagement
- CREST (2024). CREST Penetration 測試 Guide
Why does AI red team scoping require authorization from the API provider in addition to the system owner?