How to Scope an AI 紅隊 Engagement
Comprehensive walkthrough for scoping AI red team engagements from initial client contact through statement of work, covering target enumeration, risk-based prioritization, resource estimation, boundary definition, and legal considerations.
Scoping is where the engagement succeeds or fails. An under-scoped engagement misses critical attack surfaces and produces an incomplete 評估. An over-scoped engagement burns budget on low-value targets and delivers findings the client cannot act on. AI systems introduce scoping complexities that traditional penetration 測試 does not encounter: model endpoints have different risk profiles than web applications, 提示詞注入 測試 requires different time estimates than API fuzzing, and the boundary between "測試 the application" and "測試 模型" is often unclear. This walkthrough provides a systematic process for getting the scope right.
Step 1: Initial Discovery Meeting
The scoping process begins with a structured discovery meeting. Prepare a questionnaire that covers the information you need without overwhelming the client:
# AI 紅隊 Scoping Questionnaire
## System 概覽
1. What does your AI/ML system do? (One-paragraph description)
2. Who are the end users? (Internal employees, customers, general public)
3. What AI/ML models does it use? (OpenAI, Claude, open-source, custom-trained)
4. How are the models accessed? (API, self-hosted, 雲端-hosted)
5. What data does 模型 have access to? (User data, internal documents, databases)
## Architecture
6. Is there a 系統提示詞 or instruction set? Can we see it?
7. Does 模型 use tools/函式呼叫? If so, which tools?
8. Is there a RAG (檢索增強生成) pipeline?
9. What 輸入 preprocessing/validation exists?
10. What 輸出 filtering/監控 exists?
## Business Context
11. What is the worst-case scenario if the AI system is compromised?
12. Have you experienced any AI-related 安全 incidents?
13. Are there regulatory requirements (HIPAA, GDPR, financial regulations)?
14. Who are the stakeholders for the 評估 results?
## Technical Environment
15. What environments are available for 測試? (Production, staging, dev)
16. Are there rate limits or usage quotas we should be aware of?
17. Can we get 測試 accounts and API credentials?
18. Is there documentation (API docs, architecture diagrams)?
## Constraints
19. What is off-limits? (Specific systems, data, techniques)
20. Are there time-of-day or day-of-week restrictions?
21. Are there third-party services we need 授權 to 測試?
22. What is the desired timeline for the engagement?During the meeting, take detailed notes and ask follow-up questions. The answers to these questions directly determine the scope, timeline, and cost of the engagement.
Step 2: Target Enumeration
After the discovery meeting, enumerate every testable component. AI systems typically have more components than clients initially realize:
# scoping/target_enumeration.py
"""Systematic target enumeration for AI 紅隊 engagements."""
from dataclasses import dataclass
from enum import Enum
class ComponentType(Enum):
MODEL_ENDPOINT = "Model API Endpoint"
SYSTEM_PROMPT = "System Prompt / Instructions"
RAG_PIPELINE = "RAG / Retrieval Pipeline"
TOOL_INTEGRATION = "Tool / Function Calling"
MEMORY_SYSTEM = "Conversation Memory"
INPUT_PROCESSING = "輸入 Preprocessing"
OUTPUT_FILTERING = "輸出 Filtering"
AUTHENTICATION = "Authentication / Authorization"
DATA_STORE = "Data Store / Knowledge Base"
MONITORING = "監控 / Logging"
WEB_INTERFACE = "Web / Chat Interface"
API_GATEWAY = "API Gateway / Rate Limiting"
ORCHESTRATION = "代理 Orchestration Layer"
TRAINING_PIPELINE = "Model Training / 微調 Pipeline"
EMBEDDING_SERVICE = "嵌入向量 Service"
@dataclass
class ScopeTarget:
name: str
component_type: ComponentType
description: str
risk_level: str # Critical, High, Medium, Low
estimated_hours: float
dependencies: list[str]
notes: str = ""
def enumerate_targets(discovery_notes: dict) -> list[ScopeTarget]:
"""Generate a target list based on discovery meeting notes.
This function demonstrates the logic; in practice, you would
fill these in based on the actual client conversation.
"""
targets = []
# Every AI engagement includes these core targets
targets.append(ScopeTarget(
name="Primary LLM Endpoint",
component_type=ComponentType.MODEL_ENDPOINT,
description="The main model API endpoint serving user queries",
risk_level="High",
estimated_hours=16,
dependencies=[],
notes="Core 攻擊面 for 提示詞注入 and 越獄 測試",
))
targets.append(ScopeTarget(
name="System Prompt Configuration",
component_type=ComponentType.SYSTEM_PROMPT,
description="The instruction set that defines model behavior",
risk_level="High",
estimated_hours=8,
dependencies=["Primary LLM Endpoint"],
notes="測試 for extraction, override, and manipulation",
))
# Conditional targets based on architecture
if discovery_notes.get("has_rag"):
targets.append(ScopeTarget(
name="RAG Retrieval Pipeline",
component_type=ComponentType.RAG_PIPELINE,
description="Document retrieval and context injection system",
risk_level="High",
estimated_hours=16,
dependencies=["Primary LLM Endpoint"],
notes="測試 for retrieval 投毒, context injection, access control bypass",
))
if discovery_notes.get("has_tools"):
targets.append(ScopeTarget(
name="Tool / Function Calling",
component_type=ComponentType.TOOL_INTEGRATION,
description="External tools 模型 can invoke",
risk_level="Critical",
estimated_hours=24,
dependencies=["Primary LLM Endpoint"],
notes="Highest risk: tool abuse can lead to data exfiltration, unauthorized actions",
))
if discovery_notes.get("has_memory"):
targets.append(ScopeTarget(
name="Conversation Memory",
component_type=ComponentType.MEMORY_SYSTEM,
description="System that stores and recalls conversation history",
risk_level="Medium",
estimated_hours=8,
dependencies=["Primary LLM Endpoint"],
notes="測試 for cross-session data leakage, memory 投毒",
))
if discovery_notes.get("has_input_filtering"):
targets.append(ScopeTarget(
name="輸入 Validation / Filtering",
component_type=ComponentType.INPUT_PROCESSING,
description="Preprocessing that sanitizes or filters 使用者輸入",
risk_level="Medium",
estimated_hours=12,
dependencies=[],
notes="測試 filter bypass techniques",
))
if discovery_notes.get("has_output_filtering"):
targets.append(ScopeTarget(
name="輸出 Content Filtering",
component_type=ComponentType.OUTPUT_FILTERING,
description="Post-processing that filters model responses",
risk_level="Medium",
estimated_hours=12,
dependencies=["Primary LLM Endpoint"],
notes="測試 for filter evasion, encoding-based bypass",
))
if discovery_notes.get("has_web_interface"):
targets.append(ScopeTarget(
name="Web Chat Interface",
component_type=ComponentType.WEB_INTERFACE,
description="Browser-based interface for interacting with the AI",
risk_level="Medium",
estimated_hours=8,
dependencies=[],
notes="Standard web app 測試 plus AI-specific UI attacks",
))
if discovery_notes.get("has_auth"):
targets.append(ScopeTarget(
name="API Authentication",
component_type=ComponentType.AUTHENTICATION,
description="Authentication and 授權 mechanisms",
risk_level="High",
estimated_hours=8,
dependencies=[],
notes="測試 API key management, 符元 handling, session isolation",
))
return targetsStep 3: Risk-Based Prioritization
Not all targets deserve equal 測試 time. Prioritize based on a combination of business impact and exploitability:
# Risk Prioritization Matrix
| Component | Business Impact | Exploitability | Priority |
|-----------|----------------|----------------|----------|
| Tool/Function Calling | Critical (can perform actions) | High (提示詞注入 is well-understood) | **P0 - Must 測試** |
| RAG Pipeline | High (data access) | High (retrieval 投毒 is practical) | **P0 - Must 測試** |
| LLM Endpoint | High (core functionality) | High (many known attack vectors) | **P0 - Must 測試** |
| System Prompt | High (controls behavior) | Medium (model-dependent) | **P1 - Should 測試** |
| Authentication | High (access control) | Medium (standard web 安全) | **P1 - Should 測試** |
| 輸入 Filtering | Medium (防禦 layer) | Medium (bypass techniques exist) | **P1 - Should 測試** |
| 輸出 Filtering | Medium (防禦 layer) | Medium (encoding attacks) | **P2 - Nice to 測試** |
| Memory System | Medium (data leakage) | Low (requires multi-session) | **P2 - Nice to 測試** |
| Web Interface | Low (standard web) | Medium (well-understood) | **P3 - If time permits** |
| 監控 | Low (passive) | Low (hard to attack) | **P3 - If time permits** |The priority determines the 測試 order and the minimum time allocation. P0 targets must be tested for the engagement to be valuable. P1 targets should be included in a standard engagement. P2 and P3 are included in comprehensive engagements.
Step 4: Engagement Type Selection
Based on the target list and client needs, recommend an engagement type:
# Engagement Types
## Focused 評估 (40-60 hours)
- **Scope**: P0 targets only
- **Best for**: Organizations new to AI 紅隊演練, specific concern validation
- **Deliverable**: Executive summary + detailed findings for tested targets
- **Team**: 1 senior 紅隊員
- **Timeline**: 1-2 weeks
## Standard 評估 (80-120 hours)
- **Scope**: P0 + P1 targets
- **Best for**: Production AI applications, compliance requirements
- **Deliverable**: Full report with executive summary, technical findings, remediation guidance
- **Team**: 1 senior + 1 junior 紅隊員
- **Timeline**: 2-3 weeks
## Comprehensive 評估 (160-240 hours)
- **Scope**: All targets (P0 through P3)
- **Best for**: High-risk applications (healthcare, finance, government), M&A due diligence
- **Deliverable**: Full report + 威脅模型 + remediation roadmap + retest plan
- **Team**: 2 senior + 1 junior 紅隊員
- **Timeline**: 4-6 weeks
## Continuous 評估 (ongoing)
- **Scope**: P0 targets, automated + periodic manual 測試
- **Best for**: Organizations with rapid deployment cycles
- **Deliverable**: Weekly automated reports + quarterly manual assessments
- **Team**: Part-time senior oversight + automated tooling
- **Timeline**: Ongoing, minimum 3-month commitmentStep 5: Resource Estimation
Calculate the total effort, cost, and timeline:
# scoping/resource_estimation.py
"""Estimate resources for an AI 紅隊 engagement."""
from dataclasses import dataclass
@dataclass
class ResourceEstimate:
total_hours: float
senior_hours: float
junior_hours: float
tool_costs: float
api_costs: float
calendar_days: int
report_hours: float
def estimate_resources(targets: list, engagement_type: str) -> ResourceEstimate:
"""Calculate resource requirements based on targets and engagement type."""
# Base 測試 hours from target enumeration
testing_hours = sum(t.estimated_hours for t in targets)
# Add overhead multipliers
overhead = {
"focused": {"setup": 0.1, "reporting": 0.25, "review": 0.1},
"standard": {"setup": 0.15, "reporting": 0.3, "review": 0.15},
"comprehensive": {"setup": 0.2, "reporting": 0.35, "review": 0.2},
}
multipliers = overhead.get(engagement_type, overhead["standard"])
setup_hours = testing_hours * multipliers["setup"]
report_hours = testing_hours * multipliers["reporting"]
review_hours = testing_hours * multipliers["review"]
total_hours = testing_hours + setup_hours + report_hours + review_hours
# Staffing split (senior does critical work, junior does standard 測試)
senior_ratio = 0.7 if engagement_type == "focused" else 0.5
senior_hours = total_hours * senior_ratio
junior_hours = total_hours * (1 - senior_ratio)
# Tool and API costs
# Estimate based on typical usage patterns
tool_costs = 0 # Most tools are open source
api_query_count = testing_hours * 50 # ~50 queries per 測試 hour
api_cost_per_query = 0.01 # Average across models
api_costs = api_query_count * api_cost_per_query
# Calendar days (assuming 6 productive hours per day)
productive_hours_per_day = 6
if engagement_type == "focused":
team_size = 1
elif engagement_type == "standard":
team_size = 2
else:
team_size = 3
calendar_days = int(total_hours / (productive_hours_per_day * team_size)) + 5 # +5 for scheduling buffer
return ResourceEstimate(
total_hours=round(total_hours, 1),
senior_hours=round(senior_hours, 1),
junior_hours=round(junior_hours, 1),
tool_costs=round(tool_costs, 2),
api_costs=round(api_costs, 2),
calendar_days=calendar_days,
report_hours=round(report_hours, 1),
)Step 6: Defining Scope Boundaries
Clear boundaries prevent scope creep and legal issues. Document what is explicitly in scope, out of scope, and conditionally in scope:
# Scope Boundary Definition Template
## In Scope
- [ ] LLM API endpoint at [URL]
- [ ] 系統提示詞 and configuration review
- [ ] Prompt injection and 越獄 測試
- [ ] Tool/函式呼叫 安全 測試
- [ ] RAG pipeline 投毒 測試
- [ ] 輸入 and 輸出 filter bypass 測試
- [ ] Authentication and 授權 測試
- [ ] API rate limiting and abuse 測試
- [ ] Conversation memory isolation 測試
## Out of Scope
- [ ] Physical infrastructure and network 安全
- [ ] Social engineering of employees (unless specifically authorized)
- [ ] Denial-of-service attacks against production systems
- [ ] 測試 of third-party AI model providers directly (OpenAI, Anthropic APIs)
- [ ] Source code review (unless specifically authorized)
- [ ] Model weight extraction or model stealing attacks
- [ ] 測試 outside the defined time window
- [ ] Any 測試 in production environment (unless specifically authorized)
## Conditionally In Scope (requires separate 授權)
- [ ] 測試 in production environment during low-traffic windows
- [ ] Automated scanning with high request volumes (>1000 req/min)
- [ ] 測試 involving PII or sensitive data
- [ ] Cross-tenant isolation 測試 in multi-tenant systems
- [ ] 測試 third-party integrations (requires third-party 授權)Step 7: Legal and Authorization Framework
AI 紅隊演練 introduces legal considerations beyond traditional penetration 測試:
# Legal Considerations Checklist
## Authorization
- [ ] Written 授權 from 系統 owner
- [ ] Authorization covers all environments being tested
- [ ] Time boundaries explicitly specified
- [ ] Techniques and tools explicitly authorized
- [ ] Data handling agreement in place
## Third-Party Considerations
- [ ] Reviewed AI provider's acceptable use policy
- [ ] Reviewed AI provider's penetration 測試 policy
- [ ] Obtained necessary third-party authorizations
- [ ] Documented compliance with provider terms of service
## Data Handling
- [ ] Agreement on how 測試 data will be stored
- [ ] Agreement on how findings containing sensitive data will be handled
- [ ] Agreement on data retention period
- [ ] Agreement on secure destruction after engagement
## AI-Specific Legal Considerations
- [ ] Clarified whether generating harmful model outputs during 測試
constitutes a policy violation under the provider's TOS
- [ ] Confirmed that 對抗性 測試 is authorized under the
organization's AI governance framework
- [ ] Documented that any harmful outputs generated during 測試
are part of authorized 安全 測試
- [ ] Established protocol for handling truly dangerous model outputs
(e.g., instructions for real-world harm)
## Incident Protocol
- [ ] Defined what constitutes a "critical finding" requiring immediate reporting
- [ ] Established escalation path for critical findings
- [ ] Defined communication channel for urgent issues
- [ ] Agreed on the protocol if actual malicious activity is discoveredStep 8: Building the Statement of Work
Compile all scoping decisions into a formal Statement of Work:
# AI 紅隊 Engagement - Statement of Work
## 1. Engagement 概覽
- **Client**: [Client Name]
- **System Under 測試**: [System Name and Description]
- **Engagement Type**: [Focused / Standard / Comprehensive]
- **Start Date**: [Date]
- **End Date**: [Date]
- **Primary Contact**: [Name, Email, Phone]
## 2. Objectives
The objective of this engagement is to 識別 安全 漏洞
in [System Name] through systematic 對抗性 測試. Specifically:
1. 評估 系統's resilience to 提示詞注入 attacks
2. 評估 access control mechanisms for AI-accessible tools and data
3. 測試 the effectiveness of 輸入/輸出 filtering mechanisms
4. 識別 data leakage and information disclosure risks
5. Provide actionable remediation recommendations
## 3. Scope
### In Scope
[List from Step 6]
### Out of Scope
[List from Step 6]
### 測試 Environment
- **Primary**: [Staging/Dev environment URL]
- **Secondary**: [Production environment, if authorized]
## 4. Methodology
測試 will follow the [methodology reference] framework, incorporating:
- OWASP LLM Top 10 漏洞 categories
- MITRE ATLAS 對抗性 ML taxonomy
- NIST AI Risk Management Framework considerations
## 5. Resource Requirements
- **Total Hours**: [from Resource Estimation]
- **Team Composition**: [Senior/Junior split]
- **Estimated API Costs**: [from Resource Estimation]
- **Calendar Duration**: [from Resource Estimation]
## 6. Deliverables
1. Executive 總結 (1-2 pages, non-technical)
2. Technical Findings Report (detailed, with evidence)
3. Remediation Roadmap (prioritized recommendations)
4. Raw 測試 Data (sanitized, optional)
## 7. Communication Plan
- Kickoff meeting: [Date]
- Weekly status updates: [Day/Time]
- Draft report delivery: [Date]
- Final report delivery: [Date]
- Findings walkthrough: [Date]
## 8. Authorization and Legal
[Reference the completed 授權 documents]
## 9. Acceptance
[Signature blocks]Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Scope creep during engagement | Boundaries not clearly defined | Reference the SOW for every out-of-scope request, document change requests formally |
| Client underestimates complexity | AI 紅隊演練 is unfamiliar | Share this scoping framework early, explain why each component needs 測試 |
| Third-party 授權 delays | Provider requires advance notice | Start the 授權 process 2-4 weeks before the engagement |
| Budget insufficient for comprehensive 測試 | Client has limited budget | Recommend a focused 評估 and provide a roadmap for future engagements |
| Stakeholder misalignment | Different expectations among client teams | Conduct a stakeholder 對齊 meeting before the engagement starts |
| Production 測試 restrictions | Client unwilling to 測試 in production | Negotiate staging environment access, document the gap in coverage |
關鍵要點
Effective scoping is the foundation of a valuable AI 紅隊 engagement. The critical practices are:
- Enumerate before estimating -- you cannot estimate effort without a complete target list. The discovery questionnaire and systematic enumeration prevent missed components.
- Prioritize by risk -- not all components need the same level of 測試. Focus effort on tool integrations, RAG pipelines, and core model endpoints where the business impact of 利用 is highest.
- Define boundaries explicitly -- ambiguity in scope leads to disputes, wasted effort, and legal risk. Document in-scope, out-of-scope, and conditionally-in-scope items separately.
- Account for AI-specific legal complexity -- third-party 授權, AI governance frameworks, and data handling requirements add overhead that traditional penetration 測試 scoping does not include.
- Match engagement type to maturity -- an organization's first AI 紅隊 engagement should be focused, not comprehensive. Build a relationship and expand scope over time.
Advanced Considerations
Evolving 攻擊 Landscape
The AI 安全 landscape evolves rapidly as both offensive techniques and defensive measures advance. Several trends shape the current state of play:
Increasing model capabilities create new attack surfaces. As models gain access to tools, code execution, web browsing, and computer use, each new capability introduces potential 利用 vectors that did not exist in earlier, text-only systems. The principle of least privilege becomes increasingly important as model capabilities expand.
安全 訓練 improvements are necessary but not sufficient. Model providers invest heavily in 安全 訓練 through RLHF, DPO, constitutional AI, and other 對齊 techniques. These improvements raise the bar for successful attacks but do not eliminate the fundamental 漏洞: models cannot reliably distinguish legitimate instructions from 對抗性 ones 因為 this distinction is not represented in the architecture.
Automated 紅隊演練 tools democratize 測試. Tools like NVIDIA's Garak, Microsoft's PyRIT, and Promptfoo enable organizations to conduct automated 安全 測試 without deep AI 安全 expertise. 然而, automated tools catch known patterns; novel attacks and business logic 漏洞 still require human creativity and domain knowledge.
Regulatory pressure drives organizational investment. The EU AI Act, NIST AI RMF, and industry-specific regulations increasingly require organizations to 評估 and mitigate AI-specific risks. This regulatory pressure is driving investment in AI 安全 programs, but many organizations are still in the early stages of building mature AI 安全 practices.
Cross-Cutting 安全 Principles
Several 安全 principles apply across all topics covered 在本 curriculum:
-
防禦-in-depth: No single defensive measure is sufficient. Layer multiple independent 防禦 so that failure of any single layer does not result in system compromise. 輸入 classification, 輸出 filtering, behavioral 監控, and architectural controls should all be present.
-
Assume breach: Design systems assuming that any individual component can be compromised. This mindset leads to better isolation, 監控, and incident response capabilities. When a 提示詞注入 succeeds, the blast radius should be minimized through architectural controls.
-
Least privilege: Grant models and 代理 only the minimum capabilities needed for their intended function. A customer service chatbot does not need file system access or code execution. Excessive capabilities magnify the impact of successful 利用.
-
Continuous 測試: AI 安全 is not a one-time 評估. Models change, 防禦 evolve, and new attack techniques are discovered regularly. 實作 continuous 安全 測試 as part of the development and deployment lifecycle.
-
Secure by default: Default configurations should be secure. Require explicit opt-in for risky capabilities, use allowlists rather than denylists, and err on the side of restriction rather than permissiveness.
Integration with Organizational 安全
AI 安全 does not exist in isolation — it must integrate with the organization's broader 安全 program:
| 安全 Domain | AI-Specific Integration |
|---|---|
| Identity and Access | API key management, model access controls, user 認證 for AI features |
| Data Protection | 訓練資料 classification, PII in prompts, data residency for model calls |
| Application 安全 | AI feature threat modeling, 提示詞注入 in SAST/DAST, secure AI design patterns |
| Incident Response | AI-specific playbooks, model behavior 監控, 提示詞注入 forensics |
| Compliance | AI regulatory mapping (EU AI Act, NIST), AI audit trails, model documentation |
| Supply Chain | Model provenance, dependency 安全, adapter/weight integrity verification |
class OrganizationalIntegration:
"""Framework for integrating AI 安全 with organizational 安全 programs."""
def __init__(self, org_config: dict):
self.config = org_config
self.gaps = []
def assess_maturity(self) -> dict:
"""評估 the organization's AI 安全 maturity."""
domains = {
"governance": self._check_governance(),
"technical_controls": self._check_technical(),
"監控": self._check_monitoring(),
"incident_response": self._check_ir(),
"訓練": self._check_training(),
}
overall = sum(d["score"] for d in domains.values()) / len(domains)
return {"domains": domains, "overall_maturity": round(overall, 1)}
def _check_governance(self) -> dict:
has_policy = self.config.get("ai_security_policy", False)
has_framework = self.config.get("risk_framework", False)
score = (int(has_policy) + int(has_framework)) * 2.5
return {"score": score, "max": 5.0}
def _check_technical(self) -> dict:
controls = ["input_classification", "output_filtering", "rate_limiting", "sandboxing"]
active = sum(1 for c in controls if self.config.get(c, False))
return {"score": active * 1.25, "max": 5.0}
def _check_monitoring(self) -> dict:
has_monitoring = self.config.get("ai_monitoring", False)
has_alerting = self.config.get("ai_alerting", False)
score = (int(has_monitoring) + int(has_alerting)) * 2.5
return {"score": score, "max": 5.0}
def _check_ir(self) -> dict:
has_playbook = self.config.get("ai_ir_playbook", False)
return {"score": 5.0 if has_playbook else 0.0, "max": 5.0}
def _check_training(self) -> dict:
has_training = self.config.get("ai_security_training", False)
return {"score": 5.0 if has_training else 0.0, "max": 5.0}Future Directions
Several research and industry trends will shape the evolution of this field:
- Formal methods for AI 安全: Development of mathematical frameworks that can provide bounded guarantees about model behavior under 對抗性 conditions
- Automated 紅隊演練 at scale: Continued improvement of automated 測試 tools that can discover novel 漏洞 without human guidance
- AI-assisted 防禦: Using AI systems to detect and respond to attacks on other AI systems, creating a dynamic attack-防禦 ecosystem
- Standardized 評估: Growing adoption of standardized benchmarks (HarmBench, JailbreakBench) that enable consistent measurement of progress
- Regulatory harmonization: Convergence of AI regulatory frameworks across jurisdictions, providing clearer requirements for organizations