How to Scope an AI Red Team Engagement
Comprehensive walkthrough for scoping AI red team engagements from initial client contact through statement of work, covering target enumeration, risk-based prioritization, resource estimation, boundary definition, and legal considerations.
Scoping is where the engagement succeeds or fails. An under-scoped engagement misses critical attack surfaces and produces an incomplete assessment. An over-scoped engagement burns budget on low-value targets and delivers findings the client cannot act on. AI systems introduce scoping complexities that traditional penetration testing does not encounter: model endpoints have different risk profiles than web applications, prompt injection testing requires different time estimates than API fuzzing, and the boundary between "testing the application" and "testing the model" is often unclear. This walkthrough provides a systematic process for getting the scope right.
Step 1: Initial Discovery Meeting
The scoping process begins with a structured discovery meeting. Prepare a questionnaire that covers the information you need without overwhelming the client:
# AI Red Team Scoping Questionnaire
## System Overview
1. What does your AI/ML system do? (One-paragraph description)
2. Who are the end users? (Internal employees, customers, general public)
3. What AI/ML models does it use? (OpenAI, Claude, open-source, custom-trained)
4. How are the models accessed? (API, self-hosted, cloud-hosted)
5. What data does the model have access to? (User data, internal documents, databases)
## Architecture
6. Is there a system prompt or instruction set? Can we see it?
7. Does the model use tools/function calling? If so, which tools?
8. Is there a RAG (Retrieval-Augmented Generation) pipeline?
9. What input preprocessing/validation exists?
10. What output filtering/monitoring exists?
## Business Context
11. What is the worst-case scenario if the AI system is compromised?
12. Have you experienced any AI-related security incidents?
13. Are there regulatory requirements (HIPAA, GDPR, financial regulations)?
14. Who are the stakeholders for the assessment results?
## Technical Environment
15. What environments are available for testing? (Production, staging, dev)
16. Are there rate limits or usage quotas we should be aware of?
17. Can we get test accounts and API credentials?
18. Is there documentation (API docs, architecture diagrams)?
## Constraints
19. What is off-limits? (Specific systems, data, techniques)
20. Are there time-of-day or day-of-week restrictions?
21. Are there third-party services we need authorization to test?
22. What is the desired timeline for the engagement?During the meeting, take detailed notes and ask follow-up questions. The answers to these questions directly determine the scope, timeline, and cost of the engagement.
Step 2: Target Enumeration
After the discovery meeting, enumerate every testable component. AI systems typically have more components than clients initially realize:
# scoping/target_enumeration.py
"""Systematic target enumeration for AI red team engagements."""
from dataclasses import dataclass
from enum import Enum
class ComponentType(Enum):
MODEL_ENDPOINT = "Model API Endpoint"
SYSTEM_PROMPT = "System Prompt / Instructions"
RAG_PIPELINE = "RAG / Retrieval Pipeline"
TOOL_INTEGRATION = "Tool / Function Calling"
MEMORY_SYSTEM = "Conversation Memory"
INPUT_PROCESSING = "Input Preprocessing"
OUTPUT_FILTERING = "Output Filtering"
AUTHENTICATION = "Authentication / Authorization"
DATA_STORE = "Data Store / Knowledge Base"
MONITORING = "Monitoring / Logging"
WEB_INTERFACE = "Web / Chat Interface"
API_GATEWAY = "API Gateway / Rate Limiting"
ORCHESTRATION = "Agent Orchestration Layer"
TRAINING_PIPELINE = "Model Training / Fine-tuning Pipeline"
EMBEDDING_SERVICE = "Embedding Service"
@dataclass
class ScopeTarget:
name: str
component_type: ComponentType
description: str
risk_level: str # Critical, High, Medium, Low
estimated_hours: float
dependencies: list[str]
notes: str = ""
def enumerate_targets(discovery_notes: dict) -> list[ScopeTarget]:
"""Generate a target list based on discovery meeting notes.
This function demonstrates the logic; in practice, you would
fill these in based on the actual client conversation.
"""
targets = []
# Every AI engagement includes these core targets
targets.append(ScopeTarget(
name="Primary LLM Endpoint",
component_type=ComponentType.MODEL_ENDPOINT,
description="The main model API endpoint serving user queries",
risk_level="High",
estimated_hours=16,
dependencies=[],
notes="Core attack surface for prompt injection and jailbreak testing",
))
targets.append(ScopeTarget(
name="System Prompt Configuration",
component_type=ComponentType.SYSTEM_PROMPT,
description="The instruction set that defines model behavior",
risk_level="High",
estimated_hours=8,
dependencies=["Primary LLM Endpoint"],
notes="Test for extraction, override, and manipulation",
))
# Conditional targets based on architecture
if discovery_notes.get("has_rag"):
targets.append(ScopeTarget(
name="RAG Retrieval Pipeline",
component_type=ComponentType.RAG_PIPELINE,
description="Document retrieval and context injection system",
risk_level="High",
estimated_hours=16,
dependencies=["Primary LLM Endpoint"],
notes="Test for retrieval poisoning, context injection, access control bypass",
))
if discovery_notes.get("has_tools"):
targets.append(ScopeTarget(
name="Tool / Function Calling",
component_type=ComponentType.TOOL_INTEGRATION,
description="External tools the model can invoke",
risk_level="Critical",
estimated_hours=24,
dependencies=["Primary LLM Endpoint"],
notes="Highest risk: tool abuse can lead to data exfiltration, unauthorized actions",
))
if discovery_notes.get("has_memory"):
targets.append(ScopeTarget(
name="Conversation Memory",
component_type=ComponentType.MEMORY_SYSTEM,
description="System that stores and recalls conversation history",
risk_level="Medium",
estimated_hours=8,
dependencies=["Primary LLM Endpoint"],
notes="Test for cross-session data leakage, memory poisoning",
))
if discovery_notes.get("has_input_filtering"):
targets.append(ScopeTarget(
name="Input Validation / Filtering",
component_type=ComponentType.INPUT_PROCESSING,
description="Preprocessing that sanitizes or filters user input",
risk_level="Medium",
estimated_hours=12,
dependencies=[],
notes="Test filter bypass techniques",
))
if discovery_notes.get("has_output_filtering"):
targets.append(ScopeTarget(
name="Output Content Filtering",
component_type=ComponentType.OUTPUT_FILTERING,
description="Post-processing that filters model responses",
risk_level="Medium",
estimated_hours=12,
dependencies=["Primary LLM Endpoint"],
notes="Test for filter evasion, encoding-based bypass",
))
if discovery_notes.get("has_web_interface"):
targets.append(ScopeTarget(
name="Web Chat Interface",
component_type=ComponentType.WEB_INTERFACE,
description="Browser-based interface for interacting with the AI",
risk_level="Medium",
estimated_hours=8,
dependencies=[],
notes="Standard web app testing plus AI-specific UI attacks",
))
if discovery_notes.get("has_auth"):
targets.append(ScopeTarget(
name="API Authentication",
component_type=ComponentType.AUTHENTICATION,
description="Authentication and authorization mechanisms",
risk_level="High",
estimated_hours=8,
dependencies=[],
notes="Test API key management, token handling, session isolation",
))
return targetsStep 3: Risk-Based Prioritization
Not all targets deserve equal testing time. Prioritize based on a combination of business impact and exploitability:
# Risk Prioritization Matrix
| Component | Business Impact | Exploitability | Priority |
|-----------|----------------|----------------|----------|
| Tool/Function Calling | Critical (can perform actions) | High (prompt injection is well-understood) | **P0 - Must test** |
| RAG Pipeline | High (data access) | High (retrieval poisoning is practical) | **P0 - Must test** |
| LLM Endpoint | High (core functionality) | High (many known attack vectors) | **P0 - Must test** |
| System Prompt | High (controls behavior) | Medium (model-dependent) | **P1 - Should test** |
| Authentication | High (access control) | Medium (standard web security) | **P1 - Should test** |
| Input Filtering | Medium (defense layer) | Medium (bypass techniques exist) | **P1 - Should test** |
| Output Filtering | Medium (defense layer) | Medium (encoding attacks) | **P2 - Nice to test** |
| Memory System | Medium (data leakage) | Low (requires multi-session) | **P2 - Nice to test** |
| Web Interface | Low (standard web) | Medium (well-understood) | **P3 - If time permits** |
| Monitoring | Low (passive) | Low (hard to attack) | **P3 - If time permits** |The priority determines the testing order and the minimum time allocation. P0 targets must be tested for the engagement to be valuable. P1 targets should be included in a standard engagement. P2 and P3 are included in comprehensive engagements.
Step 4: Engagement Type Selection
Based on the target list and client needs, recommend an engagement type:
# Engagement Types
## Focused Assessment (40-60 hours)
- **Scope**: P0 targets only
- **Best for**: Organizations new to AI red teaming, specific concern validation
- **Deliverable**: Executive summary + detailed findings for tested targets
- **Team**: 1 senior red teamer
- **Timeline**: 1-2 weeks
## Standard Assessment (80-120 hours)
- **Scope**: P0 + P1 targets
- **Best for**: Production AI applications, compliance requirements
- **Deliverable**: Full report with executive summary, technical findings, remediation guidance
- **Team**: 1 senior + 1 junior red teamer
- **Timeline**: 2-3 weeks
## Comprehensive Assessment (160-240 hours)
- **Scope**: All targets (P0 through P3)
- **Best for**: High-risk applications (healthcare, finance, government), M&A due diligence
- **Deliverable**: Full report + threat model + remediation roadmap + retest plan
- **Team**: 2 senior + 1 junior red teamer
- **Timeline**: 4-6 weeks
## Continuous Assessment (ongoing)
- **Scope**: P0 targets, automated + periodic manual testing
- **Best for**: Organizations with rapid deployment cycles
- **Deliverable**: Weekly automated reports + quarterly manual assessments
- **Team**: Part-time senior oversight + automated tooling
- **Timeline**: Ongoing, minimum 3-month commitmentStep 5: Resource Estimation
Calculate the total effort, cost, and timeline:
# scoping/resource_estimation.py
"""Estimate resources for an AI red team engagement."""
from dataclasses import dataclass
@dataclass
class ResourceEstimate:
total_hours: float
senior_hours: float
junior_hours: float
tool_costs: float
api_costs: float
calendar_days: int
report_hours: float
def estimate_resources(targets: list, engagement_type: str) -> ResourceEstimate:
"""Calculate resource requirements based on targets and engagement type."""
# Base testing hours from target enumeration
testing_hours = sum(t.estimated_hours for t in targets)
# Add overhead multipliers
overhead = {
"focused": {"setup": 0.1, "reporting": 0.25, "review": 0.1},
"standard": {"setup": 0.15, "reporting": 0.3, "review": 0.15},
"comprehensive": {"setup": 0.2, "reporting": 0.35, "review": 0.2},
}
multipliers = overhead.get(engagement_type, overhead["standard"])
setup_hours = testing_hours * multipliers["setup"]
report_hours = testing_hours * multipliers["reporting"]
review_hours = testing_hours * multipliers["review"]
total_hours = testing_hours + setup_hours + report_hours + review_hours
# Staffing split (senior does critical work, junior does standard testing)
senior_ratio = 0.7 if engagement_type == "focused" else 0.5
senior_hours = total_hours * senior_ratio
junior_hours = total_hours * (1 - senior_ratio)
# Tool and API costs
# Estimate based on typical usage patterns
tool_costs = 0 # Most tools are open source
api_query_count = testing_hours * 50 # ~50 queries per testing hour
api_cost_per_query = 0.01 # Average across models
api_costs = api_query_count * api_cost_per_query
# Calendar days (assuming 6 productive hours per day)
productive_hours_per_day = 6
if engagement_type == "focused":
team_size = 1
elif engagement_type == "standard":
team_size = 2
else:
team_size = 3
calendar_days = int(total_hours / (productive_hours_per_day * team_size)) + 5 # +5 for scheduling buffer
return ResourceEstimate(
total_hours=round(total_hours, 1),
senior_hours=round(senior_hours, 1),
junior_hours=round(junior_hours, 1),
tool_costs=round(tool_costs, 2),
api_costs=round(api_costs, 2),
calendar_days=calendar_days,
report_hours=round(report_hours, 1),
)Step 6: Defining Scope Boundaries
Clear boundaries prevent scope creep and legal issues. Document what is explicitly in scope, out of scope, and conditionally in scope:
# Scope Boundary Definition Template
## In Scope
- [ ] LLM API endpoint at [URL]
- [ ] System prompt and configuration review
- [ ] Prompt injection and jailbreak testing
- [ ] Tool/function calling security testing
- [ ] RAG pipeline poisoning testing
- [ ] Input and output filter bypass testing
- [ ] Authentication and authorization testing
- [ ] API rate limiting and abuse testing
- [ ] Conversation memory isolation testing
## Out of Scope
- [ ] Physical infrastructure and network security
- [ ] Social engineering of employees (unless specifically authorized)
- [ ] Denial-of-service attacks against production systems
- [ ] Testing of third-party AI model providers directly (OpenAI, Anthropic APIs)
- [ ] Source code review (unless specifically authorized)
- [ ] Model weight extraction or model stealing attacks
- [ ] Testing outside the defined time window
- [ ] Any testing in production environment (unless specifically authorized)
## Conditionally In Scope (requires separate authorization)
- [ ] Testing in production environment during low-traffic windows
- [ ] Automated scanning with high request volumes (>1000 req/min)
- [ ] Testing involving PII or sensitive data
- [ ] Cross-tenant isolation testing in multi-tenant systems
- [ ] Testing third-party integrations (requires third-party authorization)Step 7: Legal and Authorization Framework
AI red teaming introduces legal considerations beyond traditional penetration testing:
# Legal Considerations Checklist
## Authorization
- [ ] Written authorization from the system owner
- [ ] Authorization covers all environments being tested
- [ ] Time boundaries explicitly specified
- [ ] Techniques and tools explicitly authorized
- [ ] Data handling agreement in place
## Third-Party Considerations
- [ ] Reviewed AI provider's acceptable use policy
- [ ] Reviewed AI provider's penetration testing policy
- [ ] Obtained necessary third-party authorizations
- [ ] Documented compliance with provider terms of service
## Data Handling
- [ ] Agreement on how test data will be stored
- [ ] Agreement on how findings containing sensitive data will be handled
- [ ] Agreement on data retention period
- [ ] Agreement on secure destruction after engagement
## AI-Specific Legal Considerations
- [ ] Clarified whether generating harmful model outputs during testing
constitutes a policy violation under the provider's TOS
- [ ] Confirmed that adversarial testing is authorized under the
organization's AI governance framework
- [ ] Documented that any harmful outputs generated during testing
are part of authorized security testing
- [ ] Established protocol for handling truly dangerous model outputs
(e.g., instructions for real-world harm)
## Incident Protocol
- [ ] Defined what constitutes a "critical finding" requiring immediate reporting
- [ ] Established escalation path for critical findings
- [ ] Defined communication channel for urgent issues
- [ ] Agreed on the protocol if actual malicious activity is discoveredStep 8: Building the Statement of Work
Compile all scoping decisions into a formal Statement of Work:
# AI Red Team Engagement - Statement of Work
## 1. Engagement Overview
- **Client**: [Client Name]
- **System Under Test**: [System Name and Description]
- **Engagement Type**: [Focused / Standard / Comprehensive]
- **Start Date**: [Date]
- **End Date**: [Date]
- **Primary Contact**: [Name, Email, Phone]
## 2. Objectives
The objective of this engagement is to identify security vulnerabilities
in [System Name] through systematic adversarial testing. Specifically:
1. Assess the system's resilience to prompt injection attacks
2. Evaluate access control mechanisms for AI-accessible tools and data
3. Test the effectiveness of input/output filtering mechanisms
4. Identify data leakage and information disclosure risks
5. Provide actionable remediation recommendations
## 3. Scope
### In Scope
[List from Step 6]
### Out of Scope
[List from Step 6]
### Testing Environment
- **Primary**: [Staging/Dev environment URL]
- **Secondary**: [Production environment, if authorized]
## 4. Methodology
Testing will follow the [methodology reference] framework, incorporating:
- OWASP LLM Top 10 vulnerability categories
- MITRE ATLAS adversarial ML taxonomy
- NIST AI Risk Management Framework considerations
## 5. Resource Requirements
- **Total Hours**: [from Resource Estimation]
- **Team Composition**: [Senior/Junior split]
- **Estimated API Costs**: [from Resource Estimation]
- **Calendar Duration**: [from Resource Estimation]
## 6. Deliverables
1. Executive Summary (1-2 pages, non-technical)
2. Technical Findings Report (detailed, with evidence)
3. Remediation Roadmap (prioritized recommendations)
4. Raw Testing Data (sanitized, optional)
## 7. Communication Plan
- Kickoff meeting: [Date]
- Weekly status updates: [Day/Time]
- Draft report delivery: [Date]
- Final report delivery: [Date]
- Findings walkthrough: [Date]
## 8. Authorization and Legal
[Reference the completed authorization documents]
## 9. Acceptance
[Signature blocks]Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Scope creep during engagement | Boundaries not clearly defined | Reference the SOW for every out-of-scope request, document change requests formally |
| Client underestimates complexity | AI red teaming is unfamiliar | Share this scoping framework early, explain why each component needs testing |
| Third-party authorization delays | Provider requires advance notice | Start the authorization process 2-4 weeks before the engagement |
| Budget insufficient for comprehensive testing | Client has limited budget | Recommend a focused assessment and provide a roadmap for future engagements |
| Stakeholder misalignment | Different expectations among client teams | Conduct a stakeholder alignment meeting before the engagement starts |
| Production testing restrictions | Client unwilling to test in production | Negotiate staging environment access, document the gap in coverage |
Key Takeaways
Effective scoping is the foundation of a valuable AI red team engagement. The critical practices are:
- Enumerate before estimating -- you cannot estimate effort without a complete target list. The discovery questionnaire and systematic enumeration prevent missed components.
- Prioritize by risk -- not all components need the same level of testing. Focus effort on tool integrations, RAG pipelines, and core model endpoints where the business impact of exploitation is highest.
- Define boundaries explicitly -- ambiguity in scope leads to disputes, wasted effort, and legal risk. Document in-scope, out-of-scope, and conditionally-in-scope items separately.
- Account for AI-specific legal complexity -- third-party authorization, AI governance frameworks, and data handling requirements add overhead that traditional penetration test scoping does not include.
- Match engagement type to maturity -- an organization's first AI red team engagement should be focused, not comprehensive. Build a relationship and expand scope over time.
Advanced Considerations
Evolving Attack Landscape
The AI security landscape evolves rapidly as both offensive techniques and defensive measures advance. Several trends shape the current state of play:
Increasing model capabilities create new attack surfaces. As models gain access to tools, code execution, web browsing, and computer use, each new capability introduces potential exploitation vectors that did not exist in earlier, text-only systems. The principle of least privilege becomes increasingly important as model capabilities expand.
Safety training improvements are necessary but not sufficient. Model providers invest heavily in safety training through RLHF, DPO, constitutional AI, and other alignment techniques. These improvements raise the bar for successful attacks but do not eliminate the fundamental vulnerability: models cannot reliably distinguish legitimate instructions from adversarial ones because this distinction is not represented in the architecture.
Automated red teaming tools democratize testing. Tools like NVIDIA's Garak, Microsoft's PyRIT, and Promptfoo enable organizations to conduct automated security testing without deep AI security expertise. However, automated tools catch known patterns; novel attacks and business logic vulnerabilities still require human creativity and domain knowledge.
Regulatory pressure drives organizational investment. The EU AI Act, NIST AI RMF, and industry-specific regulations increasingly require organizations to assess and mitigate AI-specific risks. This regulatory pressure is driving investment in AI security programs, but many organizations are still in the early stages of building mature AI security practices.
Cross-Cutting Security Principles
Several security principles apply across all topics covered in this curriculum:
-
Defense-in-depth: No single defensive measure is sufficient. Layer multiple independent defenses so that failure of any single layer does not result in system compromise. Input classification, output filtering, behavioral monitoring, and architectural controls should all be present.
-
Assume breach: Design systems assuming that any individual component can be compromised. This mindset leads to better isolation, monitoring, and incident response capabilities. When a prompt injection succeeds, the blast radius should be minimized through architectural controls.
-
Least privilege: Grant models and agents only the minimum capabilities needed for their intended function. A customer service chatbot does not need file system access or code execution. Excessive capabilities magnify the impact of successful exploitation.
-
Continuous testing: AI security is not a one-time assessment. Models change, defenses evolve, and new attack techniques are discovered regularly. Implement continuous security testing as part of the development and deployment lifecycle.
-
Secure by default: Default configurations should be secure. Require explicit opt-in for risky capabilities, use allowlists rather than denylists, and err on the side of restriction rather than permissiveness.
Integration with Organizational Security
AI security does not exist in isolation — it must integrate with the organization's broader security program:
| Security Domain | AI-Specific Integration |
|---|---|
| Identity and Access | API key management, model access controls, user authentication for AI features |
| Data Protection | Training data classification, PII in prompts, data residency for model calls |
| Application Security | AI feature threat modeling, prompt injection in SAST/DAST, secure AI design patterns |
| Incident Response | AI-specific playbooks, model behavior monitoring, prompt injection forensics |
| Compliance | AI regulatory mapping (EU AI Act, NIST), AI audit trails, model documentation |
| Supply Chain | Model provenance, dependency security, adapter/weight integrity verification |
class OrganizationalIntegration:
"""Framework for integrating AI security with organizational security programs."""
def __init__(self, org_config: dict):
self.config = org_config
self.gaps = []
def assess_maturity(self) -> dict:
"""Assess the organization's AI security maturity."""
domains = {
"governance": self._check_governance(),
"technical_controls": self._check_technical(),
"monitoring": self._check_monitoring(),
"incident_response": self._check_ir(),
"training": self._check_training(),
}
overall = sum(d["score"] for d in domains.values()) / len(domains)
return {"domains": domains, "overall_maturity": round(overall, 1)}
def _check_governance(self) -> dict:
has_policy = self.config.get("ai_security_policy", False)
has_framework = self.config.get("risk_framework", False)
score = (int(has_policy) + int(has_framework)) * 2.5
return {"score": score, "max": 5.0}
def _check_technical(self) -> dict:
controls = ["input_classification", "output_filtering", "rate_limiting", "sandboxing"]
active = sum(1 for c in controls if self.config.get(c, False))
return {"score": active * 1.25, "max": 5.0}
def _check_monitoring(self) -> dict:
has_monitoring = self.config.get("ai_monitoring", False)
has_alerting = self.config.get("ai_alerting", False)
score = (int(has_monitoring) + int(has_alerting)) * 2.5
return {"score": score, "max": 5.0}
def _check_ir(self) -> dict:
has_playbook = self.config.get("ai_ir_playbook", False)
return {"score": 5.0 if has_playbook else 0.0, "max": 5.0}
def _check_training(self) -> dict:
has_training = self.config.get("ai_security_training", False)
return {"score": 5.0 if has_training else 0.0, "max": 5.0}Future Directions
Several research and industry trends will shape the evolution of this field:
- Formal methods for AI safety: Development of mathematical frameworks that can provide bounded guarantees about model behavior under adversarial conditions
- Automated red teaming at scale: Continued improvement of automated testing tools that can discover novel vulnerabilities without human guidance
- AI-assisted defense: Using AI systems to detect and respond to attacks on other AI systems, creating a dynamic attack-defense ecosystem
- Standardized evaluation: Growing adoption of standardized benchmarks (HarmBench, JailbreakBench) that enable consistent measurement of progress
- Regulatory harmonization: Convergence of AI regulatory frameworks across jurisdictions, providing clearer requirements for organizations