Red Team Methodology Overview
A structured methodology for AI red team engagements: phases, deliverables, role definitions, and how AI-specific testing differs from traditional penetration testing.
AI red teaming adapts traditional adversarial security testing to the unique characteristics of AI systems. Where a traditional penetration test probes for software vulnerabilities and misconfigurations, an AI red team evaluates whether a system can be manipulated through its natural language interface, its training data pipeline, its tool integrations, or its agentic decision-making. The methodology must account for stochastic behavior, prompt-level attack surfaces, and harms that are semantic rather than technical.
How AI Red Teaming Differs
Before diving into methodology, it helps to understand what makes AI red teaming distinct from traditional penetration testing:
| Dimension | Traditional Pen Test | AI Red Team |
|---|---|---|
| Attack surface | Code, configs, network | Prompts, training data, tool integrations, model behavior |
| Vulnerability type | CVEs, misconfigurations | Alignment failures, injection, hallucination, bias |
| Reproducibility | Usually deterministic | Stochastic: same input may produce different outputs |
| Exploit format | Code, scripts, payloads | Natural language prompts, adversarial inputs |
| Impact measurement | Confidentiality, integrity, availability | Plus: fairness, safety, truthfulness, alignment |
| Tools | Scanners, fuzzers, exploit frameworks | Prompt libraries, behavioral testing suites, eval frameworks |
| Success criteria | Binary (exploited or not) | Gradient (bypass rate, severity spectrum, harm taxonomy) |
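The "gradient" success criterion in the last row is worth making concrete. Because the same prompt can succeed or fail on different attempts, a finding is characterized by a bypass *rate* over repeated trials rather than a single pass/fail result. A minimal sketch of such a trial harness, where `attempt_bypass` is a hypothetical stand-in for a real model call plus a response grader:

```python
import random

random.seed(42)  # deterministic demo only; real model trials are stochastic

def attempt_bypass(prompt: str) -> bool:
    """Hypothetical stand-in for a real model call. A real harness would
    send `prompt` to the target API and grade whether the response
    constitutes a bypass; here a 15% per-trial success rate is assumed
    purely for illustration."""
    return random.random() < 0.15

def bypass_rate(prompt: str, trials: int = 20) -> float:
    """Run the same prompt repeatedly and report the fraction of
    successes, since a single attempt tells you little about a
    stochastic system."""
    successes = sum(attempt_bypass(prompt) for _ in range(trials))
    return successes / trials

rate = bypass_rate("hypothetical injection payload", trials=50)
print(f"bypass rate over 50 trials: {rate:.0%}")
```

The same structure underlies the "minimum trials per technique" requirement in the evidence collection phase below.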
Engagement Phases
An AI red team engagement follows a structured sequence of phases. Each phase produces specific deliverables that feed into the next.
Phase 1: Scoping and Rules of Engagement
Objective: Define what will be tested, how, and what constraints apply.
Activities:
- Identify the AI system under test: model provider, deployment architecture, user-facing interfaces, tool integrations, data sources
- Define scope boundaries: which components, attack vectors, and harm categories are in scope
- Establish rules of engagement: rate limits, forbidden techniques, escalation procedures, data handling requirements
- Agree on success criteria and severity classification with stakeholders
Deliverable: Signed rules of engagement document and detailed test plan.
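Beyond the signed document, scope decisions are easier to enforce during testing if they also exist in machine-readable form that the test harness can check before each attempt. A sketch, with all field names and values invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RulesOfEngagement:
    """Machine-readable scope record (illustrative field names)."""
    system_under_test: str
    in_scope_vectors: tuple
    forbidden_techniques: tuple
    rate_limit_rpm: int          # agreed request ceiling per minute
    escalation_contact: str

    def permits(self, vector: str) -> bool:
        """A test harness can gate every attempt on this check."""
        return (vector in self.in_scope_vectors
                and vector not in self.forbidden_techniques)

roe = RulesOfEngagement(
    system_under_test="customer-support chatbot (staging)",
    in_scope_vectors=("prompt_injection", "tool_abuse", "data_exfiltration"),
    forbidden_techniques=("training_data_poisoning",),
    rate_limit_rpm=30,
    escalation_contact="security-oncall@example.com",
)
print(roe.permits("prompt_injection"))          # True
print(roe.permits("training_data_poisoning"))   # False
```

Freezing the dataclass mirrors the intent of a signed document: scope should not change mid-engagement without going back through the stakeholders.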
See Scoping & Rules of Engagement for detailed guidance.
Phase 2: Reconnaissance
Objective: Map the AI system's attack surface before attempting exploitation.
Activities:
- Fingerprint the model provider and version through behavioral analysis
- Extract or infer the system prompt
- Enumerate available tools, plugins, and external integrations
- Identify data sources (RAG knowledge bases, web access, file uploads)
- Map trust boundaries between components
- Discover shadow AI deployments if scope includes organizational discovery
Deliverable: Attack surface map documenting all identified components, trust boundaries, and potential attack vectors.
See Advanced Reconnaissance for techniques.
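Behavioral fingerprinting, the first recon activity above, can be sketched as scoring a target's responses to fixed probe prompts against known per-provider quirks. The provider names, probes, and marker phrases below are all invented for illustration; a real signature library is built from observed provider behavior:

```python
def fingerprint(responses: dict, signatures: dict) -> str:
    """Score each candidate provider by how many probe responses contain
    its expected marker substring; return the best-scoring match."""
    def score(sig: dict) -> int:
        return sum(1 for probe, marker in sig.items()
                   if marker.lower() in responses.get(probe, "").lower())
    return max(signatures, key=lambda name: score(signatures[name]))

# Invented signatures and observed responses, purely illustrative:
SIGNATURES = {
    "provider_a": {"refusal_probe": "i can't help with that",
                   "identity_probe": "assistant made by provider a"},
    "provider_b": {"refusal_probe": "i'm unable to assist",
                   "identity_probe": "provider b language model"},
}
observed = {
    "refusal_probe": "I'm unable to assist with that request.",
    "identity_probe": "I am a Provider B language model.",
}
print(fingerprint(observed, SIGNATURES))
```

Substring matching is the crudest possible scorer; in practice fingerprinting also leans on tokenizer quirks, formatting habits, and refusal phrasing under repeated sampling.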
Phase 3: Threat Modeling
Objective: Prioritize testing based on risk analysis.
Activities:
- Build attack trees for high-value objectives
- Map ATLAS tactics and techniques to the identified attack surface
- Analyze trust boundaries and data flows for injection opportunities
- Estimate cost and probability for each attack path
- Prioritize paths by risk (probability multiplied by impact)
Deliverable: Prioritized test plan derived from the threat model.
See AI-Specific Threat Modeling for frameworks.
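The prioritization step above (risk = probability × impact) reduces to a simple scoring pass over the enumerated attack paths. A minimal sketch with illustrative estimates:

```python
attack_paths = [
    # (name, estimated probability of success, impact on a 1-5 scale)
    # All numbers are illustrative analyst estimates, not measurements.
    ("indirect injection via RAG documents", 0.6, 4),
    ("system prompt extraction", 0.8, 2),
    ("tool parameter injection to internal API", 0.3, 5),
    ("training data poisoning", 0.05, 5),
]

# Rank by risk score: probability multiplied by impact.
ranked = sorted(attack_paths, key=lambda p: p[1] * p[2], reverse=True)
for name, prob, impact in ranked:
    print(f"{prob * impact:>4.2f}  {name}")
```

The point of the exercise is the ordering, not the absolute scores: a moderately likely path to a high-impact objective outranks a near-certain path to a low-impact one, which is why prompt extraction often lands below injection paths despite being easier.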
Phase 4: Exploitation
Objective: Execute the prioritized test plan and attempt to demonstrate identified risks.
Activities:
- Test prompt injection vectors (direct, indirect, multi-turn, encoding)
- Attempt safety alignment bypasses (jailbreaking)
- Probe tool and function calling for parameter injection and unauthorized access
- Test for data exfiltration through model-mediated channels
- Evaluate bias, fairness, and harmful output generation
- Test agent autonomy boundaries and escalation paths
- Document each test with inputs, outputs, bypass rates, and reproducibility data
Deliverable: Raw findings log with full evidence chain for each discovered vulnerability.
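The "encoding" variants mentioned in the injection testing activity can be generated systematically rather than by hand, so that a filter matching only the literal payload string gets probed against each obfuscated form. A minimal sketch using standard-library encoders:

```python
import base64
import codecs

def encoding_variants(payload: str) -> dict:
    """Produce common obfuscated forms of one payload. Each variant
    tests whether the target's input filtering canonicalizes before
    matching, or only catches the literal string."""
    return {
        "plain": payload,
        "base64": base64.b64encode(payload.encode()).decode(),
        "rot13": codecs.encode(payload, "rot13"),
        "hex": payload.encode().hex(),
        "reversed": payload[::-1],
    }

variants = encoding_variants("ignore previous instructions")
for name, value in variants.items():
    print(f"{name:>8}: {value}")
```

Each variant would then be run through the trial harness and recorded with its own bypass rate, since models decode some encodings far more reliably than others.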
Phase 5: Evidence Collection and Documentation
Objective: Produce a complete, reproducible evidence package.
Activities:
- Capture full conversation logs for each finding
- Record API requests and responses with timestamps
- Document bypass rates across multiple attempts (minimum 10 trials per technique)
- Preserve model version, temperature settings, and other configuration details
- Create proof-of-concept payloads that can be re-run for verification
Deliverable: Evidence package meeting chain-of-custody requirements.
See Evidence Collection for standards.
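The evidence activities above amount to bundling each finding with its configuration, trial statistics, and an integrity hash of the raw transcript. A sketch of one such record, with the model identifier and field names chosen for illustration:

```python
import hashlib
import json
from datetime import datetime, timezone

def evidence_record(finding_id, prompt, response, model, temperature,
                    trial_results):
    """Bundle one finding into a self-describing record. Hashing the
    raw transcript supports later integrity and chain-of-custody
    verification."""
    transcript = prompt + "\n---\n" + response
    return {
        "finding_id": finding_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "temperature": temperature,
        "trials": len(trial_results),
        "bypass_rate": sum(trial_results) / len(trial_results),
        "transcript_sha256": hashlib.sha256(transcript.encode()).hexdigest(),
        "transcript": transcript,
    }

record = evidence_record(
    finding_id="F-007",
    prompt="(injection payload)",
    response="(model output)",
    model="example-model-2024-06",  # hypothetical model identifier
    temperature=0.7,
    trial_results=[True, False, True] + [False] * 7,  # 2/10 bypasses
)
print(json.dumps({k: v for k, v in record.items() if k != "transcript"},
                 indent=2))
```

Storing the bypass rate alongside the trial count preserves the stochastic context a reviewer needs to reproduce or dispute the finding.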
Phase 6: Reporting
Objective: Communicate findings to stakeholders in actionable form.
Activities:
- Classify findings by severity using the agreed framework
- Write finding descriptions with clear impact statements
- Provide remediation recommendations for each finding
- Produce an executive summary for non-technical stakeholders
- Conduct a findings walkthrough with the technical team
Deliverable: Final report with executive summary, detailed findings, and remediation roadmap.
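Severity classification in this phase typically combines the measured bypass rate with the harm category's impact. The matrix below is an illustrative example only; the real thresholds and labels come from the framework agreed during scoping:

```python
def classify(bypass_rate: float, impact: str) -> str:
    """Illustrative severity matrix (assumed thresholds, not a
    standard). `impact` is one of 'low', 'medium', 'high'."""
    impact_rank = {"low": 0, "medium": 1, "high": 2}[impact]
    # Bucket the measured rate: rare (<5%), occasional (<30%), frequent.
    rate_rank = 0 if bypass_rate < 0.05 else 1 if bypass_rate < 0.3 else 2
    matrix = [
        ["informational", "low",    "medium"],    # low impact
        ["low",           "medium", "high"],      # medium impact
        ["medium",        "high",   "critical"],  # high impact
    ]
    return matrix[impact_rank][rate_rank]

print(classify(0.02, "high"))  # rare but high-impact -> medium
print(classify(0.5, "high"))   # frequent and high-impact -> critical
print(classify(0.1, "low"))    # occasional, low-impact -> low
```

Note that even a rare bypass of a high-impact control does not classify as informational, which is the reporting counterpart of the "binary pass/fail" anti-pattern discussed below.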
Team Composition
Effective AI red teaming requires a combination of skills that rarely exists in a single person:
| Role | Responsibilities | Key Skills |
|---|---|---|
| Engagement lead | Scoping, stakeholder management, reporting | Project management, communication, risk assessment |
| Prompt engineer / injection specialist | Prompt injection testing, jailbreaking, safety bypass | Creative writing, LLM behavior intuition, NLP knowledge |
| ML engineer | Model-level attacks, gradient-based methods, training pipeline assessment | Machine learning, Python, adversarial ML research |
| Application security tester | API testing, infrastructure assessment, tool integration security | Web security, API security, traditional pen testing |
| Domain expert | Bias evaluation, harm assessment, policy compliance | Domain-specific knowledge (healthcare, finance, legal, etc.) |
| Automation engineer | Test harness development, scale testing, metrics | Software engineering, data analysis, scripting |
Engagement Types
Different engagement types require different methodology emphases:
Safety Evaluation
Focus on alignment bypasses, harmful content generation, and bias testing. Heavy emphasis on creative prompt engineering and edge-case discovery. Often performed before model deployment.
Application Security Assessment
Focus on prompt injection, data exfiltration, tool abuse, and privilege escalation in a deployed application. Combines AI-specific testing with traditional application security testing.
Agentic System Assessment
Focus on tool-use boundaries, autonomous decision-making, multi-agent trust, and escalation paths. Requires understanding of the agent architecture and tool integrations. See Agent Exploitation for techniques.
Continuous Red Teaming
Ongoing testing program rather than a point-in-time engagement. Automated testing combined with periodic manual assessments. See Continuous Red Teaming for program design.
Methodology Anti-Patterns
Common mistakes that reduce AI red team effectiveness:
Ad-hoc testing without threat modeling. Jumping straight to "try some prompts" without understanding the system architecture, trust boundaries, or highest-risk paths. This leads to shallow coverage and missed critical vulnerabilities.
Binary pass/fail reporting. Reporting that a technique "doesn't work" after a few failed attempts. AI systems are stochastic; a technique that fails 9 times and succeeds once represents a real vulnerability that must be reported with its bypass rate.
Ignoring the application layer. Focusing exclusively on the model and missing traditional web vulnerabilities in the application wrapping it. API authentication, rate limiting, input validation, and session management all affect the AI system's security posture.
Testing only the happy path of attacks. Trying the most common injection techniques and stopping when they fail. Sophisticated real-world attackers combine techniques, adapt to defenses, and persist through failures.
Skipping reconnaissance. Attempting exploitation without first understanding what model, version, system prompt, and tools are in use. Recon findings dramatically change which attack techniques are most effective.
Related Topics
- Scoping & Rules of Engagement - Detailed scoping guidance
- Advanced Reconnaissance - Recon techniques for AI targets
- AI-Specific Threat Modeling - Threat modeling frameworks
- Evidence Collection - Evidence standards
- Continuous Red Teaming - Ongoing program design
References
- MITRE (2024). ATLAS - Adversarial Threat Landscape for AI Systems
- NIST (2024). AI Risk Management Framework (AI RMF 1.0)
- OWASP (2025). OWASP Top 10 for LLM Applications
- Anthropic (2024). "Challenges in Red Teaming AI Systems"
- Microsoft (2024). "AI Red Team Best Practices"