Red Team Methodology Overview
A structured methodology for AI red team engagements: phases, deliverables, role definitions, and how AI-specific testing differs from traditional penetration testing.
AI red teaming adapts traditional adversarial security testing to the unique characteristics of AI systems. Where a traditional penetration test probes for software vulnerabilities and misconfigurations, an AI red team evaluates whether a system can be manipulated through its natural language interface, its training data pipeline, its tool integrations, or its agentic decision-making. The methodology must account for stochastic behavior, prompt-level attack surfaces, and harms that are semantic rather than technical.
How AI Red Teaming Differs
Before diving into methodology, it helps to understand what makes AI red teaming distinct from traditional penetration testing:
| Dimension | Traditional Pen Testing | AI Red Teaming |
|---|---|---|
| Attack surface | Code, configs, network | Prompts, training data, tool integrations, model behavior |
| Vulnerability type | CVEs, misconfigurations | Alignment failures, injection, hallucination, bias |
| Reproducibility | Usually deterministic | Stochastic: the same input may produce different outputs |
| Exploit format | Code, scripts, payloads | Natural language prompts, adversarial inputs |
| Impact measurement | Confidentiality, integrity, availability | Plus: fairness, safety, truthfulness, alignment |
| Tools | Scanners, fuzzers, exploit frameworks | Prompt libraries, behavioral test suites, eval frameworks |
| Success criteria | Binary (exploited or not) | Gradient (bypass rate, severity spectrum, harm taxonomy) |
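The "gradient" success criterion in the last row can be made concrete: run the same technique many times and report the observed rate rather than a pass/fail verdict. A minimal sketch, where the `attempt` callable is a hypothetical stand-in for a real call to the system under test:

```python
import random
from typing import Callable

def measure_bypass_rate(attempt: Callable[[], bool], trials: int = 20) -> float:
    """Run the same technique repeatedly and report the fraction of successes.

    A technique that succeeds even once across many trials is still a finding;
    the rate quantifies its severity instead of collapsing it to pass/fail.
    """
    successes = sum(1 for _ in range(trials) if attempt())
    return successes / trials

# Simulated stochastic target that "bypasses" ~10% of the time
# (a stand-in for an actual model call, seeded for reproducibility).
random.seed(42)
rate = measure_bypass_rate(lambda: random.random() < 0.10, trials=200)
print(f"bypass rate: {rate:.2%}")
```

The same harness works for any technique once `attempt` is wired to a real API call, which is why bypass-rate measurement recurs in the exploitation and evidence phases below.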
Engagement Phases
An AI red team engagement follows a structured sequence of phases. Each phase produces specific deliverables that feed into the next.
Phase 1: Scoping and Rules of Engagement
Objective: Define what will be tested, how, and what constraints apply.
Activities:
- Identify the AI system under test: model provider, deployment architecture, user-facing interfaces, tool integrations, data sources
- Define scope boundaries: which components, attack vectors, and harm categories are in scope
- Establish rules of engagement: rate limits, forbidden techniques, escalation procedures, data handling requirements
- Agree on success criteria and severity classification with stakeholders
Deliverable: Signed rules-of-engagement document and detailed test plan.
See Scoping & Rules of Engagement for detailed guidance.
Phase 2: Reconnaissance
Objective: Map the AI system's attack surface before attempting exploitation.
Activities:
- Fingerprint the model provider and version through behavioral analysis
- Extract or infer the system prompt
- Enumerate available tools, plugins, and external integrations
- Identify data sources (RAG knowledge bases, web access, file uploads)
- Map trust boundaries between components
- Discover shadow AI deployments if scope includes organizational discovery
Deliverable: Attack surface map documenting all identified components, trust boundaries, and potential attack vectors.
See Advanced Reconnaissance for techniques.
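Behavioral fingerprinting often works by matching observed responses against phrasing patterns characteristic of each provider. A sketch of the idea; the signature strings and provider names below are illustrative assumptions, not actual provider outputs:

```python
# Hypothetical signature phrases per candidate provider. A real recon harness
# would build these from observed behavior of known models, not hardcode them.
SIGNATURES = {
    "provider-a": ["i'm sorry, but i can't", "as an ai developed by"],
    "provider-b": ["i am unable to assist", "per my guidelines"],
}

def fingerprint(responses: list[str]) -> dict[str, int]:
    """Score each candidate provider by counting signature phrases that
    appear in the collected probe responses."""
    scores = {name: 0 for name in SIGNATURES}
    for text in responses:
        lowered = text.lower()
        for name, phrases in SIGNATURES.items():
            scores[name] += sum(phrase in lowered for phrase in phrases)
    return scores

observed = ["I'm sorry, but I can't help with that request."]
print(fingerprint(observed))  # provider-a scores 1, provider-b scores 0
```

In practice the probe set includes tokenizer quirks, refusal styles, knowledge cutoffs, and formatting habits, and the scoring is probabilistic rather than a simple phrase count.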
Phase 3: Threat Modeling
Objective: Prioritize testing based on risk analysis.
Activities:
- Build attack trees for high-value objectives
- Map ATLAS tactics and techniques to the identified attack surface
- Analyze trust boundaries and data flows for injection opportunities
- Estimate the cost and probability of each attack path
- Prioritize paths by risk (probability multiplied by impact)
Deliverable: Prioritized test plan derived from the threat model.
See AI-Specific Threat Modeling for frameworks.
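The probability-times-impact prioritization can be sketched directly. The path names and scores below are hypothetical examples of what a threat-modeling session might produce:

```python
from dataclasses import dataclass

@dataclass
class AttackPath:
    name: str
    probability: float  # estimated chance of success, 0..1
    impact: int         # agreed impact score, e.g. 1 (low) .. 5 (critical)

    @property
    def risk(self) -> float:
        return self.probability * self.impact

# Illustrative attack paths; real values come from the threat-modeling session.
paths = [
    AttackPath("indirect injection via RAG document", 0.6, 4),
    AttackPath("system prompt extraction", 0.8, 2),
    AttackPath("tool parameter injection leading to exfiltration", 0.3, 5),
]

for p in sorted(paths, key=lambda p: p.risk, reverse=True):
    print(f"{p.risk:4.1f}  {p.name}")
```

Sorting by the computed risk score yields the prioritized test plan: the highest-risk paths get tested first and most thoroughly in Phase 4.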
Phase 4: Exploitation
Objective: Execute the prioritized test plan and attempt to demonstrate identified risks.
Activities:
- Test prompt injection vectors (direct, indirect, multi-turn, encoding)
- Attempt safety alignment bypasses (jailbreaks)
- Probe tool and function calling for parameter injection and unauthorized access
- Test for data exfiltration through model-mediated channels
- Evaluate bias, fairness, and harmful output generation
- Test agent autonomy boundaries and escalation paths
- Document each test with inputs, outputs, bypass rates, and reproducibility data
Deliverable: Raw findings log with a full evidence chain for each discovered vulnerability.
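A findings log entry needs enough structure to support the evidence and reporting phases. One possible record shape, with illustrative field names and values:

```python
import datetime
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One entry in the raw findings log. Field names are illustrative;
    the agreed evidence standard defines the actual required fields."""
    technique: str
    prompt: str
    response_excerpt: str
    trials: int
    successes: int
    model_version: str
    temperature: float
    timestamp: str = field(
        default_factory=lambda: datetime.datetime.now(
            datetime.timezone.utc
        ).isoformat()
    )

    @property
    def bypass_rate(self) -> float:
        return self.successes / self.trials if self.trials else 0.0

f = Finding(
    technique="multi-turn role-play jailbreak",
    prompt="[redacted PoC prompt]",
    response_excerpt="[redacted harmful output excerpt]",
    trials=10,
    successes=3,
    model_version="example-model-2024-06",  # hypothetical identifier
    temperature=0.7,
)
print(f"{f.technique}: {f.bypass_rate:.0%} over {f.trials} trials")
```

Capturing trials, successes, model version, and temperature per finding is what makes the stochastic results in Phase 5 reproducible and auditable.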
Phase 5: Evidence Collection and Documentation
Objective: Produce a complete, reproducible evidence package.
Activities:
- Capture full conversation logs for each finding
- Record API requests and responses with timestamps
- Document bypass rates across multiple attempts (minimum 10 trials per technique)
- Preserve model version, temperature settings, and other configuration details
- Create proof-of-concept payloads that can be re-run for verification
Deliverable: Evidence package meeting chain-of-custody requirements.
See Evidence Collection for standards.
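One common way to support chain-of-custody requirements is to hash a canonical serialization of each evidence record at capture time, so later verification can confirm the logs were not altered. A sketch, with an illustrative record:

```python
import hashlib
import json

def evidence_digest(record: dict) -> str:
    """Hash a canonical JSON serialization of an evidence record.

    Sorting keys and fixing separators makes the serialization stable, so
    the digest only changes if the record's content changes.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative evidence record; real records follow the agreed standard.
record = {
    "finding_id": "F-012",
    "model_version": "example-model-2024-06",
    "temperature": 0.7,
    "conversation_log": ["user: [redacted]", "assistant: [redacted]"],
    "trials": 10,
    "successes": 3,
}
digest = evidence_digest(record)
print(digest)
```

The digest is recorded alongside the evidence package; re-hashing at verification time and comparing digests detects tampering.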
Phase 6: Reporting
Objective: Communicate findings to stakeholders in actionable form.
Activities:
- Classify findings by severity using the agreed framework
- Write finding descriptions with clear impact statements
- Provide remediation recommendations for each finding
- Produce an executive summary for non-technical stakeholders
- Conduct a findings walkthrough with the technical team
Deliverable: Final report with executive summary, detailed findings, and remediation roadmap.
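Severity classification for stochastic findings typically combines the measured bypass rate with the impact score. A sketch of such a mapping; the thresholds and labels here are illustrative assumptions, since the real values come from the severity framework agreed during scoping:

```python
def classify(bypass_rate: float, impact: int) -> str:
    """Map a finding's bypass rate and impact score to a severity label.

    Thresholds are illustrative placeholders, not a standard; impact is
    assumed to be on a 1 (low) .. 5 (critical) scale.
    """
    score = bypass_rate * impact
    if score >= 3.0:
        return "critical"
    if score >= 1.5:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"

print(classify(0.8, 5))  # frequent bypass, critical impact -> critical
print(classify(0.1, 2))  # rare bypass, limited impact -> low
```

Note that a low bypass rate against a high-impact target can still classify as high severity, which is exactly the case binary pass/fail reporting misses.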
Team Composition
Effective AI red teaming requires a combination of skills that rarely exists in a single person:
| Role | Responsibilities | Key Skills |
|---|---|---|
| Engagement lead | Scoping, stakeholder management, reporting | Project management, communication, risk assessment |
| Prompt engineer / injection specialist | Prompt injection testing, jailbreaks, safety bypasses | Creative writing, LLM behavior intuition, NLP knowledge |
| ML engineer | Model-level attacks, gradient-based methods, training pipeline assessment | Machine learning, Python, adversarial ML research |
| Application security tester | API testing, infrastructure assessment, tool integration security | Web security, API security, traditional pen testing |
| Domain expert | Bias assessment, harm assessment, policy compliance | Domain-specific knowledge (healthcare, finance, legal, etc.) |
| Automation engineer | Test harness development, scale testing, metrics | Software engineering, data analysis, scripting |
Engagement Types
Different engagement types require different methodology emphases:
Safety Assessment
Focus on alignment bypasses, harmful content generation, and bias testing. Heavy emphasis on creative prompt engineering and edge-case discovery. Often performed before model deployment.
Application Security Assessment
Focus on prompt injection, data exfiltration, tool abuse, and privilege escalation in a deployed application. Combines AI-specific testing with traditional application security testing.
Agentic System Assessment
Focus on tool-use boundaries, autonomous decision-making, multi-agent trust, and escalation paths. Requires an understanding of the agent architecture and tool integrations. See Agent Exploitation for techniques.
Continuous Red Teaming
An ongoing testing program rather than a point-in-time engagement. Automated testing is combined with periodic manual assessments. See Continuous Red Teaming for program design.
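A core building block of a continuous program is a regression check: re-run archived proof-of-concept payloads on a schedule and flag any whose bypass rate climbs back above the baseline accepted at remediation time. A sketch, where `run_poc` and the simulated rates are hypothetical stand-ins for live calls to the target:

```python
import random

# Simulated per-PoC bypass rates; a stand-in for the live system under test.
SIMULATED_RATES = {"F-012": 0.0, "F-031": 0.9}

def run_poc(poc_id: str) -> bool:
    """Stand-in for replaying an archived PoC payload against the target."""
    return random.random() < SIMULATED_RATES.get(poc_id, 0.0)

def regression_check(baselines: dict[str, float], trials: int = 20) -> list[str]:
    """Re-run each archived PoC and return the ids whose observed bypass
    rate now exceeds its accepted baseline."""
    regressed = []
    for poc_id, baseline in baselines.items():
        rate = sum(run_poc(poc_id) for _ in range(trials)) / trials
        if rate > baseline:
            regressed.append(poc_id)
    return regressed

random.seed(7)
regressed = regression_check({"F-012": 0.0, "F-031": 0.2})
print(regressed)
```

Wiring this into CI or a scheduled job turns the point-in-time evidence package into an ongoing monitoring signal, with manual testing layered on top for novel techniques.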
Methodology Anti-Patterns
Common mistakes that reduce AI red team effectiveness:
Ad-hoc testing without threat modeling. Jumping straight to "try some prompts" without understanding the system architecture, trust boundaries, or highest-risk paths. This leads to shallow coverage and missed critical vulnerabilities.
Binary pass/fail reporting. Reporting that a technique "doesn't work" after a few failed attempts. AI systems are stochastic; a technique that fails 9 times and succeeds once represents a real vulnerability that must be reported with its bypass rate.
Ignoring the application layer. Focusing exclusively on the model and missing traditional web vulnerabilities in the application wrapping it. API authentication, rate limiting, input validation, and session management all affect the AI system's security posture.
Testing only the happy path of attacks. Trying the most common injection techniques and stopping when they fail. Sophisticated real-world attackers combine techniques, adapt to defenses, and persist through failures.
Skipping reconnaissance. Attempting exploitation without first understanding which model, version, system prompt, and tools are in use. Recon findings dramatically change which attack techniques are most effective.
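The arithmetic behind this anti-pattern is worth making explicit: the chance of observing at least one success in k independent attempts at true per-attempt rate p is 1 - (1 - p)^k, so a handful of failures proves very little.

```python
def detection_probability(p: float, k: int) -> float:
    """Probability of seeing at least one success in k independent attempts,
    given a true per-attempt bypass rate p: 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

# A 10% technique looks "broken" after 3 tries but surfaces reliably at scale.
print(f"{detection_probability(0.10, 3):.0%}")   # ~27%
print(f"{detection_probability(0.10, 50):.0%}")  # ~99%
```

This is also why the evidence-collection phase mandates a minimum number of trials per technique: the trial count bounds how small a real bypass rate could have gone unnoticed.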
Related Topics
- Scoping & Rules of Engagement - Detailed scoping guidance
- Advanced Reconnaissance - Recon techniques for AI targets
- AI-Specific Threat Modeling - Threat modeling frameworks
- Evidence Collection - Evidence standards
- Continuous Red Teaming - Ongoing program design
References
- MITRE (2024). ATLAS - Adversarial Threat Landscape for AI Systems
- NIST (2024). AI Risk Management Framework (AI RMF 1.0)
- OWASP (2025). OWASP Top 10 for LLM Applications
- Anthropic (2024). "Challenges in Red Teaming AI Systems"
- Microsoft (2024). "AI Red Teaming Best Practices"
What is the most critical difference between AI red team reporting and traditional penetration testing reporting?