Red Team Methodology Overview
A structured methodology for AI red team engagements: phases, deliverables, role definitions, and how AI-specific testing differs from traditional penetration testing.
AI red teaming adapts traditional adversarial security testing to the unique characteristics of AI systems. Where a traditional penetration test probes for software vulnerabilities and misconfigurations, an AI red team evaluates whether a system can be manipulated through its natural language interface, its training data pipeline, its tool integrations, or its agentic decision-making. The methodology must account for stochastic behavior, prompt-level attack surfaces, and harms that are semantic rather than technical.
How AI Red Teaming Differs
Before diving into methodology, it helps to understand what makes AI red teaming distinct from traditional penetration testing:
| Dimension | Traditional Pen Testing | AI Red Teaming |
|---|---|---|
| Attack surface | Code, configs, network | Prompts, training data, tool integrations, model behavior |
| Vulnerability type | CVEs, misconfigurations | Alignment failures, injection, hallucination, bias |
| Reproducibility | Usually deterministic | Stochastic: the same input may produce different outputs |
| Exploit format | Code, scripts, payloads | Natural language prompts, adversarial inputs |
| Impact measurement | Confidentiality, integrity, availability | Plus: fairness, safety, truthfulness, alignment |
| Tools | Scanners, fuzzers, exploit frameworks | Prompt libraries, behavioral test suites, eval frameworks |
| Success criteria | Binary (exploited or not) | Gradient (bypass rate, severity spectrum, harm taxonomy) |
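The "gradient" success criterion in the last row can be made concrete: run the same technique many times and report the observed rate rather than a pass/fail verdict. A minimal sketch, where the `attempt` callable is a hypothetical stand-in for a real call to the system under test:

```python
import random
from typing import Callable

def measure_bypass_rate(attempt: Callable[[], bool], trials: int = 20) -> float:
    """Run the same technique repeatedly and report the fraction of successes.

    A technique that succeeds even once across many trials is still a finding;
    the rate quantifies its severity instead of collapsing it to pass/fail.
    """
    successes = sum(1 for _ in range(trials) if attempt())
    return successes / trials

# Simulated stochastic target that "bypasses" ~10% of the time
# (a stand-in for an actual model call, seeded for reproducibility).
random.seed(42)
rate = measure_bypass_rate(lambda: random.random() < 0.10, trials=200)
print(f"bypass rate: {rate:.2%}")
```

The same harness works for any technique once `attempt` is wired to a real API call, which is why bypass-rate measurement recurs in the exploitation and evidence phases below.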
Engagement Phases
An AI red team engagement follows a structured sequence of phases. Each phase produces specific deliverables that feed into the next.
Phase 1: Scoping and Rules of Engagement
Objective: Define what will be tested, how, and what constraints apply.
Activities:
- Identify the AI system under test: model provider, deployment architecture, user-facing interfaces, tool integrations, data sources
- Define scope boundaries: which components, attack vectors, and harm categories are in scope
- Establish rules of engagement: rate limits, forbidden techniques, escalation procedures, data handling requirements
- Agree on success criteria and severity classification with stakeholders
Deliverable: Signed rules-of-engagement document and detailed test plan.
See Scoping & Rules of Engagement for detailed guidance.
Phase 2: Reconnaissance
Objective: Map the AI system's attack surface before attempting exploitation.
Activities:
- Fingerprint the model provider and version through behavioral analysis
- Extract or infer the system prompt
- Enumerate available tools, plugins, and external integrations
- Identify data sources (RAG knowledge bases, web access, file uploads)
- Map trust boundaries between components
- Discover shadow AI deployments if scope includes organizational discovery
Deliverable: Attack surface map documenting all identified components, trust boundaries, and potential attack vectors.
See Advanced Reconnaissance for techniques.
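Behavioral fingerprinting often works by matching observed responses against phrasing patterns characteristic of each provider. A sketch of the idea; the signature strings and provider names below are illustrative assumptions, not actual provider outputs:

```python
# Hypothetical signature phrases per candidate provider. A real recon harness
# would build these from observed behavior of known models, not hardcode them.
SIGNATURES = {
    "provider-a": ["i'm sorry, but i can't", "as an ai developed by"],
    "provider-b": ["i am unable to assist", "per my guidelines"],
}

def fingerprint(responses: list[str]) -> dict[str, int]:
    """Score each candidate provider by counting signature phrases that
    appear in the collected probe responses."""
    scores = {name: 0 for name in SIGNATURES}
    for text in responses:
        lowered = text.lower()
        for name, phrases in SIGNATURES.items():
            scores[name] += sum(phrase in lowered for phrase in phrases)
    return scores

observed = ["I'm sorry, but I can't help with that request."]
print(fingerprint(observed))  # provider-a scores 1, provider-b scores 0
```

In practice the probe set includes tokenizer quirks, refusal styles, knowledge cutoffs, and formatting habits, and the scoring is probabilistic rather than a simple phrase count.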
Phase 3: Threat Modeling
Objective: Prioritize testing based on risk analysis.
Activities:
- Build attack trees for high-value objectives
- Map ATLAS tactics and techniques to the identified attack surface
- Analyze trust boundaries and data flows for injection opportunities
- Estimate the cost and probability of each attack path
- Prioritize paths by risk (probability multiplied by impact)
Deliverable: Prioritized test plan derived from the threat model.
See AI-Specific Threat Modeling for frameworks.
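The probability-times-impact prioritization can be sketched directly. The path names and scores below are hypothetical examples of what a threat-modeling session might produce:

```python
from dataclasses import dataclass

@dataclass
class AttackPath:
    name: str
    probability: float  # estimated chance of success, 0..1
    impact: int         # agreed impact score, e.g. 1 (low) .. 5 (critical)

    @property
    def risk(self) -> float:
        return self.probability * self.impact

# Illustrative attack paths; real values come from the threat-modeling session.
paths = [
    AttackPath("indirect injection via RAG document", 0.6, 4),
    AttackPath("system prompt extraction", 0.8, 2),
    AttackPath("tool parameter injection leading to exfiltration", 0.3, 5),
]

for p in sorted(paths, key=lambda p: p.risk, reverse=True):
    print(f"{p.risk:4.1f}  {p.name}")
```

Sorting by the computed risk score yields the prioritized test plan: the highest-risk paths get tested first and most thoroughly in Phase 4.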
Phase 4: Exploitation
Objective: Execute the prioritized test plan and attempt to demonstrate identified risks.
Activities:
- Test prompt injection vectors (direct, indirect, multi-turn, encoding)
- Attempt safety alignment bypasses (jailbreaks)
- Probe tool and function calling for parameter injection and unauthorized access
- Test for data exfiltration through model-mediated channels
- Evaluate bias, fairness, and harmful output generation
- Test agent autonomy boundaries and escalation paths
- Document each test with inputs, outputs, bypass rates, and reproducibility data
Deliverable: Raw findings log with a full evidence chain for each discovered vulnerability.
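A findings log entry needs enough structure to support the evidence and reporting phases. One possible record shape, with illustrative field names and values:

```python
import datetime
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One entry in the raw findings log. Field names are illustrative;
    the agreed evidence standard defines the actual required fields."""
    technique: str
    prompt: str
    response_excerpt: str
    trials: int
    successes: int
    model_version: str
    temperature: float
    timestamp: str = field(
        default_factory=lambda: datetime.datetime.now(
            datetime.timezone.utc
        ).isoformat()
    )

    @property
    def bypass_rate(self) -> float:
        return self.successes / self.trials if self.trials else 0.0

f = Finding(
    technique="multi-turn role-play jailbreak",
    prompt="[redacted PoC prompt]",
    response_excerpt="[redacted harmful output excerpt]",
    trials=10,
    successes=3,
    model_version="example-model-2024-06",  # hypothetical identifier
    temperature=0.7,
)
print(f"{f.technique}: {f.bypass_rate:.0%} over {f.trials} trials")
```

Capturing trials, successes, model version, and temperature per finding is what makes the stochastic results in Phase 5 reproducible and auditable.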
Phase 5: Evidence Collection and Documentation
Objective: Produce a complete, reproducible evidence package.
Activities:
- Capture full conversation logs for each finding
- Record API requests and responses with timestamps
- Document bypass rates across multiple attempts (minimum 10 trials per technique)
- Preserve model version, temperature settings, and other configuration details
- Create proof-of-concept payloads that can be re-run for verification
Deliverable: Evidence package meeting chain-of-custody requirements.
See Evidence Collection for standards.
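One common way to support chain-of-custody requirements is to hash a canonical serialization of each evidence record at capture time, so later verification can confirm the logs were not altered. A sketch, with an illustrative record:

```python
import hashlib
import json

def evidence_digest(record: dict) -> str:
    """Hash a canonical JSON serialization of an evidence record.

    Sorting keys and fixing separators makes the serialization stable, so
    the digest only changes if the record's content changes.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative evidence record; real records follow the agreed standard.
record = {
    "finding_id": "F-012",
    "model_version": "example-model-2024-06",
    "temperature": 0.7,
    "conversation_log": ["user: [redacted]", "assistant: [redacted]"],
    "trials": 10,
    "successes": 3,
}
digest = evidence_digest(record)
print(digest)
```

The digest is recorded alongside the evidence package; re-hashing at verification time and comparing digests detects tampering.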
Phase 6: Reporting
Objective: Communicate findings to stakeholders in actionable form.
Activities:
- Classify findings by severity using the agreed framework
- Write finding descriptions with clear impact statements
- Provide remediation recommendations for each finding
- Produce an executive summary for non-technical stakeholders
- Conduct a findings walkthrough with the technical team
Deliverable: Final report with executive summary, detailed findings, and remediation roadmap.
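Severity classification for stochastic findings typically combines the measured bypass rate with the impact score. A sketch of such a mapping; the thresholds and labels here are illustrative assumptions, since the real values come from the severity framework agreed during scoping:

```python
def classify(bypass_rate: float, impact: int) -> str:
    """Map a finding's bypass rate and impact score to a severity label.

    Thresholds are illustrative placeholders, not a standard; impact is
    assumed to be on a 1 (low) .. 5 (critical) scale.
    """
    score = bypass_rate * impact
    if score >= 3.0:
        return "critical"
    if score >= 1.5:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"

print(classify(0.8, 5))  # frequent bypass, critical impact -> critical
print(classify(0.1, 2))  # rare bypass, limited impact -> low
```

Note that a low bypass rate against a high-impact target can still classify as high severity, which is exactly the case binary pass/fail reporting misses.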
Team Composition
Effective AI red teaming requires a combination of skills that rarely exists in a single person:
| Role | Responsibilities | Key Skills |
|---|---|---|
| Engagement lead | Scoping, stakeholder management, reporting | Project management, communication, risk assessment |
| Prompt engineer / injection specialist | Prompt injection testing, jailbreaks, safety bypasses | Creative writing, LLM behavior intuition, NLP knowledge |
| ML engineer | Model-level attacks, gradient-based methods, training pipeline assessment | Machine learning, Python, adversarial ML research |
| Application security tester | API testing, infrastructure assessment, tool integration security | Web security, API security, traditional pen testing |
| Domain expert | Bias assessment, harm assessment, policy compliance | Domain-specific knowledge (healthcare, finance, legal, etc.) |
| Automation engineer | Test harness development, scale testing, metrics | Software engineering, data analysis, scripting |
Engagement Types
Different engagement types require different methodology emphases:
Safety Assessment
Focus on alignment bypasses, harmful content generation, and bias testing. Heavy emphasis on creative prompt engineering and edge-case discovery. Often performed before model deployment.
Application Security Assessment
Focus on prompt injection, data exfiltration, tool abuse, and privilege escalation in a deployed application. Combines AI-specific testing with traditional application security testing.
Agentic System Assessment
Focus on tool-use boundaries, autonomous decision-making, multi-agent trust, and escalation paths. Requires an understanding of the agent architecture and tool integrations. See Agent Exploitation for techniques.
Continuous Red Teaming
An ongoing testing program rather than a point-in-time engagement. Automated testing is combined with periodic manual assessments. See Continuous Red Teaming for program design.
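A core building block of a continuous program is a regression check: re-run archived proof-of-concept payloads on a schedule and flag any whose bypass rate climbs back above the baseline accepted at remediation time. A sketch, where `run_poc` and the simulated rates are hypothetical stand-ins for live calls to the target:

```python
import random

# Simulated per-PoC bypass rates; a stand-in for the live system under test.
SIMULATED_RATES = {"F-012": 0.0, "F-031": 0.9}

def run_poc(poc_id: str) -> bool:
    """Stand-in for replaying an archived PoC payload against the target."""
    return random.random() < SIMULATED_RATES.get(poc_id, 0.0)

def regression_check(baselines: dict[str, float], trials: int = 20) -> list[str]:
    """Re-run each archived PoC and return the ids whose observed bypass
    rate now exceeds its accepted baseline."""
    regressed = []
    for poc_id, baseline in baselines.items():
        rate = sum(run_poc(poc_id) for _ in range(trials)) / trials
        if rate > baseline:
            regressed.append(poc_id)
    return regressed

random.seed(7)
regressed = regression_check({"F-012": 0.0, "F-031": 0.2})
print(regressed)
```

Wiring this into CI or a scheduled job turns the point-in-time evidence package into an ongoing monitoring signal, with manual testing layered on top for novel techniques.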
Methodology Anti-Patterns
Common mistakes that reduce AI red team effectiveness:
Ad-hoc testing without threat modeling. Jumping straight to "try some prompts" without understanding the system architecture, trust boundaries, or highest-risk paths. This leads to shallow coverage and missed critical vulnerabilities.
Binary pass/fail reporting. Reporting that a technique "doesn't work" after a few failed attempts. AI systems are stochastic; a technique that fails 9 times and succeeds once represents a real vulnerability that must be reported with its bypass rate.
Ignoring the application layer. Focusing exclusively on the model and missing traditional web vulnerabilities in the application wrapping it. API authentication, rate limiting, input validation, and session management all affect the AI system's security posture.
Testing only the happy path of attacks. Trying the most common injection techniques and stopping when they fail. Sophisticated real-world attackers combine techniques, adapt to defenses, and persist through failures.
Skipping reconnaissance. Attempting exploitation without first understanding which model, version, system prompt, and tools are in use. Recon findings dramatically change which attack techniques are most effective.
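The arithmetic behind this anti-pattern is worth making explicit: the chance of observing at least one success in k independent attempts at true per-attempt rate p is 1 - (1 - p)^k, so a handful of failures proves very little.

```python
def detection_probability(p: float, k: int) -> float:
    """Probability of seeing at least one success in k independent attempts,
    given a true per-attempt bypass rate p: 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

# A 10% technique looks "broken" after 3 tries but surfaces reliably at scale.
print(f"{detection_probability(0.10, 3):.0%}")   # ~27%
print(f"{detection_probability(0.10, 50):.0%}")  # ~99%
```

This is also why the evidence-collection phase mandates a minimum number of trials per technique: the trial count bounds how small a real bypass rate could have gone unnoticed.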
Related Topics
- Scoping & Rules of Engagement - Detailed scoping guidance
- Advanced Reconnaissance - Recon techniques for AI targets
- AI-Specific Threat Modeling - Threat modeling frameworks
- Evidence Collection - Evidence standards
- Continuous Red Teaming - Ongoing program design
References
- MITRE (2024). ATLAS - Adversarial Threat Landscape for AI Systems
- NIST (2024). AI Risk Management Framework (AI RMF 1.0)
- OWASP (2025). OWASP Top 10 for LLM Applications
- Anthropic (2024). "Challenges in Red Teaming AI Systems"
- Microsoft (2024). "AI Red Teaming Best Practices"
What is the most critical difference between AI red team reporting and traditional penetration testing reporting?