# Post-Engagement Analysis Methodology

Conduct a complete post-engagement analysis covering lessons learned, technique effectiveness, and methodology refinement.

## Overview

This methodology deep dive provides a structured, repeatable framework that you can adapt to your organization's specific context and compliance requirements.

Methodology is the backbone of professional red team exercises. Assessments that lack a structured method degenerate into ad-hoc operations that miss critical vulnerabilities and produce inconsistent results. The framework established in this deep dive ensures comprehensive coverage while remaining flexible enough to adapt to different types of engagements.
## Background and Motivation

This methodology responds to a problem highlighted by HarmBench (github.com/centerforaisafety/HarmBench), the standardized evaluation: organizations performing AI security assessments commonly exhibit methodology gaps. The framework aligns with the design philosophy of related research in the field and incorporates lessons learned from real-world engagements.

The AI security field evolves rapidly, with new attack techniques and defense mechanisms emerging constantly. A sound methodology must balance the flexibility to absorb new developments against the rigor that professional assessments demand. This deep dive aims to strike that balance.
## Phase 1: Planning and Scoping

Effective assessments begin with thorough planning. This phase establishes the engagement's boundaries, objectives, and success criteria.
```python
#!/usr/bin/env python3
"""
Adversarial Simulation Design — Methodology Framework
Structured approach for AI red team assessments.
"""
import json
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class EngagementType(Enum):
    FULL_ASSESSMENT = "full_assessment"
    FOCUSED_TEST = "focused_test"
    COMPLIANCE_AUDIT = "compliance_audit"
    CONTINUOUS_MONITORING = "continuous_monitoring"


class Priority(Enum):
    CRITICAL = 4
    HIGH = 3
    MEDIUM = 2
    LOW = 1


@dataclass
class EngagementScope:
    """Define the scope and boundaries of an AI red team engagement."""
    client: str
    engagement_type: EngagementType
    start_date: str
    end_date: str
    target_systems: list = field(default_factory=list)
    in_scope_techniques: list = field(default_factory=list)
    out_of_scope: list = field(default_factory=list)
    objectives: list = field(default_factory=list)
    success_criteria: list = field(default_factory=list)
    compliance_requirements: list = field(default_factory=list)
    contact_info: dict = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Validate the engagement scope for completeness."""
        issues = []
        if not self.target_systems:
            issues.append("No target systems defined")
        if not self.objectives:
            issues.append("No assessment objectives defined")
        if not self.success_criteria:
            issues.append("No success criteria defined")
        if not self.contact_info:
            issues.append("No client contact information")
        return issues

    def to_dict(self) -> dict:
        return {
            "client": self.client,
            "type": self.engagement_type.value,
            "dates": f"{self.start_date} to {self.end_date}",
            "targets": self.target_systems,
            "objectives": self.objectives,
            "exclusions": self.out_of_scope,
        }


# Example scope definition
scope = EngagementScope(
    client="Acme Corp",
    engagement_type=EngagementType.FULL_ASSESSMENT,
    start_date="2026-03-20",
    end_date="2026-04-03",
    target_systems=[
        "Customer service chatbot (chat.acme.com)",
        "Internal knowledge base assistant (kb.internal.acme.com)",
        "Code review AI (codereview.acme.com)",
    ],
    in_scope_techniques=[
        "Prompt injection (direct and indirect)",
        "System prompt extraction",
        "Data exfiltration through model outputs",
        "Safety bypass and jailbreaking",
        "Function calling abuse",
    ],
    out_of_scope=[
        "Denial of service attacks",
        "Physical infrastructure testing",
        "Social engineering of employees",
        "Model weight extraction",
    ],
    objectives=[
        "Evaluate prompt injection resistance across all target systems",
        "Test data exfiltration controls and output filtering",
        "Assess function calling security in the code review AI",
        "Verify compliance with OWASP LLM Top 10 2025",
    ],
    success_criteria=[
        "Complete testing of all in-scope techniques against each target",
        "Document all findings with reproducible evidence",
        "Deliver final report within 5 business days of testing completion",
    ],
)

validation_issues = scope.validate()
if validation_issues:
    print(f"Scope issues: {validation_issues}")
else:
    print("Scope validated successfully")
print(json.dumps(scope.to_dict(), indent=2))
```

## Phase 2: Reconnaissance and Analysis
Systematic reconnaissance maps the target's architecture, models, and defense mechanisms before testing begins.
```python
@dataclass
class ReconFinding:
    """A single reconnaissance finding."""
    category: str
    detail: str
    confidence: str  # high, medium, low
    source: str
    priority: Priority
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())


class ReconFramework:
    """Structured reconnaissance framework for AI systems."""

    def __init__(self, scope: EngagementScope):
        self.scope = scope
        self.findings: list[ReconFinding] = []

    def passive_recon(self, target: str) -> list[ReconFinding]:
        """Gather information without directly interacting with the target."""
        checklist = [
            ("Documentation Review", "Review public docs, API references, and changelogs"),
            ("Technology Stack", "Identify frameworks, cloud providers, and model providers"),
            ("Public Disclosures", "Check for prior vulnerability disclosures or bug bounties"),
            ("Job Postings", "Analyze job postings for technology and architecture clues"),
            ("Social Media", "Search for engineering blog posts and conference talks"),
        ]
        findings = []
        for category, description in checklist:
            finding = ReconFinding(
                category=category,
                detail=f"{description} for {target}",
                confidence="medium",
                source="passive",
                priority=Priority.MEDIUM,
            )
            findings.append(finding)
            self.findings.append(finding)
        return findings

    def active_recon(self, target: str) -> list[ReconFinding]:
        """Actively probe the target to map its configuration."""
        probes = [
            ("Model Identification", "Fingerprint the underlying model through behavioral analysis"),
            ("Defense Mapping", "Identify input filters, output sanitizers, and guardrails"),
            ("API Surface", "Map available endpoints, parameters, and error handling"),
            ("Rate Limiting", "Determine rate limits and throttling behavior"),
            ("Session Handling", "Analyze session management and context persistence"),
            ("Tool/Function Discovery", "Enumerate available tools and function calling capabilities"),
        ]
        findings = []
        for category, description in probes:
            finding = ReconFinding(
                category=category,
                detail=f"{description} for {target}",
                confidence="high",
                source="active",
                priority=Priority.HIGH,
            )
            findings.append(finding)
            self.findings.append(finding)
        return findings

    def generate_attack_plan(self) -> dict:
        """Generate a prioritized attack plan based on recon findings."""
        return {
            "target_count": len(self.scope.target_systems),
            "findings_count": len(self.findings),
            "high_priority": sum(1 for f in self.findings if f.priority == Priority.HIGH),
            "recommended_approach": "Start with highest-priority targets using "
                                    "techniques most likely to succeed based on recon",
        }


recon = ReconFramework(scope)
for target in scope.target_systems:
    recon.passive_recon(target)
    recon.active_recon(target)

plan = recon.generate_attack_plan()
print(json.dumps(plan, indent=2))
```

## Phase 3: Execution Framework
Execute tests systematically and document findings as you go. The execution phase consumes the most time in an engagement, and the quality of the preceding phases directly determines its efficiency.
| Phase | Time Allocation | Key Activities | Deliverables |
|---|---|---|---|
| Planning | 10-15% | Scoping, scheduling, environment setup | Engagement plan, scope document |
| Reconnaissance | 15-20% | Passive research, active probing, inventory | Recon report, attack plan |
| Testing | 50-60% | Attack execution, validation, iteration | Raw findings, evidence logs |
| Reporting | 15-20% | Analysis, writing, review, delivery | Final report, executive summary |
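The allocation above translates directly into an hour budget for a concrete engagement. A minimal sketch, assuming an 80-hour engagement purely as an illustrative figure:

```python
# Rough hour budget derived from the time-allocation table.
# The 80-hour total is an assumed example, not a prescribed value.
PHASE_SHARES = {
    "planning": (0.10, 0.15),
    "recon": (0.15, 0.20),
    "testing": (0.50, 0.60),
    "reporting": (0.15, 0.20),
}

def hour_budget(total_hours: float) -> dict:
    """Convert phase percentage ranges into (low, high) hour estimates."""
    return {
        phase: (round(total_hours * lo, 1), round(total_hours * hi, 1))
        for phase, (lo, hi) in PHASE_SHARES.items()
    }

budget = hour_budget(80)
print(budget["testing"])  # (40.0, 48.0)
```

Budgeting hours up front makes it obvious when, say, reconnaissance is eating into the testing allocation.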
```python
class ExecutionTracker:
    """Track execution progress and manage testing workflow."""

    def __init__(self, scope: EngagementScope):
        self.scope = scope
        self.test_cases = []
        self.completed = []
        self.findings = []

    def create_test_matrix(self) -> list[dict]:
        """Generate a test matrix from scope and recon findings."""
        techniques = self.scope.in_scope_techniques
        targets = self.scope.target_systems
        matrix = []
        for target in targets:
            for technique in techniques:
                matrix.append({
                    "target": target,
                    "technique": technique,
                    "priority": Priority.HIGH if "injection" in technique.lower() else Priority.MEDIUM,
                    "status": "pending",
                    "assigned_to": None,
                    "time_estimate_min": 30,
                })
        self.test_cases = matrix
        return matrix

    def update_status(self, target: str, technique: str, status: str,
                      finding: Optional[dict] = None):
        """Update the status of a test case."""
        for tc in self.test_cases:
            if tc["target"] == target and tc["technique"] == technique:
                tc["status"] = status
                if finding:
                    self.findings.append(finding)
                break

    def get_progress(self) -> dict:
        """Get current execution progress."""
        total = len(self.test_cases)
        completed = sum(1 for tc in self.test_cases if tc["status"] in ("completed", "blocked"))
        in_progress = sum(1 for tc in self.test_cases if tc["status"] == "in_progress")
        return {
            "total_tests": total,
            "completed": completed,
            "in_progress": in_progress,
            "pending": total - completed - in_progress,
            "completion_pct": completed / max(total, 1) * 100,
            "findings_count": len(self.findings),
        }


tracker = ExecutionTracker(scope)
matrix = tracker.create_test_matrix()
print(f"Generated {len(matrix)} test cases across "
      f"{len(scope.target_systems)} targets")

# Simulate execution progress
tracker.update_status(
    scope.target_systems[0],
    scope.in_scope_techniques[0],
    "completed",
    {"title": "Example finding", "severity": "High"}
)
print(json.dumps(tracker.get_progress(), indent=2))
```

Execution should proceed in priority order: hit the most critical targets with the highest-impact techniques first. When time runs short, this ordering ensures the most important tests have already been completed. Report progress to the client regularly through the agreed channels, especially when you discover a critical issue that requires immediate attention.
## Phase 3.5: Finding Validation and Triage

Before moving into reporting, verify that every finding meets professional quality standards. Unvalidated findings waste client time and damage the assessor's credibility. Each finding must pass the following criteria before it is included in the final report.
```python
class FindingValidator:
    """Validate and triage findings before reporting."""

    SEVERITY_CRITERIA = {
        "Critical": {
            "min_success_rate": 0.5,
            "requires_validation": True,
            "impact": "Direct data breach, unauthorized code execution, or safety bypass "
                      "with real-world harm potential",
        },
        "High": {
            "min_success_rate": 0.3,
            "requires_validation": True,
            "impact": "System prompt extraction, significant safety bypass, "
                      "or unauthorized action execution",
        },
        "Medium": {
            "min_success_rate": 0.2,
            "requires_validation": True,
            "impact": "Information disclosure, partial safety bypass, "
                      "or limited unauthorized behavior",
        },
        "Low": {
            "min_success_rate": 0.1,
            "requires_validation": False,
            "impact": "Informational findings, theoretical vulnerabilities, "
                      "or minimal impact issues",
        },
    }

    def validate_finding(self, finding: dict, test_runs: int = 5) -> dict:
        """Validate a finding meets quality standards."""
        issues = []
        if not finding.get("title"):
            issues.append("Missing title")
        if not finding.get("evidence"):
            issues.append("Missing evidence")
        if not finding.get("remediation"):
            issues.append("Missing remediation guidance")

        success_rate = finding.get("success_rate", 0)
        severity = finding.get("severity", "Medium")
        criteria = self.SEVERITY_CRITERIA.get(severity, self.SEVERITY_CRITERIA["Medium"])
        if success_rate < criteria["min_success_rate"]:
            issues.append(f"Success rate {success_rate:.0%} below minimum "
                          f"{criteria['min_success_rate']:.0%} for {severity} severity")

        return {
            "valid": len(issues) == 0,
            "issues": issues,
            "severity_appropriate": success_rate >= criteria["min_success_rate"],
        }

    def deduplicate(self, findings: list[dict]) -> list[dict]:
        """Remove duplicate findings that describe the same vulnerability."""
        unique = []
        seen_techniques = set()
        for finding in sorted(findings, key=lambda f: f.get("success_rate", 0), reverse=True):
            technique_key = finding.get("technique", "").lower().strip()
            if technique_key not in seen_techniques:
                unique.append(finding)
                seen_techniques.add(technique_key)
            else:
                # Merge evidence into the existing finding
                for existing in unique:
                    if existing.get("technique", "").lower().strip() == technique_key:
                        existing.setdefault("additional_evidence", []).append(
                            finding.get("evidence", "")
                        )
                        break
        return unique


validator = FindingValidator()
```

Finding validation is a critical quality gate. Under the time pressure of an engagement, it is tempting to record every observation as a finding. But drowning the client in low-quality or unvalidated findings dilutes the impact of the genuinely critical issues and makes the report hard to act on. Apply the validation criteria strictly, and present only findings that meet the quality bar for their assigned severity.
Deduplication matters just as much. Different team members may discover the same vulnerability through different paths. These should be merged into a single finding presented with multiple pieces of evidence, rather than split into several findings. This gives a clearer picture of the vulnerability and avoids inflating the finding count.
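The merge behavior can be illustrated with a small self-contained sketch, independent of the validator class above; the sample findings and the `merge_duplicates` helper are hypothetical:

```python
# Merge findings that describe the same technique, keeping the
# highest-success-rate instance and folding the rest in as extra evidence.
def merge_duplicates(findings: list[dict]) -> list[dict]:
    merged: dict[str, dict] = {}
    for f in sorted(findings, key=lambda f: f["success_rate"], reverse=True):
        key = f["technique"].lower().strip()
        if key not in merged:
            merged[key] = dict(f, additional_evidence=[])
        else:
            merged[key]["additional_evidence"].append(f["evidence"])
    return list(merged.values())

findings = [
    {"technique": "Prompt Injection", "evidence": "log A", "success_rate": 0.4},
    {"technique": "prompt injection", "evidence": "log B", "success_rate": 0.7},
]
result = merge_duplicates(findings)
print(len(result), result[0]["evidence"])  # 1 log B
```

Note that the key is normalized with `lower().strip()`, so "Prompt Injection" and "prompt injection" collapse into one finding carrying both pieces of evidence.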
## Phase 4: Reporting and Communication

Transform raw findings into an actionable report that drives remediation. The report is the engagement's primary deliverable and must communicate effectively to a diverse audience.
```python
@dataclass
class Finding:
    """A validated red team finding."""
    title: str
    severity: str  # Critical, High, Medium, Low, Informational
    description: str
    evidence: str
    impact: str
    remediation: list[str]
    references: list[str]
    cvss_score: Optional[float] = None
    owasp_mapping: Optional[str] = None
    mitre_atlas: Optional[str] = None
    reproducible: bool = True
    success_rate: float = 0.0


class ReportGenerator:
    """Generate professional red team assessment reports."""

    def __init__(self, scope: EngagementScope):
        self.scope = scope
        self.findings: list[Finding] = []

    def add_finding(self, finding: Finding):
        self.findings.append(finding)

    def generate_executive_summary(self) -> str:
        critical = sum(1 for f in self.findings if f.severity == "Critical")
        high = sum(1 for f in self.findings if f.severity == "High")
        medium = sum(1 for f in self.findings if f.severity == "Medium")
        low = sum(1 for f in self.findings if f.severity == "Low")
        return f"""
## Executive Summary

Red team assessment of {self.scope.client} AI systems conducted
{self.scope.start_date} through {self.scope.end_date}.

**Overall Risk**: {"Critical" if critical > 0 else "High" if high > 0 else "Medium"}

| Severity | Count |
|----------|-------|
| Critical | {critical} |
| High | {high} |
| Medium | {medium} |
| Low | {low} |
| **Total** | **{len(self.findings)}** |

**Key Findings**: {len(self.findings)} vulnerabilities identified across
{len(self.scope.target_systems)} target systems. Immediate remediation
recommended for all Critical and High severity findings.
"""

    def generate_full_report(self) -> str:
        sections = [self.generate_executive_summary()]
        for i, finding in enumerate(self.findings, 1):
            sections.append(f"""
### Finding {i}: {finding.title}

**Severity**: {finding.severity}
**Reproducible**: {"Yes" if finding.reproducible else "No"}
**Success Rate**: {finding.success_rate:.0%}

**Description**: {finding.description}

**Evidence**: {finding.evidence}

**Impact**: {finding.impact}

**Remediation**:
{chr(10).join(f'{j}. {r}' for j, r in enumerate(finding.remediation, 1))}
""")
        return "\n".join(sections)


# Example report generation
report = ReportGenerator(scope)
report.add_finding(Finding(
    title="System Prompt Extraction via Role-Play",
    severity="High",
    description="The customer service chatbot revealed its full system prompt "
                "when a role-play technique was used to request configuration details.",
    evidence="See Appendix A for full request/response logs",
    impact="Attackers can understand the full system configuration, enabling "
           "targeted attacks against specific safety measures.",
    remediation=[
        "Implement output filtering for system prompt content",
        "Add instruction hierarchy enforcement to resist role-play overrides",
        "Deploy monitoring for system prompt extraction attempts",
    ],
    references=["OWASP LLM01", "MITRE ATLAS AML.T0051"],
    owasp_mapping="LLM01 — Prompt Injection",
    mitre_atlas="AML.T0051",
    reproducible=True,
    success_rate=0.8,
))
print(report.generate_full_report())
```

## Phase 5: Post-Engagement Activities
An engagement does not end when the report is delivered. Post-engagement activities ensure that findings actually drive security improvements and that the methodology evolves with each lesson learned.

**Findings briefing**: Schedule a findings walkthrough session that includes both technical and leadership stakeholders. Walk through each finding with a live demonstration where possible. Executives need to understand the business risk; engineers need the root cause and remediation steps. Prepare separate presentations for the two audiences.

**Remediation support**: Proactively help validate remediation work. Once the client's engineering team deploys fixes, retest the specific findings to confirm the vulnerabilities have been addressed. This retest phase is usually much shorter than the original engagement and is an effective way to confirm that fixes actually work.
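Retest tracking can be as simple as mapping each original finding to its retest outcome. A minimal sketch, where the finding IDs and status labels are hypothetical:

```python
# Track remediation retest outcomes for each finding from the original report.
retest_results = {
    "F-001": {"original_severity": "High", "retest": "fixed"},
    "F-002": {"original_severity": "Medium", "retest": "partially_fixed"},
    "F-003": {"original_severity": "High", "retest": "not_fixed"},
}

# Findings that still need client attention after the retest.
open_items = [fid for fid, r in retest_results.items() if r["retest"] != "fixed"]
print(open_items)  # ['F-002', 'F-003']
```

The list of open items becomes the agenda for the follow-up conversation with the client.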
**Lessons learned**: Hold an internal lessons-learned session with the assessment team. Record what worked, what did not, and how the methodology should change for future engagements. Key retrospective questions include:

| Question | Purpose |
|---|---|
| Which techniques were most effective against this class of target? | Improve technique selection for similar future engagements |
| Did scope gaps leave important attack surface untested? | Refine the scoping methodology |
| Did time allocation match actual effort? | Improve estimates for future engagements |
| Did tooling limitations affect coverage? | Drive tooling development priorities |
| Did the report format communicate findings effectively? | Refine report templates |
**Knowledge base updates**: Add the new vulnerabilities, payloads, and defense-bypass techniques discovered during the engagement to the internal knowledge base. This institutional knowledge benefits future assessment teams and ensures the organization's red team capability keeps improving.

**Client relationship follow-up**: Check in with the client after a reasonable remediation window (typically 30-60 days) to review remediation progress. This demonstrates a long-term commitment to their security posture and often leads to follow-up validation testing or expanded-scope engagements.
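The 30-60 day follow-up window can be scheduled directly from the report delivery date using the standard library; the delivery date below is an assumed example:

```python
from datetime import date, timedelta

# Compute the client follow-up window from the report delivery date.
delivered = date(2026, 4, 3)
window_start = delivered + timedelta(days=30)
window_end = delivered + timedelta(days=60)
print(window_start, window_end)  # 2026-05-03 2026-06-02
```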
## References and Further Reading

- HarmBench — github.com/centerforaisafety/HarmBench — standardized evaluation
- OWASP LLM Top 10 2025 — the leading security risks for large language model applications
- MITRE ATLAS — adversarial threat landscape for AI systems
- Perez et al. 2022 — "Red Teaming Language Models with Language Models"
## Review Questions

- Why should reconnaissance still receive 15-20% of total engagement time, even in a time-constrained assessment?
- What is the most important consideration when deciding whether a defensive component should fail open or fail closed?