AI Red Team Evidence Collection
Systematic evidence collection methodologies for AI red team engagements, including artifact preservation, finding documentation, and chain of custody procedures.
Overview
Evidence collection during AI red team engagements serves a fundamentally different purpose than evidence collection during incident response. In incident response, you are reconstructing what happened. In a red team engagement, you are creating a record of what you did, what you found, and what it means for the organization's security posture. The evidence must be detailed enough to reproduce findings, credible enough to drive remediation decisions, and structured enough to be analyzed across multiple engagements over time.
AI red teaming introduces evidence types that do not exist in traditional penetration testing. When you successfully jailbreak an LLM, the "evidence" is a conversation: a sequence of natural language messages that led the model to behave outside its intended boundaries. When you find a prompt injection vulnerability in a RAG system, the evidence includes the injected document, the retrieval context, and the model's compromised output. When you discover that a model leaks training data, the evidence is a set of generated outputs that match private data with statistical significance.
These artifacts are fragile. Conversations are ephemeral by default. Model behavior is stochastic, meaning a successful attack may not reproduce identically on the next attempt. Cloud-hosted models may be updated at any time, changing the attack surface without notice. A robust evidence collection methodology must account for all of these challenges.
This article covers the design of evidence collection systems for AI red team engagements, from planning what to capture, through preserving artifacts with cryptographic integrity, to generating reports that drive remediation.
Evidence Collection Framework
Defining the Evidence Taxonomy
AI red team evidence falls into several categories, each requiring different collection and preservation methods.
import json
import hashlib
import uuid
from datetime import datetime, timezone
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from pathlib import Path


class EvidenceType(Enum):
    """Types of evidence collected during AI red team engagements."""
    CONVERSATION = "conversation"        # Full conversation logs
    PROMPT_PAYLOAD = "prompt_payload"    # Specific attack prompts
    MODEL_RESPONSE = "model_response"    # Model outputs of interest
    SCREENSHOT = "screenshot"            # Visual evidence
    CONFIGURATION = "configuration"      # System/model configuration
    NETWORK_CAPTURE = "network_capture"  # API traffic captures
    BEHAVIORAL_OBSERVATION = "behavioral_observation"  # Noted behaviors
    RETRIEVAL_CONTEXT = "retrieval_context"  # RAG retrieval results
    TOOL_OUTPUT = "tool_output"          # Output from testing tools
    METRIC_DATA = "metric_data"          # Quantitative measurements


class SeverityLevel(Enum):
    """Severity levels for red team findings."""
    INFORMATIONAL = "informational"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class EvidenceItem:
    """A single piece of evidence from a red team engagement."""
    evidence_id: str
    evidence_type: EvidenceType
    timestamp: str
    title: str
    description: str
    content: str | dict | bytes
    engagement_id: str
    finding_id: Optional[str] = None
    severity: SeverityLevel = SeverityLevel.INFORMATIONAL
    tags: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    integrity_hash: str = ""
    collector: str = ""

    def compute_hash(self) -> str:
        """Compute SHA-256 hash of evidence content for integrity."""
        if isinstance(self.content, bytes):
            content_bytes = self.content
        elif isinstance(self.content, dict):
            content_bytes = json.dumps(
                self.content, sort_keys=True
            ).encode("utf-8")
        else:
            content_bytes = str(self.content).encode("utf-8")
        self.integrity_hash = hashlib.sha256(content_bytes).hexdigest()
        return self.integrity_hash


@dataclass
class Finding:
    """A red team finding supported by evidence."""
    finding_id: str
    title: str
    description: str
    severity: SeverityLevel
    attack_category: str
    attack_vector: str
    impact: str
    remediation: str
    evidence_ids: list[str]
    reproducibility: str  # always, usually, sometimes, rarely
    cvss_score: Optional[float] = None
    cwe_id: Optional[str] = None
    timestamp: str = ""
    engagement_id: str = ""


class RedTeamEvidenceCollector:
    """Systematic evidence collection for AI red team engagements."""

    def __init__(
        self,
        engagement_id: str,
        output_dir: str,
        collector_name: str,
    ):
        self.engagement_id = engagement_id
        self.output_dir = Path(output_dir) / engagement_id
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.collector_name = collector_name
        self.evidence_items: list[EvidenceItem] = []
        self.findings: list[Finding] = []
        self._evidence_log: list[dict] = []

    def collect_conversation(
        self,
        messages: list[dict],
        title: str,
        description: str,
        model_id: str = "",
        severity: SeverityLevel = SeverityLevel.INFORMATIONAL,
        tags: Optional[list[str]] = None,
    ) -> str:
        """
        Collect a conversation as evidence.

        Args:
            messages: List of message dicts with 'role' and 'content'.
            title: Short title for the evidence.
            description: Detailed description of what this shows.
            model_id: The model targeted.
            severity: Severity level of the finding.
            tags: Optional tags for categorization.

        Returns:
            Evidence ID.
        """
        evidence = EvidenceItem(
            evidence_id=f"EV-{uuid.uuid4().hex[:12]}",
            evidence_type=EvidenceType.CONVERSATION,
            timestamp=datetime.now(timezone.utc).isoformat(),
            title=title,
            description=description,
            content={
                "messages": messages,
                "message_count": len(messages),
                "model_id": model_id,
            },
            engagement_id=self.engagement_id,
            severity=severity,
            tags=tags or [],
            metadata={"model_id": model_id},
            collector=self.collector_name,
        )
        evidence.compute_hash()
        self._store_evidence(evidence)
        return evidence.evidence_id

    def collect_prompt_payload(
        self,
        payload: str,
        title: str,
        attack_technique: str,
        target_model: str = "",
        effectiveness: str = "",  # success, partial, failure
        response_summary: str = "",
        severity: SeverityLevel = SeverityLevel.INFORMATIONAL,
    ) -> str:
        """
        Collect a specific attack prompt payload as evidence.

        Args:
            payload: The attack prompt text.
            title: Short title for the evidence.
            attack_technique: Category of attack (jailbreak, injection, etc.)
            target_model: The model this payload targets.
            effectiveness: Whether the payload was successful.
            response_summary: Summary of how the model responded.
            severity: Severity level.

        Returns:
            Evidence ID.
        """
        evidence = EvidenceItem(
            evidence_id=f"EV-{uuid.uuid4().hex[:12]}",
            evidence_type=EvidenceType.PROMPT_PAYLOAD,
            timestamp=datetime.now(timezone.utc).isoformat(),
            title=title,
            description=(
                f"Attack technique: {attack_technique}. "
                f"Target: {target_model}. "
                f"Effectiveness: {effectiveness}."
            ),
            content={
                "payload": payload,
                "attack_technique": attack_technique,
                "target_model": target_model,
                "effectiveness": effectiveness,
                "response_summary": response_summary,
            },
            engagement_id=self.engagement_id,
            severity=severity,
            tags=[attack_technique, effectiveness],
            metadata={
                "target_model": target_model,
                "attack_technique": attack_technique,
            },
            collector=self.collector_name,
        )
        evidence.compute_hash()
        self._store_evidence(evidence)
        return evidence.evidence_id

    def collect_behavioral_observation(
        self,
        observation: str,
        title: str,
        category: str,
        severity: SeverityLevel = SeverityLevel.INFORMATIONAL,
        supporting_evidence_ids: Optional[list[str]] = None,
    ) -> str:
        """
        Collect a behavioral observation about the AI system.

        Args:
            observation: Detailed description of observed behavior.
            title: Short title.
            category: Category (safety_bypass, data_leak, etc.)
            severity: Severity level.
            supporting_evidence_ids: IDs of related evidence.

        Returns:
            Evidence ID.
        """
        evidence = EvidenceItem(
            evidence_id=f"EV-{uuid.uuid4().hex[:12]}",
            evidence_type=EvidenceType.BEHAVIORAL_OBSERVATION,
            timestamp=datetime.now(timezone.utc).isoformat(),
            title=title,
            description=observation,
            content={
                "observation": observation,
                "category": category,
                "supporting_evidence": supporting_evidence_ids or [],
            },
            engagement_id=self.engagement_id,
            severity=severity,
            tags=[category],
            collector=self.collector_name,
        )
        evidence.compute_hash()
        self._store_evidence(evidence)
        return evidence.evidence_id

    def create_finding(
        self,
        title: str,
        description: str,
        severity: SeverityLevel,
        attack_category: str,
        attack_vector: str,
        impact: str,
        remediation: str,
        evidence_ids: list[str],
        reproducibility: str = "usually",
        cvss_score: Optional[float] = None,
        cwe_id: Optional[str] = None,
    ) -> str:
        """
        Create a finding supported by collected evidence.

        Args:
            title: Finding title.
            description: Detailed description.
            severity: Severity level.
            attack_category: Category of attack.
            attack_vector: How the attack works.
            impact: What an attacker can achieve.
            remediation: Recommended fixes.
            evidence_ids: IDs of supporting evidence.
            reproducibility: How reliably the finding reproduces.
            cvss_score: Optional CVSS score.
            cwe_id: Optional CWE identifier.

        Returns:
            Finding ID.
        """
        finding = Finding(
            finding_id=f"FND-{uuid.uuid4().hex[:12]}",
            title=title,
            description=description,
            severity=severity,
            attack_category=attack_category,
            attack_vector=attack_vector,
            impact=impact,
            remediation=remediation,
            evidence_ids=evidence_ids,
            reproducibility=reproducibility,
            cvss_score=cvss_score,
            cwe_id=cwe_id,
            timestamp=datetime.now(timezone.utc).isoformat(),
            engagement_id=self.engagement_id,
        )
        # Verify all evidence IDs exist
        existing_ids = {e.evidence_id for e in self.evidence_items}
        missing = set(evidence_ids) - existing_ids
        if missing:
            raise ValueError(
                f"Evidence IDs not found: {missing}. "
                f"Collect evidence before creating findings."
            )
        self.findings.append(finding)
        # Link the supporting evidence items back to this finding
        for ev in self.evidence_items:
            if ev.evidence_id in evidence_ids:
                ev.finding_id = finding.finding_id
        self._save_finding(finding)
        return finding.finding_id

    def _store_evidence(self, evidence: EvidenceItem) -> None:
        """Store evidence to disk and in-memory index."""
        self.evidence_items.append(evidence)
        # Write evidence file
        evidence_dir = self.output_dir / "evidence"
        evidence_dir.mkdir(exist_ok=True)
        evidence_data = {
            "evidence_id": evidence.evidence_id,
            "evidence_type": evidence.evidence_type.value,
            "timestamp": evidence.timestamp,
            "title": evidence.title,
            "description": evidence.description,
            "content": (
                evidence.content
                if not isinstance(evidence.content, bytes)
                else "<binary>"
            ),
            "engagement_id": evidence.engagement_id,
            "finding_id": evidence.finding_id,
            "severity": evidence.severity.value,
            "tags": evidence.tags,
            "metadata": evidence.metadata,
            "integrity_hash": evidence.integrity_hash,
            "collector": evidence.collector,
        }
        file_path = evidence_dir / f"{evidence.evidence_id}.json"
        with open(file_path, "w") as f:
            json.dump(evidence_data, f, indent=2)
        # Append to evidence log
        self._evidence_log.append({
            "evidence_id": evidence.evidence_id,
            "timestamp": evidence.timestamp,
            "type": evidence.evidence_type.value,
            "title": evidence.title,
            "hash": evidence.integrity_hash,
        })
        self._save_evidence_log()

    def _save_finding(self, finding: Finding) -> None:
        """Save a finding to disk."""
        findings_dir = self.output_dir / "findings"
        findings_dir.mkdir(exist_ok=True)
        finding_data = {
            "finding_id": finding.finding_id,
            "title": finding.title,
            "description": finding.description,
            "severity": finding.severity.value,
            "attack_category": finding.attack_category,
            "attack_vector": finding.attack_vector,
            "impact": finding.impact,
            "remediation": finding.remediation,
            "evidence_ids": finding.evidence_ids,
            "reproducibility": finding.reproducibility,
            "cvss_score": finding.cvss_score,
            "cwe_id": finding.cwe_id,
            "timestamp": finding.timestamp,
            "engagement_id": finding.engagement_id,
        }
        file_path = findings_dir / f"{finding.finding_id}.json"
        with open(file_path, "w") as f:
            json.dump(finding_data, f, indent=2)

    def _save_evidence_log(self) -> None:
        """Save the evidence collection log (append-only audit trail)."""
        log_path = self.output_dir / "evidence_log.json"
        with open(log_path, "w") as f:
            json.dump(self._evidence_log, f, indent=2)

Automated Evidence Capture During Testing
Wrapping API Calls for Automatic Capture
During active red team testing, manually documenting every API call is impractical. Instead, wrap the API client to automatically capture all interactions as evidence.
import time
from typing import Any


class EvidenceCapturingClient:
    """Wraps an LLM API client to automatically capture
    all interactions as forensic evidence."""

    def __init__(
        self,
        api_client: Any,
        evidence_collector: RedTeamEvidenceCollector,
        target_model: str,
        auto_tag: bool = True,
    ):
        self.api_client = api_client
        self.collector = evidence_collector
        self.target_model = target_model
        self.auto_tag = auto_tag
        self.interaction_count = 0

    def send_message(
        self,
        messages: list[dict],
        attack_technique: str = "unknown",
        notes: str = "",
        **kwargs,
    ) -> dict:
        """
        Send a message through the API client and automatically
        capture the interaction as evidence.

        Args:
            messages: The messages to send.
            attack_technique: What attack this interaction is part of.
            notes: Red teamer's notes about this interaction.
            **kwargs: Additional arguments for the API client.

        Returns:
            The API response dict.
        """
        self.interaction_count += 1
        start_time = time.time()
        # Call the actual API
        response = self.api_client.chat(messages=messages, **kwargs)
        elapsed_ms = (time.time() - start_time) * 1000
        # Build full conversation with response
        full_messages = list(messages)
        response_text = self._extract_response_text(response)
        full_messages.append({
            "role": "assistant",
            "content": response_text,
        })
        # Auto-capture as evidence
        self.collector.collect_conversation(
            messages=full_messages,
            title=f"Interaction #{self.interaction_count}: {attack_technique}",
            description=(
                f"Automated capture during {attack_technique} testing. "
                f"Latency: {elapsed_ms:.0f}ms. "
                f"{notes}"
            ),
            model_id=self.target_model,
            tags=[attack_technique, "auto_captured"],
        )
        return response

    def _extract_response_text(self, response: Any) -> str:
        """Extract text from various API response formats."""
        if isinstance(response, dict):
            # OpenAI format
            choices = response.get("choices", [])
            if choices:
                return choices[0].get("message", {}).get("content", "")
            # Anthropic format
            content = response.get("content", [])
            if content and isinstance(content, list):
                return content[0].get("text", "")
        if isinstance(response, str):
            return response
        return str(response)

Reproducibility Testing
A finding that cannot be reproduced has limited value. After identifying a potential vulnerability, test its reproducibility systematically.
from typing import Callable


def test_reproducibility(
    client: EvidenceCapturingClient,
    attack_messages: list[dict],
    success_criteria: Callable[[str], bool],
    num_trials: int = 10,
    attack_technique: str = "unknown",
) -> dict:
    """
    Test the reproducibility of an attack by running it multiple
    times and checking success criteria.

    Args:
        client: Evidence-capturing API client.
        attack_messages: The attack messages to send.
        success_criteria: Function that takes the response text
            and returns True if the attack succeeded.
        num_trials: Number of reproduction attempts.
        attack_technique: Category of attack being tested.

    Returns:
        Reproducibility assessment dict.
    """
    results = []
    for trial in range(num_trials):
        response = client.send_message(
            messages=attack_messages,
            attack_technique=f"{attack_technique}_repro_trial_{trial + 1}",
            notes=f"Reproducibility trial {trial + 1}/{num_trials}",
        )
        response_text = client._extract_response_text(response)
        success = success_criteria(response_text)
        results.append({
            "trial": trial + 1,
            "success": success,
            "response_length": len(response_text),
        })
    success_count = sum(1 for r in results if r["success"])
    success_rate = success_count / num_trials
    if success_rate >= 0.9:
        reproducibility = "always"
    elif success_rate >= 0.6:
        reproducibility = "usually"
    elif success_rate >= 0.2:
        reproducibility = "sometimes"
    else:
        reproducibility = "rarely"
    return {
        "total_trials": num_trials,
        "successes": success_count,
        "success_rate": success_rate,
        "reproducibility": reproducibility,
        "results": results,
    }

Report Generation
Producing Engagement Reports
At the conclusion of a red team engagement, generate a structured report that presents findings with supporting evidence. The report should be accessible to both technical and executive audiences.
class EngagementReportGenerator:
    """Generate structured reports from red team evidence."""

    def generate_report(
        self,
        collector: RedTeamEvidenceCollector,
        engagement_name: str,
        scope_description: str,
        executive_summary: str = "",
    ) -> str:
        """
        Generate a full engagement report.

        Args:
            collector: The evidence collector with all findings.
            engagement_name: Name of the engagement.
            scope_description: Description of what was in scope.
            executive_summary: Optional pre-written executive summary.

        Returns:
            Report as formatted text.
        """
        findings = sorted(
            collector.findings,
            key=lambda f: self._severity_order(f.severity),
            reverse=True,
        )
        lines = [
            "=" * 70,
            "AI RED TEAM ENGAGEMENT REPORT",
            "=" * 70,
            "",
            f"Engagement: {engagement_name}",
            f"Engagement ID: {collector.engagement_id}",
            f"Report Generated: {datetime.now(timezone.utc).isoformat()}",
            "",
            "SCOPE",
            "-" * 40,
            scope_description,
            "",
            "EXECUTIVE SUMMARY",
            "-" * 40,
        ]
        if executive_summary:
            lines.append(executive_summary)
        else:
            lines.append(self._auto_executive_summary(findings, collector))
        lines.extend([
            "",
            "FINDINGS SUMMARY",
            "-" * 40,
        ])
        severity_counts = {}
        for f in findings:
            sev = f.severity.value
            severity_counts[sev] = severity_counts.get(sev, 0) + 1
        for sev in ["critical", "high", "medium", "low", "informational"]:
            count = severity_counts.get(sev, 0)
            lines.append(f" {sev.upper()}: {count}")
        lines.extend([
            "",
            f" Total findings: {len(findings)}",
            f" Total evidence items: {len(collector.evidence_items)}",
            "",
            "DETAILED FINDINGS",
            "-" * 40,
        ])
        for i, finding in enumerate(findings, 1):
            lines.extend(self._format_finding(finding, i, collector))
        return "\n".join(lines)

    def _format_finding(
        self,
        finding: Finding,
        number: int,
        collector: RedTeamEvidenceCollector,
    ) -> list[str]:
        """Format a single finding for the report."""
        lines = [
            "",
            f"--- Finding #{number}: {finding.title} ---",
            f"ID: {finding.finding_id}",
            f"Severity: {finding.severity.value.upper()}",
            f"CVSS: {finding.cvss_score or 'N/A'}",
            f"CWE: {finding.cwe_id or 'N/A'}",
            f"Reproducibility: {finding.reproducibility}",
            "",
            f"Description: {finding.description}",
            "",
            f"Attack Vector: {finding.attack_vector}",
            "",
            f"Impact: {finding.impact}",
            "",
            f"Remediation: {finding.remediation}",
            "",
            f"Supporting Evidence ({len(finding.evidence_ids)} items):",
        ]
        for ev_id in finding.evidence_ids:
            matching = [
                e for e in collector.evidence_items
                if e.evidence_id == ev_id
            ]
            if matching:
                ev = matching[0]
                lines.append(
                    f" - [{ev.evidence_id}] {ev.evidence_type.value}: {ev.title}"
                )
        return lines

    def _auto_executive_summary(
        self,
        findings: list[Finding],
        collector: RedTeamEvidenceCollector,
    ) -> str:
        """Generate an automatic executive summary."""
        critical = [f for f in findings if f.severity == SeverityLevel.CRITICAL]
        high = [f for f in findings if f.severity == SeverityLevel.HIGH]
        summary_parts = [
            f"This engagement identified {len(findings)} findings "
            f"across the target AI system."
        ]
        if critical:
            summary_parts.append(
                f"{len(critical)} critical finding(s) require immediate "
                f"remediation: {', '.join(f.title for f in critical)}."
            )
        if high:
            summary_parts.append(
                f"{len(high)} high-severity finding(s) should be "
                f"addressed within the next sprint cycle."
            )
        return " ".join(summary_parts)

    @staticmethod
    def _severity_order(severity: SeverityLevel) -> int:
        return {
            SeverityLevel.INFORMATIONAL: 0,
            SeverityLevel.LOW: 1,
            SeverityLevel.MEDIUM: 2,
            SeverityLevel.HIGH: 3,
            SeverityLevel.CRITICAL: 4,
        }.get(severity, 0)

Engagement Planning for Evidence Collection
Pre-Engagement Evidence Requirements
Before any testing begins, define what evidence will be collected and how. Work with the engagement stakeholders (typically the AI product team and the security team) to establish evidence requirements. These typically include: minimum documentation standards for each finding, required metadata for each evidence item, evidence classification levels and handling procedures, and the format and structure of the final report.
Define the evidence retention policy: how long will evidence be stored after the engagement concludes? For most organizations, evidence from red team engagements should be retained for at least one year to support trend analysis across multiple engagements. For regulated industries, retention requirements may be longer.
Establish the tooling and infrastructure before testing begins. Set up the evidence collection system with the engagement ID, create the output directories, configure the API client wrappers for automatic capture, and verify that evidence is being stored correctly by running a test interaction. Discovering tooling issues mid-engagement wastes testing time and may result in lost evidence.
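As a concrete example, a pre-engagement smoke test along the following lines can verify the tooling end to end; the engagement ID, directory, collector name, and model identifier are illustrative placeholders.

# Minimal tooling smoke test before real testing begins.
collector = RedTeamEvidenceCollector(
    engagement_id="ENG-2025-001",       # placeholder
    output_dir="./engagements",         # placeholder
    collector_name="j.doe",             # placeholder
)

ev_id = collector.collect_conversation(
    messages=[
        {"role": "user", "content": "tooling check"},
        {"role": "assistant", "content": "ok"},
    ],
    title="Pre-engagement tooling verification",
    description="Throwaway interaction to confirm evidence storage works.",
    model_id="target-model-v1",         # placeholder
    tags=["tooling_check"],
)

# Confirm the evidence file landed on disk and the stored hash matches
# a fresh recomputation of the content hash.
stored = json.loads(
    (collector.output_dir / "evidence" / f"{ev_id}.json").read_text()
)
recomputed = hashlib.sha256(
    json.dumps(stored["content"], sort_keys=True).encode("utf-8")
).hexdigest()
assert stored["integrity_hash"] == recomputed, "integrity check failed"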
Rules of Engagement and Evidence Boundaries
AI red team engagements require clear rules of engagement that define what systems can be tested, what attack techniques are permitted, and what data can be collected. These rules directly affect evidence collection.
If the rules of engagement prohibit testing against production user data, ensure your evidence collection system does not capture real user conversations or PII. If certain attack techniques are out of scope (e.g., social engineering of the development team), ensure that any incidental evidence of potential social engineering vulnerabilities is documented but not actively pursued.
The rules of engagement should also specify what happens if the red team discovers an active, non-simulated security incident during testing. This scenario is uncommon but not unheard of: a red teamer investigating an AI system's defenses may discover evidence of actual malicious activity. Define the escalation procedure: stop testing, preserve evidence, and notify the incident response team through an established communication channel.
Collaboration with Blue Teams
The most effective AI red team engagements produce evidence that is directly actionable by the blue team (the defenders). Design your evidence collection to support this goal. Each finding should include not just the attack technique and its impact, but also specific detection opportunities: what log entries, metrics, or behavioral signals would have revealed this attack in progress? What monitoring could the blue team deploy to detect similar attacks in the future?
Document the attack from both perspectives: from the red team's perspective (what they did and why it worked) and from the blue team's perspective (what signals were available and why they were missed or not monitored). This dual perspective makes the evidence package a complete learning resource rather than just a vulnerability report.
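One lightweight way to make detection opportunities first-class artifacts is to record them as behavioral observations tagged for the blue team. The helper below is a sketch built on the collector API defined earlier; the function name and the "detection_opportunity" category label are conventions assumed for illustration, not part of any standard.

def record_detection_opportunity(
    collector: RedTeamEvidenceCollector,
    finding_id: str,
    signal: str,
    log_source: str,
) -> str:
    """Record a blue-team detection opportunity tied to a finding."""
    return collector.collect_behavioral_observation(
        observation=(
            f"Detection opportunity for {finding_id}: {signal}. "
            f"Observable in: {log_source}."
        ),
        title=f"Detection opportunity for {finding_id}",
        category="detection_opportunity",
    )

# Usage: after creating a finding, note what the defenders could watch for.
# record_detection_opportunity(
#     collector, fnd_id,
#     signal="Spike in refusal-bypass phrasings within a single session",
#     log_source="LLM gateway request logs",
# )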
Chain of Custody Considerations
Maintaining Evidence Integrity
Throughout an AI red team engagement, maintain chain of custody for all evidence. Every evidence item should have a SHA-256 hash computed at collection time, a collector identifier, and a timestamp. The evidence log provides an append-only audit trail of what was collected and when.
For engagements that may have legal implications (e.g., testing in regulated industries or collecting evidence of third-party AI misuse), consider additional controls: write-once storage for evidence files, two-person integrity verification for critical findings, and signed evidence manifests using GPG or a similar tool.
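A signed manifest can be produced directly from the collector's in-memory state. The sketch below assumes GPG is installed with a signing key configured; the helper name is ours, and gpg --armor --detach-sign writes an evidence_manifest.json.asc signature file next to the manifest.

import subprocess

def write_signed_manifest(collector: RedTeamEvidenceCollector) -> Path:
    """Write a manifest of all evidence hashes and detach-sign it with GPG."""
    manifest = {
        "engagement_id": collector.engagement_id,
        "generated": datetime.now(timezone.utc).isoformat(),
        "evidence": [
            {"evidence_id": e.evidence_id, "sha256": e.integrity_hash}
            for e in collector.evidence_items
        ],
    }
    path = collector.output_dir / "evidence_manifest.json"
    path.write_text(json.dumps(manifest, sort_keys=True, indent=2))
    # Detached ASCII-armored signature. Verifiers later run:
    #   gpg --verify evidence_manifest.json.asc evidence_manifest.json
    subprocess.run(["gpg", "--armor", "--detach-sign", str(path)], check=True)
    return path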
Evidence Collection for Specific AI Attack Types
Jailbreak Evidence
When documenting a successful jailbreak, capture the complete conversation from the first message through the jailbreak success, including any failed attempts in between. The failed attempts are forensically valuable because they document the model's defensive boundaries and show the progression of the attack technique. Record the exact model version, temperature setting, and any other parameters, as jailbreak success rates are sensitive to these settings. Test reproducibility at multiple temperature values and note the range where the jailbreak succeeds.
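The sketch below shows one way to run that temperature sweep with the components defined earlier. It assumes the wrapped API client accepts a temperature keyword argument, which EvidenceCapturingClient.send_message forwards via **kwargs; adjust for your client's actual parameter name.

from typing import Callable

def jailbreak_temperature_range(
    client: EvidenceCapturingClient,
    attack_messages: list[dict],
    success_criteria: Callable[[str], bool],
    temperatures: tuple[float, ...] = (0.0, 0.3, 0.7, 1.0),
    trials_per_temperature: int = 5,
) -> dict[float, float]:
    """Estimate the temperature range in which a jailbreak reproduces."""
    rates: dict[float, float] = {}
    for temp in temperatures:
        successes = 0
        for trial in range(trials_per_temperature):
            response = client.send_message(
                messages=attack_messages,
                attack_technique=f"jailbreak_temp_{temp}",
                notes=f"Temperature sweep trial {trial + 1}",
                temperature=temp,  # assumed kwarg, forwarded to the API client
            )
            if success_criteria(client._extract_response_text(response)):
                successes += 1
        rates[temp] = successes / trials_per_temperature
    return rates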
Include negative evidence as well: document what the model correctly refused before the jailbreak succeeded. This establishes the baseline safety behavior and makes the jailbreak finding more credible. If the jailbreak only works with a specific system prompt configuration, document both the vulnerable configuration and configurations where the jailbreak fails.
RAG Poisoning Evidence
For RAG poisoning findings, capture the poisoned document, its embedding, the similarity score at which it was retrieved, the query that triggered retrieval, the model's output with the poisoned context, and the model's output with clean context for the same query. The comparison between poisoned and clean outputs is the core evidence that demonstrates impact.
Document the ingestion pathway: how the poisoned document entered the knowledge base, what validation or filtering it passed through, and whether the poisoning could be detected by existing content moderation. If the poisoning exploits a specific property of the embedding model (e.g., adversarially crafted text that embeds close to a target query), document the embedding similarity analysis.
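A helper along the following lines can package the paired comparison as a single evidence item. It constructs an EvidenceItem of type RETRIEVAL_CONTEXT and, for brevity in this sketch, reuses the collector's internal storage method; a production version would expose this as a public collector method.

def collect_rag_poisoning_evidence(
    collector: RedTeamEvidenceCollector,
    query: str,
    poisoned_document: str,
    retrieval_similarity: float,
    output_with_poison: str,
    output_with_clean_context: str,
    title: str,
) -> str:
    """Store the paired poisoned/clean comparison as one evidence item."""
    evidence = EvidenceItem(
        evidence_id=f"EV-{uuid.uuid4().hex[:12]}",
        evidence_type=EvidenceType.RETRIEVAL_CONTEXT,
        timestamp=datetime.now(timezone.utc).isoformat(),
        title=title,
        description=(
            "Paired poisoned/clean output comparison for a RAG "
            "poisoning finding."
        ),
        content={
            "query": query,
            "poisoned_document": poisoned_document,
            "retrieval_similarity": retrieval_similarity,
            "output_with_poisoned_context": output_with_poison,
            "output_with_clean_context": output_with_clean_context,
        },
        engagement_id=collector.engagement_id,
        severity=SeverityLevel.HIGH,
        tags=["rag_poisoning"],
        collector=collector.collector_name,
    )
    evidence.compute_hash()
    collector._store_evidence(evidence)  # reuses the collector's storage path
    return evidence.evidence_id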
Training Data Extraction Evidence
When a model leaks training data, the evidence must demonstrate that the generated output matches actual training data rather than being a coincidental generation. This requires comparing model outputs against known training data sources. Use multiple prompting strategies to elicit the same data, and document the success rate. Calculate the statistical likelihood that the generation is coincidental by measuring the output's perplexity under the model versus its perplexity under a reference distribution.
Capture the complete prompt that triggers the extraction, the model's full output, the matching training data source (with provenance), and the similarity analysis showing the match is statistically significant. If the extraction reveals personally identifiable information or copyrighted material, flag this separately with appropriate handling markings.
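The perplexity comparison requires access to the model's log probabilities. When those are unavailable, a verbatim-overlap measure is a useful first-pass proxy for documenting a match; the sketch below uses Python's standard-library difflib and is not a substitute for the statistical analysis described above.

import difflib

def memorization_match(generated: str, source: str) -> dict:
    """Quantify verbatim overlap between a model output and a candidate
    training-data source. A long exact match relative to the output length
    is strong (though not conclusive) evidence of memorization."""
    matcher = difflib.SequenceMatcher(None, generated, source, autojunk=False)
    match = matcher.find_longest_match(0, len(generated), 0, len(source))
    return {
        "longest_exact_match_chars": match.size,
        "match_fraction_of_output": match.size / max(len(generated), 1),
        "matched_text_preview": generated[match.a : match.a + min(match.size, 80)],
        "overall_similarity_ratio": matcher.ratio(),
    }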
Handling Sensitive Evidence
AI red team evidence frequently contains sensitive material: successful jailbreak payloads, model outputs that include harmful content, leaked training data, or exposed system prompts. Classify evidence appropriately and apply access controls. Store sensitive payloads separately from the main evidence repository with additional access restrictions. Redact sensitive content in reports distributed to broader audiences, while maintaining full unredacted evidence in the secure evidence store for technical review.
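A simple pattern-based redaction pass can produce the report-safe copy. The patterns below are illustrative only and must be tuned to the data classes actually in scope for the engagement; the unredacted original stays in the access-controlled evidence store.

import re

# Illustrative patterns only; extend for the engagement's data classes
# (emails, API keys, names, internal hostnames, etc.).
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b(?:sk|key|token)-[A-Za-z0-9]{16,}\b"), "[REDACTED_CREDENTIAL]"),
]

def redact_for_report(text: str) -> str:
    """Return a report-safe copy of evidence text."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text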
Evidence Quality Metrics
Measuring Evidence Completeness
At the conclusion of each engagement, assess the quality and completeness of the evidence collected. Metrics to track include: evidence-per-finding ratio (each finding should have at least 2-3 supporting evidence items), reproducibility testing coverage (what percentage of findings were reproducibility-tested), metadata completeness (what percentage of evidence items have all standard metadata fields populated), and hash verification rate (what percentage of evidence items have integrity hashes computed and verified).
Track these metrics across engagements to identify trends. If evidence quality is declining, investigate whether the testing methodology has become rushed, whether the tooling needs improvement, or whether the engagement scope has expanded beyond the team's capacity to document thoroughly. Evidence quality directly affects the credibility of findings and the likelihood that remediation will be prioritized.
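These metrics can be computed directly from the collector's state at engagement close. The sketch below approximates reproducibility-testing coverage by counting the trial interactions that test_reproducibility() auto-captures; the required-field list is an assumption for illustration, not a standard.

def evidence_quality_metrics(collector: RedTeamEvidenceCollector) -> dict:
    """Compute engagement-level evidence quality metrics."""
    findings = collector.findings
    evidence = collector.evidence_items
    required_fields = ("title", "description", "collector", "timestamp")
    return {
        "evidence_per_finding": (
            sum(len(f.evidence_ids) for f in findings) / len(findings)
            if findings else 0.0
        ),
        # test_reproducibility() marks its auto-captured trials with
        # "_repro_trial_" in the interaction title.
        "repro_trial_interactions": sum(
            1 for e in evidence if "_repro_trial_" in e.title
        ),
        "metadata_completeness": (
            sum(
                1 for e in evidence
                if all(getattr(e, f) for f in required_fields)
            ) / len(evidence)
            if evidence else 0.0
        ),
        "hash_coverage": (
            sum(1 for e in evidence if e.integrity_hash) / len(evidence)
            if evidence else 0.0
        ),
    }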
Evidence Archival and Cross-Engagement Analysis
After each engagement, archive the complete evidence package in a long-term storage system accessible for future reference. Over time, the archive enables cross-engagement analysis: are the same vulnerability patterns appearing across multiple engagements? Are remediation recommendations being implemented and effective? Which AI models or architectures are most frequently vulnerable?
Build a searchable index across archived engagements that allows querying by attack technique, AI model, vulnerability type, and severity. This index becomes a knowledge base that informs future engagement planning and helps the red team prioritize their testing based on historically observed vulnerability patterns.
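A minimal version of that index can be built by walking the archived engagement directories, assuming they follow the layout written by RedTeamEvidenceCollector (one findings/FND-*.json file per finding under each engagement directory).

from collections import defaultdict

def build_findings_index(archive_root: str) -> dict[str, list[dict]]:
    """Index archived findings by attack category for cross-engagement queries."""
    index: dict[str, list[dict]] = defaultdict(list)
    for finding_file in Path(archive_root).glob("*/findings/FND-*.json"):
        finding = json.loads(finding_file.read_text())
        index[finding["attack_category"]].append({
            "finding_id": finding["finding_id"],
            "engagement_id": finding["engagement_id"],
            "severity": finding["severity"],
            "title": finding["title"],
        })
    return index

# Example query: which engagements surfaced prompt injection findings?
# injection_findings = build_findings_index("./archive")["prompt_injection"]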