AI Security Regression Testing Methodology
Design regression testing suites that verify security fixes remain effective across model updates and deployments.
Overview
Design regression testing suites that verify security fixes remain effective across model updates and deployments. This methodology walkthrough provides a structured, repeatable framework that you can adapt to your organization's specific context and compliance requirements.
Methodology is the backbone of professional red teaming. Without a structured approach, assessments become ad hoc exercises that miss critical vulnerabilities and produce inconsistent results. This walkthrough establishes a framework that ensures comprehensive coverage while remaining adaptable to different engagement types.
Context and Motivation
This methodology addresses a gap identified by Promptfoo (github.com/promptfoo/promptfoo), an LLM testing and evaluation framework, in how organizations approach AI security assessments. The framework aligns with Greenblatt et al. 2024, "Alignment Faking in Large Language Models," and incorporates lessons learned from real-world engagements.
The AI security landscape evolves rapidly, with new attack techniques and defense mechanisms emerging continuously. A robust methodology must be flexible enough to incorporate new developments while maintaining the rigor needed for professional assessments. This walkthrough provides that balance.
Phase 1: Planning and Scoping
Effective assessment begins with thorough planning. This phase establishes the boundaries, objectives, and success criteria for the engagement.
#!/usr/bin/env python3
"""
AI Security Regression Testing Methodology — Methodology Framework
Structured approach for AI red team assessments.
"""
import json
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional
class EngagementType(Enum):
FULL_ASSESSMENT = "full_assessment"
FOCUSED_TEST = "focused_test"
COMPLIANCE_AUDIT = "compliance_audit"
CONTINUOUS_MONITORING = "continuous_monitoring"
class Priority(Enum):
CRITICAL = 4
HIGH = 3
MEDIUM = 2
LOW = 1
@dataclass
class EngagementScope:
    """Define the scope and boundaries of an AI red team engagement."""
client: str
engagement_type: EngagementType
start_date: str
end_date: str
target_systems: list = field(default_factory=list)
in_scope_techniques: list = field(default_factory=list)
out_of_scope: list = field(default_factory=list)
objectives: list = field(default_factory=list)
success_criteria: list = field(default_factory=list)
compliance_requirements: list = field(default_factory=list)
contact_info: dict = field(default_factory=dict)
def validate(self) -> list[str]:
"""Validate the engagement scope for completeness."""
issues = []
if not self.target_systems:
issues.append("No target systems defined")
if not self.objectives:
            issues.append("No assessment objectives defined")
if not self.success_criteria:
issues.append("No success criteria defined")
if not self.contact_info:
issues.append("No client contact information")
return issues
def to_dict(self) -> dict:
return {
"client": self.client,
"type": self.engagement_type.value,
"dates": f"{self.start_date} to {self.end_date}",
"targets": self.target_systems,
"objectives": self.objectives,
"exclusions": self.out_of_scope,
}
# Example scope definition
scope = EngagementScope(
client="Acme Corp",
engagement_type=EngagementType.FULL_ASSESSMENT,
start_date="2026-03-20",
end_date="2026-04-03",
target_systems=[
"Customer service chatbot (chat.acme.com)",
        "Internal knowledge base assistant (kb.internal.acme.com)",
"Code review AI (codereview.acme.com)",
],
in_scope_techniques=[
"Prompt injection (direct and indirect)",
        "System prompt extraction",
        "Data exfiltration through model outputs",
        "Safety bypass and jailbreaking",
"Function calling abuse",
],
out_of_scope=[
"Denial of service attacks",
        "Physical infrastructure testing",
"Social engineering of employees",
"Model weight extraction",
],
objectives=[
        "Assess prompt injection resistance across all target systems",
        "Test data exfiltration controls and output filtering",
        "Evaluate function calling security in the code review AI",
"Verify compliance with OWASP LLM Top 10 2025",
],
success_criteria=[
        "Complete testing of all in-scope techniques against each target",
        "Document all findings with reproducible evidence",
        "Deliver final report within 5 business days of testing completion",
],
)
validation_issues = scope.validate()
if validation_issues:
print(f"Scope issues: {validation_issues}")
else:
print("Scope validated successfully")
print(json.dumps(scope.to_dict(), indent=2))
Phase 2: Reconnaissance and Analysis
Systematic reconnaissance maps the target's architecture, models, and defenses before testing begins.
@dataclass
class ReconFinding:
"""A single reconnaissance finding."""
category: str
detail: str
confidence: str # high, medium, low
source: str
priority: Priority
timestamp: str = field(default_factory=lambda: datetime.now().isoformat())
class ReconFramework:
"""Structured reconnaissance framework for AI systems."""
def __init__(self, scope: EngagementScope):
self.scope = scope
self.findings: list[ReconFinding] = []
def passive_recon(self, target: str) -> list[ReconFinding]:
"""Gather information without directly interacting with the target."""
checklist = [
("Documentation Review", "Review public docs, API references, and changelogs"),
            ("Technology Stack", "Identify frameworks, cloud providers, and model providers"),
            ("Public Disclosures", "Check for prior vulnerability disclosures or bug bounties"),
("Job Postings", "Analyze job postings for technology and architecture clues"),
("Social Media", "Search for engineering blog posts and conference talks"),
]
findings = []
for category, description in checklist:
finding = ReconFinding(
category=category,
detail=f"{description} for {target}",
confidence="medium",
source="passive",
priority=Priority.MEDIUM,
)
findings.append(finding)
self.findings.append(finding)
return findings
def active_recon(self, target: str) -> list[ReconFinding]:
"""Actively probe the target to map its configuration."""
probes = [
("Model Identification", "Fingerprint the underlying model through behavioral analysis"),
            ("Defense Mapping", "Identify input filters, output sanitizers, and guardrails"),
("API Surface", "Map available endpoints, parameters, and error handling"),
("Rate Limiting", "Determine rate limits and throttling behavior"),
("Session Handling", "Analyze session management and context persistence"),
            ("Tool/Function Discovery", "Enumerate available tools and function calling capabilities"),
]
findings = []
for category, description in probes:
finding = ReconFinding(
category=category,
detail=f"{description} for {target}",
confidence="high",
source="active",
priority=Priority.HIGH,
)
findings.append(finding)
self.findings.append(finding)
return findings
def generate_attack_plan(self) -> dict:
"""Generate a prioritized attack plan based on recon findings."""
return {
"target_count": len(self.scope.target_systems),
"findings_count": len(self.findings),
"high_priority": sum(1 for f in self.findings if f.priority == Priority.HIGH),
"recommended_approach": "Start with highest-priority targets using "
"techniques most likely to succeed based on recon",
}
recon = ReconFramework(scope)
for target in scope.target_systems:
recon.passive_recon(target)
recon.active_recon(target)
plan = recon.generate_attack_plan()
print(json.dumps(plan, indent=2))
Phase 3: Execution Framework
Execute tests systematically, documenting all findings as they are discovered. The execution phase is where the majority of engagement time is spent and where the quality of the previous phases directly impacts efficiency.
| Phase | Time Allocation | Key Activities | Deliverables |
|---|---|---|---|
| Planning | 10-15% | Scoping, scheduling, environment setup | Engagement plan, scope document |
| Reconnaissance | 15-20% | Passive research, active probing, mapping | Recon report, attack plan |
| Testing | 50-60% | Attack execution, validation, iteration | Raw findings, evidence logs |
| Reporting | 15-20% | Analysis, writing, review, delivery | Final report, executive summary |
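To make the allocation concrete, the range midpoints can be turned into an hour budget for a planned engagement. This is a standalone sketch; the 80-hour total is illustrative, and the midpoints are normalized because they sum to slightly over 100%.

```python
# Midpoints of the allocation ranges in the table above.
PHASE_ALLOCATION = {
    "Planning": 0.125,        # 10-15%
    "Reconnaissance": 0.175,  # 15-20%
    "Testing": 0.55,          # 50-60%
    "Reporting": 0.175,       # 15-20%
}

def budget_hours(total_hours: float) -> dict[str, float]:
    """Split a total engagement budget across phases, normalizing the midpoints."""
    total_pct = sum(PHASE_ALLOCATION.values())  # midpoints sum to 102.5%
    return {
        phase: round(total_hours * pct / total_pct, 1)
        for phase, pct in PHASE_ALLOCATION.items()
    }

if __name__ == "__main__":
    for phase, hours in budget_hours(80).items():
        print(f"{phase}: {hours}h")
```

Running the allocation early in planning surfaces unrealistic schedules before the engagement starts, e.g. a two-week assessment that leaves only a day for reporting.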
class ExecutionTracker:
    """Track execution progress and manage the testing workflow."""
def __init__(self, scope: EngagementScope):
self.scope = scope
self.test_cases = []
self.completed = []
self.findings = []
def create_test_matrix(self) -> list[dict]:
        """Generate a test matrix from scope and recon findings."""
techniques = self.scope.in_scope_techniques
targets = self.scope.target_systems
matrix = []
for target in targets:
for technique in techniques:
matrix.append({
"target": target,
"technique": technique,
"priority": Priority.HIGH if "injection" in technique.lower() else Priority.MEDIUM,
"status": "pending",
"assigned_to": None,
"time_estimate_min": 30,
})
self.test_cases = matrix
return matrix
def update_status(self, target: str, technique: str, status: str, finding: dict = None):
        """Update the status of a test case."""
for tc in self.test_cases:
if tc["target"] == target and tc["technique"] == technique:
tc["status"] = status
if finding:
self.findings.append(finding)
break
def get_progress(self) -> dict:
"""Get current execution progress."""
total = len(self.test_cases)
completed = sum(1 for tc in self.test_cases if tc["status"] in ("completed", "blocked"))
in_progress = sum(1 for tc in self.test_cases if tc["status"] == "in_progress")
return {
"total_tests": total,
"completed": completed,
"in_progress": in_progress,
"pending": total - completed - in_progress,
"completion_pct": completed / max(total, 1) * 100,
"findings_count": len(self.findings),
}
tracker = ExecutionTracker(scope)
matrix = tracker.create_test_matrix()
print(f"Generated {len(matrix)} test cases across "
      f"{len(scope.target_systems)} targets")
# Simulate execution progress
tracker.update_status(
scope.target_systems[0],
scope.in_scope_techniques[0],
"completed",
    {"title": "Example finding", "severity": "High"}
)
print(json.dumps(tracker.get_progress(), indent=2))
The execution phase should follow a prioritized order: start with the highest-impact techniques against the most critical targets. If time becomes constrained, this ordering ensures that the most important tests have been completed. Communicate progress regularly with the client through agreed-upon channels, especially if critical findings are discovered that require immediate attention.
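The ordering rule can be sketched directly against the test-case shape that create_test_matrix() produces. This is a standalone example with hypothetical targets; Priority is redefined here so the snippet runs on its own.

```python
from enum import Enum

class Priority(Enum):
    CRITICAL = 4
    HIGH = 3
    MEDIUM = 2
    LOW = 1

# Hypothetical test cases in the shape produced by create_test_matrix().
matrix = [
    {"target": "kb assistant", "technique": "Data exfiltration", "priority": Priority.MEDIUM},
    {"target": "chatbot", "technique": "Prompt injection", "priority": Priority.HIGH},
    {"target": "code review AI", "technique": "Function calling abuse", "priority": Priority.CRITICAL},
]

def prioritize(cases: list[dict]) -> list[dict]:
    """Order test cases so the highest-impact work happens first."""
    return sorted(cases, key=lambda tc: tc["priority"].value, reverse=True)

ordered = prioritize(matrix)
print([tc["technique"] for tc in ordered])
```

With this ordering, a schedule cut at any point still leaves the most critical coverage complete.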
Phase 3.5: Finding Validation and Triage
Before moving to reporting, validate all findings to ensure they meet professional quality standards. Unvalidated findings waste client time and damage assessor credibility. Every finding should pass these criteria before inclusion in the final report.
class FindingValidator:
"""Validate and triage findings before reporting."""
SEVERITY_CRITERIA = {
"Critical": {
"min_success_rate": 0.5,
"requires_validation": True,
            "impact": "Direct data breach, unauthorized code execution, or safety bypass "
            "with real-world harm potential",
},
"High": {
"min_success_rate": 0.3,
"requires_validation": True,
            "impact": "System prompt extraction, significant safety bypass, "
            "or unauthorized action execution",
},
"Medium": {
"min_success_rate": 0.2,
"requires_validation": True,
            "impact": "Information disclosure, partial safety bypass, "
            "or limited unauthorized behavior",
},
"Low": {
"min_success_rate": 0.1,
"requires_validation": False,
            "impact": "Informational findings, theoretical vulnerabilities, "
"or minimal impact issues",
},
}
def validate_finding(self, finding: dict, test_runs: int = 5) -> dict:
"""Validate a finding meets quality standards."""
issues = []
if not finding.get("title"):
issues.append("Missing title")
if not finding.get("evidence"):
issues.append("Missing evidence")
if not finding.get("remediation"):
issues.append("Missing remediation guidance")
success_rate = finding.get("success_rate", 0)
severity = finding.get("severity", "Medium")
criteria = self.SEVERITY_CRITERIA.get(severity, self.SEVERITY_CRITERIA["Medium"])
if success_rate < criteria["min_success_rate"]:
issues.append(f"Success rate {success_rate:.0%} below minimum "
f"{criteria['min_success_rate']:.0%} for {severity} severity")
return {
"valid": len(issues) == 0,
"issues": issues,
"severity_appropriate": success_rate >= criteria["min_success_rate"],
}
def deduplicate(self, findings: list[dict]) -> list[dict]:
        """Remove duplicate findings that describe the same vulnerability."""
unique = []
seen_techniques = set()
for finding in sorted(findings, key=lambda f: f.get("success_rate", 0), reverse=True):
technique_key = finding.get("technique", "").lower().strip()
if technique_key not in seen_techniques:
unique.append(finding)
seen_techniques.add(technique_key)
else:
# Merge evidence into the existing finding
for existing in unique:
if existing.get("technique", "").lower().strip() == technique_key:
existing.setdefault("additional_evidence", []).append(
finding.get("evidence", "")
)
break
return unique
validator = FindingValidator()
Finding validation is a critical quality gate. In the pressure of a time-boxed engagement, it can be tempting to include every observation as a finding. However, flooding the client with low-quality or unvalidated findings dilutes the impact of genuine critical issues and makes the report harder to act on. Apply strict validation criteria and present only findings that meet the quality bar for their assigned severity level.
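One practical consequence of this quality gate: a finding whose measured success rate does not support its claimed severity should usually be downgraded rather than dropped or reported as-is. A minimal standalone sketch, reusing the same thresholds as SEVERITY_CRITERIA:

```python
# Minimum reproduction rates per severity, mirroring SEVERITY_CRITERIA.
MIN_SUCCESS_RATE = {"Critical": 0.5, "High": 0.3, "Medium": 0.2, "Low": 0.1}
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]

def downgrade_to_supported(severity: str, success_rate: float) -> str:
    """Return the highest severity whose success-rate bar the finding actually meets."""
    start = SEVERITY_ORDER.index(severity)
    for candidate in SEVERITY_ORDER[start:]:
        if success_rate >= MIN_SUCCESS_RATE[candidate]:
            return candidate
    return "Informational"

# A "Critical" claim that only reproduces 35% of the time is really a High.
print(downgrade_to_supported("Critical", 0.35))  # High
```

Downgrading preserves the observation while keeping the severity scale trustworthy for the client.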
Deduplication is equally important. Different team members may discover the same vulnerability through different attack paths. These should be consolidated into a single finding with multiple evidence items, rather than presented as separate findings. This provides a clearer picture of the vulnerability and avoids inflating finding counts.
Phase 4: Reporting and Communication
Transform raw findings into actionable reports that drive remediation. The report is the primary deliverable of the engagement and must communicate effectively to multiple audiences.
@dataclass
class Finding:
    """A validated red team finding."""
title: str
severity: str # Critical, High, Medium, Low, Informational
description: str
evidence: str
impact: str
remediation: list[str]
references: list[str]
cvss_score: Optional[float] = None
owasp_mapping: Optional[str] = None
mitre_atlas: Optional[str] = None
reproducible: bool = True
success_rate: float = 0.0
class ReportGenerator:
    """Generate professional red team assessment reports."""
def __init__(self, scope: EngagementScope):
self.scope = scope
self.findings: list[Finding] = []
def add_finding(self, finding: Finding):
self.findings.append(finding)
def generate_executive_summary(self) -> str:
critical = sum(1 for f in self.findings if f.severity == "Critical")
high = sum(1 for f in self.findings if f.severity == "High")
medium = sum(1 for f in self.findings if f.severity == "Medium")
low = sum(1 for f in self.findings if f.severity == "Low")
return f"""
## Executive Summary
Red team assessment of {self.scope.client} AI systems conducted
{self.scope.start_date} through {self.scope.end_date}.
**Overall Risk**: {"Critical" if critical > 0 else "High" if high > 0 else "Medium"}
| Severity | Count |
|----------|-------|
| Critical | {critical} |
| High | {high} |
| Medium | {medium} |
| Low | {low} |
| **Total** | **{len(self.findings)}** |
**Key Findings**: {len(self.findings)} vulnerabilities identified across
{len(self.scope.target_systems)} target systems. Immediate remediation
recommended for all Critical and High severity findings.
"""
def generate_full_report(self) -> str:
sections = [self.generate_executive_summary()]
for i, finding in enumerate(self.findings, 1):
sections.append(f"""
### Finding {i}: {finding.title}
**Severity**: {finding.severity}
**Reproducible**: {"Yes" if finding.reproducible else "No"}
**Success Rate**: {finding.success_rate:.0%}
**Description**: {finding.description}
**Evidence**: {finding.evidence}
**Impact**: {finding.impact}
**Remediation**:
{chr(10).join(f'{j}. {r}' for j, r in enumerate(finding.remediation, 1))}
""")
return "\n".join(sections)
# Example report generation
report = ReportGenerator(scope)
report.add_finding(Finding(
title="System Prompt Extraction via Role-Play",
severity="High",
    description="The customer service chatbot revealed its full system prompt "
    "when a role-play technique was used to request configuration details.",
evidence="See Appendix A for full request/response logs",
    impact="Attackers can understand the full system configuration, enabling "
    "targeted attacks against specific security measures.",
remediation=[
        "Implement output filtering for system prompt content",
        "Add instruction hierarchy enforcement to resist role-play overrides",
        "Deploy monitoring for system prompt extraction attempts",
],
references=["OWASP LLM01", "MITRE ATLAS AML.T0051"],
    owasp_mapping="LLM01 — Prompt Injection",
mitre_atlas="AML.T0051",
reproducible=True,
success_rate=0.8,
))
print(report.generate_full_report())
Phase 5: Post-Engagement Activities
The engagement does not end with report delivery. Post-engagement activities ensure that findings drive actual security improvements and that the methodology evolves based on lessons learned.
Findings presentation: Schedule a findings walkthrough meeting with both technical and leadership stakeholders. Walk through each finding with live demonstrations where possible. Executives need to understand business risk; engineers need to understand root causes and remediation steps. Prepare a different presentation deck for each audience.
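A lightweight way to keep the two decks consistent is to derive both views from the same validated findings list. This is a standalone sketch; the field names mirror the Finding dataclass above.

```python
def audience_views(findings: list[dict]) -> dict[str, str]:
    """Build separate talking points for executives and engineers from one findings list."""
    # Executive view: severity counts only (business risk, no technical detail).
    sev_counts: dict[str, int] = {}
    for f in findings:
        sev_counts[f["severity"]] = sev_counts.get(f["severity"], 0) + 1
    exec_view = ", ".join(f"{n} {sev}" for sev, n in sorted(sev_counts.items()))
    # Engineering view: each finding paired with its first remediation step.
    eng_view = "; ".join(f"{f['title']}: {f['remediation'][0]}" for f in findings)
    return {"executive": exec_view, "engineering": eng_view}

findings = [
    {"title": "System prompt extraction", "severity": "High",
     "remediation": ["Filter system prompt content from outputs"]},
]
print(audience_views(findings))
```

Deriving both decks from one source prevents the executive summary and the technical appendix from drifting apart during review cycles.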
Remediation support: Offer to validate remediation implementations. When the client's engineering team deploys fixes, re-test the specific findings to verify that the vulnerabilities have been addressed. This retesting phase is typically shorter than the original engagement and provides valuable assurance that fixes are effective.
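Retesting can reuse the original findings as a checklist: replay each finding's reproduction steps against the patched system and record which remain open. A standalone sketch; `rerun` is a placeholder for the real reproduction harness.

```python
def retest(findings: list[dict], rerun) -> list[dict]:
    """Re-run each original finding's reproduction steps against the patched system."""
    results = []
    for f in findings:
        still_reproduces = rerun(f)  # caller supplies the actual reproduction harness
        results.append({"title": f["title"], "status": "open" if still_reproduces else "fixed"})
    return results

# Hypothetical harness: pretend only the injection finding still reproduces.
findings = [{"title": "Prompt injection"}, {"title": "System prompt extraction"}]
outcome = retest(findings, lambda f: f["title"] == "Prompt injection")
print(outcome)
```

Reporting retest results in the same finding-by-finding structure as the original report makes it easy for the client to close out each item.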
Lessons learned: Conduct an internal lessons-learned session with the assessment team. Document what worked, what did not, and how the methodology should be adjusted for future engagements. Key questions to address include:
| Question | Purpose |
|---|---|
| Which techniques were most effective against this target type? | Improve technique selection for similar future engagements |
| Were there any scope gaps that missed important attack surfaces? | Refine scoping methodology |
| Did the time allocation match actual effort distribution? | Improve estimation for future engagements |
| Were there any tooling limitations that impacted coverage? | Drive tool development priorities |
| Did the reporting format effectively communicate findings? | Refine report templates |
Knowledge base updates: Update the internal knowledge base with new findings, effective payloads, and defense bypass techniques discovered during the engagement. This institutional knowledge benefits future assessment teams and ensures that the organization continuously improves its red team capabilities.
Client relationship: Follow up with the client after a reasonable remediation window (typically 30-60 days) to check on remediation progress. This demonstrates ongoing commitment to their security posture and can lead to follow-on engagements for verification testing or expanded scope assessments.
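The 30-60 day window is easy to track programmatically. This standalone sketch computes it from the engagement end date using the same datetime/timedelta tooling imported by the framework above:

```python
from datetime import date, timedelta

def followup_window(end_date: str, min_days: int = 30, max_days: int = 60) -> tuple[str, str]:
    """Compute the remediation check-in window from the engagement end date (ISO format)."""
    end = date.fromisoformat(end_date)
    return (str(end + timedelta(days=min_days)), str(end + timedelta(days=max_days)))

# Using the example engagement's end date from the scope definition.
print(followup_window("2026-04-03"))
```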
References and Further Reading
- Promptfoo — github.com/promptfoo/promptfoo — LLM testing and evaluation
- Greenblatt et al. 2024 — "Alignment Faking in Large Language Models"
- Wei et al. 2023 — "Jailbroken: How Does LLM Safety Training Fail?"
- HarmBench — github.com/centerforaisafety/HarmBench — standardized evaluation
- MITRE ATLAS — AML.T0043 (Adversarial ML Attack Staging)
- Carlini et al. 2021 — "Extracting Training Data from Large Language Models"
Why should reconnaissance receive 15-20% of total engagement time even in time-boxed assessments?
What is the most important factor when deciding between fail-open and fail-closed behavior for a defense component?