Measuring and Reporting AI 紅隊 Effectiveness

中級17 分鐘閱讀更新於 2026-03-20

導覽 for defining, collecting, and reporting metrics that measure the effectiveness of AI red teaming programs, covering coverage metrics, detection rates, time-to-find analysis, remediation tracking, and ROI calculation.

metrics effectiveness reporting kpis program-management methodology walkthrough

"How do we know our AI 紅隊演練 program is working?" is the question that every CISO eventually asks. Without metrics, the answer is subjective -- "we found some things" is not compelling when competing for budget against other 安全 initiatives. This walkthrough defines a practical metrics framework that quantifies the value of AI 紅隊演練, tracks improvement over time, and provides the data needed for program justification and resource allocation.

Step 1: Defining the Metrics Framework

Organize metrics into four categories that answer different stakeholder questions:

# AI 紅隊 Metrics Framework
 
## Category 1: Coverage Metrics
**Question**: "Are we 測試 everything we should be?"
 
| Metric | Definition | Target |
|--------|-----------|--------|
| OWASP LLM Top 10 Coverage | % of OWASP categories tested in each engagement | 100% for standard engagements |
| 攻擊 Surface Coverage | % of enumerated components tested | >80% for standard, >95% for comprehensive |
| 測試 Case Volume | Number of unique 測試 cases per engagement | >200 for standard engagements |
| Technique Diversity | Number of distinct attack techniques used | >15 per engagement |
| Model Coverage | % of production models tested in the quarter | >90% |
 
## Category 2: 偵測 Metrics
**Question**: "Are we finding real 漏洞?"
 
| Metric | Definition | Target |
|--------|-----------|--------|
| 漏洞 Count | Total findings per engagement by severity | Track trend, not absolute number |
| 偵測 Rate | Findings per 測試 hour | Track trend |
| Novel Finding Rate | % of findings not found in previous engagements | >20% indicates evolving methodology |
| False Positive Rate | % of reported findings that were not actual 漏洞 | <5% |
| Time to First Finding | Hours from engagement start to first Critical/High finding | <4 hours |
 
## Category 3: Remediation Metrics
**Question**: "Are we driving actual 安全 improvement?"
 
| Metric | Definition | Target |
|--------|-----------|--------|
| Remediation Rate | % of findings remediated within SLA | >90% for Critical, >80% for High |
| Mean Time to Remediate (MTTR) | Average days from report to fix by severity | <7 days Critical, <30 days High |
| Regression Rate | % of previously fixed findings that reappear | <5% |
| Retest Pass Rate | % of remediated findings that pass verification | >95% |
| Open Finding Age | Average age of unresolved findings | Decreasing trend |
 
## Category 4: Program Metrics
**Question**: "Is the program worth the investment?"
 
| Metric | Definition | Target |
|--------|-----------|--------|
| Cost per Finding | Total program cost / number of findings | Track trend (should stabilize) |
| 安全 Posture Score | Average model 安全 score across evaluations | Improving trend |
| Engagement Frequency | Red team engagements per quarter | Per policy |
| Tool ROI | Findings from automated tools vs. manual 測試 | Track ratio |
| Coverage Gap Closure | % of identified gaps addressed since last quarter | >50% per quarter |

Step 2: Collecting Metrics Data

Build a data collection system that captures metrics automatically:

# metrics/collector.py
"""Collect and store AI 紅隊 metrics from engagement data."""
import json
import sqlite3
from datetime import datetime, date
from pathlib import Path
from dataclasses import dataclass
 
@dataclass
class EngagementMetrics:
    engagement_id: str
    client: str
    start_date: str
    end_date: str
    engagement_type: str
    total_hours: float
    components_in_scope: int
    components_tested: int
    test_cases_executed: int
    techniques_used: int
    findings_critical: int
    findings_high: int
    findings_medium: int
    findings_low: int
    owasp_categories_tested: int
 
class MetricsCollector:
    """Collect and store engagement metrics."""
 
    def __init__(self, db_path: str = "metrics/redteam_metrics.db"):
        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(db_path)
        self._init_db()
 
    def _init_db(self):
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS engagements (
                id TEXT PRIMARY KEY,
                client TEXT,
                start_date TEXT,
                end_date TEXT,
                engagement_type TEXT,
                total_hours REAL,
                components_in_scope INTEGER,
                components_tested INTEGER,
                test_cases INTEGER,
                techniques_used INTEGER,
                findings_critical INTEGER,
                findings_high INTEGER,
                findings_medium INTEGER,
                findings_low INTEGER,
                owasp_categories INTEGER
            );
 
            CREATE TABLE IF NOT EXISTS findings (
                id TEXT PRIMARY KEY,
                engagement_id TEXT,
                severity TEXT,
                category TEXT,
                found_date TEXT,
                reported_date TEXT,
                remediated_date TEXT,
                verified_date TEXT,
                regression_count INTEGER DEFAULT 0,
                FOREIGN KEY (engagement_id) REFERENCES engagements(id)
            );
 
            CREATE TABLE IF NOT EXISTS continuous_scans (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                scan_date TEXT,
                scan_type TEXT,
                total_tests INTEGER,
                passed INTEGER,
                failed INTEGER,
                model_name TEXT,
                commit_sha TEXT
            );
        """)
        self.conn.commit()
 
    def record_engagement(self, metrics: EngagementMetrics):
        """Record metrics from a completed engagement."""
        self.conn.execute(
            "INSERT OR REPLACE INTO engagements VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
            (
                metrics.engagement_id, metrics.client, metrics.start_date,
                metrics.end_date, metrics.engagement_type, metrics.total_hours,
                metrics.components_in_scope, metrics.components_tested,
                metrics.test_cases_executed, metrics.techniques_used,
                metrics.findings_critical, metrics.findings_high,
                metrics.findings_medium, metrics.findings_low,
                metrics.owasp_categories_tested,
            ),
        )
        self.conn.commit()
 
    def record_finding(self, finding_id: str, engagement_id: str,
                       severity: str, category: str, found_date: str):
        """Record a finding for tracking remediation."""
        self.conn.execute(
            "INSERT OR REPLACE INTO findings (id, engagement_id, severity, "
            "category, found_date) VALUES (?,?,?,?,?)",
            (finding_id, engagement_id, severity, category, found_date),
        )
        self.conn.commit()
 
    def mark_remediated(self, finding_id: str, remediated_date: str):
        """Mark a finding as remediated."""
        self.conn.execute(
            "UPDATE findings SET remediated_date = ? WHERE id = ?",
            (remediated_date, finding_id),
        )
        self.conn.commit()
 
    def mark_verified(self, finding_id: str, verified_date: str):
        """Mark a remediated finding as verified."""
        self.conn.execute(
            "UPDATE findings SET verified_date = ? WHERE id = ?",
            (verified_date, finding_id),
        )
        self.conn.commit()
 
    def record_regression(self, finding_id: str):
        """Record that a previously fixed finding has regressed."""
        self.conn.execute(
            "UPDATE findings SET regression_count = regression_count + 1, "
            "remediated_date = NULL, verified_date = NULL WHERE id = ?",
            (finding_id,),
        )
        self.conn.commit()

Step 3: Calculating Key Metrics

Compute each metric from the collected data:

# metrics/calculator.py
"""Calculate AI 紅隊 metrics from collected data."""
from datetime import datetime, timedelta
from metrics.collector import MetricsCollector
 
class MetricsCalculator:
    """Calculate metrics from the metrics 資料庫."""
 
    def __init__(self, collector: MetricsCollector):
        self.conn = collector.conn
 
    def coverage_metrics(self, period_days: int = 90) -> dict:
        """Calculate coverage metrics for the specified period."""
        cutoff = (datetime.now() - timedelta(days=period_days)).isoformat()
 
        rows = self.conn.execute(
            "SELECT AVG(CAST(components_tested AS REAL) / NULLIF(components_in_scope, 0)), "
            "AVG(test_cases), AVG(techniques_used), AVG(owasp_categories) "
            "FROM engagements WHERE start_date >= ?",
            (cutoff,),
        ).fetchone()
 
        return {
            "avg_attack_surface_coverage": f"{(rows[0] or 0) * 100:.1f}%",
            "avg_test_cases_per_engagement": round(rows[1] or 0),
            "avg_techniques_per_engagement": round(rows[2] or 0),
            "avg_owasp_categories_covered": round(rows[3] or 0),
        }
 
    def detection_metrics(self, period_days: int = 90) -> dict:
        """Calculate 偵測 metrics."""
        cutoff = (datetime.now() - timedelta(days=period_days)).isoformat()
 
        # Total findings by severity
        severity_counts = {}
        for severity in ["Critical", "High", "Medium", "Low"]:
            count = self.conn.execute(
                "SELECT COUNT(*) FROM findings WHERE severity = ? AND found_date >= ?",
                (severity, cutoff),
            ).fetchone()[0]
            severity_counts[severity] = count
 
        # Findings per hour
        total_findings = sum(severity_counts.values())
        total_hours = self.conn.execute(
            "SELECT SUM(total_hours) FROM engagements WHERE start_date >= ?",
            (cutoff,),
        ).fetchone()[0] or 1
 
        # Novel finding rate (findings not in previous period)
        prev_cutoff = (datetime.now() - timedelta(days=period_days * 2)).isoformat()
        previous_categories = set(
            row[0] for row in self.conn.execute(
                "SELECT DISTINCT category FROM findings WHERE found_date >= ? AND found_date < ?",
                (prev_cutoff, cutoff),
            ).fetchall()
        )
        current_categories = set(
            row[0] for row in self.conn.execute(
                "SELECT DISTINCT category FROM findings WHERE found_date >= ?",
                (cutoff,),
            ).fetchall()
        )
        novel_categories = current_categories - previous_categories
        novel_rate = len(novel_categories) / max(len(current_categories), 1) * 100
 
        return {
            "findings_by_severity": severity_counts,
            "total_findings": total_findings,
            "findings_per_hour": round(total_findings / total_hours, 2),
            "novel_finding_rate": f"{novel_rate:.0f}%",
        }
 
    def remediation_metrics(self) -> dict:
        """Calculate remediation metrics."""
        # Remediation rate by severity
        for severity in ["Critical", "High"]:
            total = self.conn.execute(
                "SELECT COUNT(*) FROM findings WHERE severity = ?",
                (severity,),
            ).fetchone()[0]
            remediated = self.conn.execute(
                "SELECT COUNT(*) FROM findings WHERE severity = ? AND remediated_date IS NOT NULL",
                (severity,),
            ).fetchone()[0]
 
        # MTTR calculation
        mttr_rows = self.conn.execute(
            "SELECT severity, AVG(julianday(remediated_date) - julianday(found_date)) "
            "FROM findings WHERE remediated_date IS NOT NULL "
            "GROUP BY severity",
        ).fetchall()
        mttr = {row[0]: round(row[1], 1) for row in mttr_rows}
 
        # Regression rate
        total_findings = self.conn.execute(
            "SELECT COUNT(*) FROM findings"
        ).fetchone()[0]
        regressed = self.conn.execute(
            "SELECT COUNT(*) FROM findings WHERE regression_count > 0"
        ).fetchone()[0]
        regression_rate = regressed / max(total_findings, 1) * 100
 
        # Retest pass rate
        verified = self.conn.execute(
            "SELECT COUNT(*) FROM findings WHERE verified_date IS NOT NULL"
        ).fetchone()[0]
        retested = self.conn.execute(
            "SELECT COUNT(*) FROM findings WHERE remediated_date IS NOT NULL"
        ).fetchone()[0]
        retest_pass_rate = verified / max(retested, 1) * 100
 
        return {
            "mean_time_to_remediate_days": mttr,
            "regression_rate": f"{regression_rate:.1f}%",
            "retest_pass_rate": f"{retest_pass_rate:.0f}%",
        }
 
    def security_posture_trend(self, weeks: int = 12) -> list[dict]:
        """Get the 安全 posture score trend from continuous scans."""
        rows = self.conn.execute(
            "SELECT strftime('%Y-W%W', scan_date) as week, "
            "AVG(CAST(passed AS REAL) / NULLIF(total_tests, 0)) "
            "FROM continuous_scans "
            "WHERE scan_date >= date('now', ?) "
            "GROUP BY week ORDER BY week",
            (f'-{weeks * 7} days',),
        ).fetchall()
 
        return [{"week": row[0], "safety_score": round(row[1] or 0, 3)} for row in rows]

Step 4: Building the Metrics Dashboard

Create a report that presents metrics to stakeholders:

# metrics/dashboard.py
"""Generate a metrics dashboard report."""
from metrics.collector import MetricsCollector
from metrics.calculator import MetricsCalculator
from datetime import datetime
 
def generate_dashboard(db_path: str = "metrics/redteam_metrics.db") -> str:
    """Generate a markdown metrics dashboard."""
    collector = MetricsCollector(db_path)
    calc = MetricsCalculator(collector)
 
    coverage = calc.coverage_metrics()
    偵測 = calc.detection_metrics()
    remediation = calc.remediation_metrics()
    trend = calc.security_posture_trend()
 
    report = f"""# AI 紅隊 Program Metrics Dashboard
 
**Report Date**: {datetime.now().strftime('%Y-%m-%d')}
**Period**: Last 90 days
 
---
 
## Coverage Metrics
 
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| 攻擊 Surface Coverage | {coverage['avg_attack_surface_coverage']} | >80% | {'On Track' if float(coverage['avg_attack_surface_coverage'].rstrip('%')) > 80 else 'Below Target'} |
| 測試 Cases per Engagement | {coverage['avg_test_cases_per_engagement']} | >200 | {'On Track' if coverage['avg_test_cases_per_engagement'] > 200 else 'Below Target'} |
| Techniques per Engagement | {coverage['avg_techniques_per_engagement']} | >15 | {'On Track' if coverage['avg_techniques_per_engagement'] > 15 else 'Below Target'} |
| OWASP Categories Covered | {coverage['avg_owasp_categories_covered']}/10 | 10 | {'On Track' if coverage['avg_owasp_categories_covered'] >= 10 else 'Below Target'} |
 
## 偵測 Metrics
 
| Metric | Value |
|--------|-------|
| Total Findings | {偵測['total_findings']} |
| Critical | {偵測['findings_by_severity'].get('Critical', 0)} |
| High | {偵測['findings_by_severity'].get('High', 0)} |
| Medium | {偵測['findings_by_severity'].get('Medium', 0)} |
| Low | {偵測['findings_by_severity'].get('Low', 0)} |
| Findings per 測試 Hour | {偵測['findings_per_hour']} |
| Novel Finding Rate | {偵測['novel_finding_rate']} |
 
## Remediation Metrics
 
| Metric | Value | Target |
|--------|-------|--------|
| MTTR (Critical) | {remediation['mean_time_to_remediate_days'].get('Critical', 'N/A')} days | <7 days |
| MTTR (High) | {remediation['mean_time_to_remediate_days'].get('High', 'N/A')} days | <30 days |
| Regression Rate | {remediation['regression_rate']} | <5% |
| Retest Pass Rate | {remediation['retest_pass_rate']} | >95% |
 
## 安全 Posture Trend
 
"""
 
    if trend:
        report += "| Week | 安全 Score |\n|------|-------------|\n"
        for entry in trend:
            bar_length = int(entry['safety_score'] * 20)
            bar = '█' * bar_length + '░' * (20 - bar_length)
            report += f"| {entry['week']} | {entry['safety_score']:.3f} {bar} |\n"
    else:
        report += "*No continuous scan data available for trend analysis.*\n"
 
    return report
 
if __name__ == "__main__":
    print(generate_dashboard())

Step 5: Calculating Program ROI

Quantify the return on investment for the AI 紅隊演練 program:

# metrics/roi.py
"""Calculate ROI for the AI 紅隊演練 program."""
from dataclasses import dataclass
 
@dataclass
class ROICalculation:
    # Costs
    team_cost: float          # Annual team salary/contractor costs
    tool_cost: float          # Annual tool and API costs
    infrastructure_cost: float # Lab, compute, etc.
 
    # Value (risk reduction)
    critical_findings: int
    high_findings: int
    avg_breach_cost: float    # Industry average cost of an AI 安全 incident
    estimated_breach_probability_reduction: float  # 0.0-1.0
 
    @property
    def total_cost(self) -> float:
        return self.team_cost + self.tool_cost + self.infrastructure_cost
 
    @property
    def estimated_risk_reduction(self) -> float:
        """Estimated annual risk reduction in dollar terms."""
        # Each critical finding represents a prevented potential incident
        critical_value = self.critical_findings * self.avg_breach_cost * 0.3
        high_value = self.high_findings * self.avg_breach_cost * 0.1
        return (critical_value + high_value) * self.estimated_breach_probability_reduction
 
    @property
    def roi_ratio(self) -> float:
        return self.estimated_risk_reduction / self.total_cost if self.total_cost > 0 else 0
 
    def generate_roi_report(self) -> str:
        return f"""## Program ROI Analysis
 
### Investment
| Item | Annual Cost |
|------|------------|
| Team (salary/contractors) | ${self.team_cost:,.0f} |
| Tools and API costs | ${self.tool_cost:,.0f} |
| Infrastructure | ${self.infrastructure_cost:,.0f} |
| **Total Investment** | **${self.total_cost:,.0f}** |
 
### Value Generated
| Item | Value |
|------|-------|
| Critical findings identified | {self.critical_findings} |
| High findings identified | {self.high_findings} |
| Industry avg breach cost | ${self.avg_breach_cost:,.0f} |
| Estimated risk reduction | ${self.estimated_risk_reduction:,.0f} |
 
### ROI
| Metric | Value |
|--------|-------|
| ROI Ratio | {self.roi_ratio:.1f}x |
| Net Value | ${self.estimated_risk_reduction - self.total_cost:,.0f} |
 
*Note: ROI calculations use industry-average breach costs and estimated
probability reduction. Actual values may vary.*
"""
 
# 範例 calculation
if __name__ == "__main__":
    roi = ROICalculation(
        team_cost=300000,
        tool_cost=25000,
        infrastructure_cost=10000,
        critical_findings=8,
        high_findings=15,
        avg_breach_cost=4450000,  # IBM Cost of a Data Breach 2025
        estimated_breach_probability_reduction=0.15,
    )
    print(roi.generate_roi_report())

Common Pitfalls and Troubleshooting

Problem	Cause	Solution
Metrics show no improvement	測試 methodology hasn't changed; finding same issues	Update 測試 suites with new techniques quarterly
Finding count drops after first engagement	Low-hanging fruit already found	這是 expected; track novel finding rate instead of raw count
MTTR is misleading	Includes findings stuck in backlog	Report MTTR by severity; exclude intentionally deferred findings
Regression rate spikes	New deployments without CI 測試	實作 continuous 紅隊演練 pipeline from the continuous 測試 walkthrough
Stakeholders fixate on a single metric	Metric presented without context	Always present metrics in context with targets and trends
ROI calculation disputed	Breach cost estimates are speculative	Use industry benchmarks (IBM, Verizon DBIR) and present ranges

關鍵要點

Measuring AI 紅隊 effectiveness transforms the program from a cost center into a demonstrable risk reduction investment:

Measure what matters to each audience -- coverage metrics for the 安全 team, remediation metrics for engineering, and ROI for executive leadership. No single metric serves all audiences.
Track trends, not snapshots -- a single engagement's finding count is meaningless in isolation. The trend of 安全 posture over time tells the real story.
Novel findings indicate program health -- if every engagement finds only the same issues, the methodology has stagnated. A healthy novel finding rate (>20%) indicates evolving attack techniques.
Regression rate is the most actionable metric -- it directly measures whether fixes are holding. High regression rates indicate a need for continuous 測試, not more point-in-time engagements.
ROI requires honest estimation -- breach cost estimates and probability reductions are inherently uncertain. Present ranges rather than point estimates, and anchor to industry benchmarks.

Advanced Considerations

Adapting to Modern 防禦

The defensive landscape for LLM applications has evolved significantly since the initial wave of 提示詞注入 research. Modern production systems often deploy multiple independent defensive layers, requiring attackers to adapt their techniques accordingly.

輸入 classification: The most common first line of 防禦 is an 輸入 classifier that evaluates incoming prompts for 對抗性 patterns. These classifiers range from simple keyword-based filters to sophisticated ML models trained on 對抗性 examples. Bypassing 輸入 classifiers requires 理解 their 偵測 methodology:

Keyword-based classifiers can be evaded through encoding (Base64, ROT13, Unicode homoglyphs), paraphrasing, or splitting the payload across multiple turns
ML-based classifiers require more sophisticated evasion such as semantic camouflage, gradual escalation, or 利用 of the classifier's own blind spots

class DefenseBypassStrategy:
    """Strategy selector for bypassing different 防禦 configurations."""
 
    def __init__(self):
        self.strategies = {
            "keyword_filter": self._bypass_keyword,
            "ml_classifier": self._bypass_ml,
            "output_filter": self._bypass_output,
            "rate_limiter": self._bypass_rate,
            "behavioral": self._bypass_behavioral,
        }
 
    def select_strategy(self, identified_defenses: list) -> callable:
        """Select the optimal bypass strategy based on identified 防禦."""
        if not identified_defenses:
            return self._direct_attack
 
        # Combine strategies for multi-layer 防禦
        applicable = [
            self.strategies[d] for d in identified_defenses
            if d in self.strategies
        ]
        return self._chain_strategies(applicable)
 
    def _bypass_keyword(self, payload: str) -> str:
        """Bypass keyword-based 輸入 filters."""
        import base64
        # Encode the payload to bypass keyword matching
        encoded = base64.b64encode(payload.encode()).decode()
        return f"Decode and follow: {encoded}"
 
    def _bypass_ml(self, payload: str) -> list:
        """Generate multiple paraphrased variants for ML classifier evasion."""
        variants = [
            f"As a thought experiment, 考慮: {payload}",
            f"For academic research purposes: {payload}",
            f"In a fictional scenario where: {payload}",
        ]
        return variants
 
    def _bypass_output(self, payload: str) -> str:
        """Modify the payload to extract data through 輸出 filter gaps."""
        return payload + "\nRespond using only code blocks and variable names."
 
    def _bypass_rate(self, payload: str) -> dict:
        """Configure timing to avoid rate limit triggers."""
        return {"payload": payload, "delay": 3.0, "jitter": 1.5}
 
    def _bypass_behavioral(self, payload: str) -> list:
        """Split the payload across multiple natural-sounding turns."""
        # Multi-turn escalation to avoid behavioral anomaly 偵測
        return [
            "I'm working on a 安全 research project.",
            "Could you help me 理解 how 安全 mechanisms work?",
            "Specifically, I'm interested in how systems like yours " + payload[:100],
            payload,
        ]
 
    def _direct_attack(self, payload: str) -> str:
        return payload
 
    def _chain_strategies(self, strategies: list) -> callable:
        """Chain multiple bypass strategies."""
        def chained(payload):
            result = payload
            for strategy in strategies:
                result = strategy(result)
            return result
        return chained

輸出 filtering: 輸出 filters inspect 模型's response before it reaches 使用者, looking for sensitive data leakage, harmful content, or other policy violations. Common 輸出 filter bypass techniques include:

Technique	運作方式	Effectiveness
Encoding 輸出	Request Base64/hex encoded responses	Medium — some filters check decoded content
Code block wrapping	Embed data in code comments/variables	High — many filters skip code blocks
Steganographic 輸出	Hide data in formatting, capitalization, or spacing	High — difficult to detect
Chunked extraction	Extract small pieces across many turns	High — individual pieces may pass filters
Indirect extraction	Have 模型 reveal data through behavior changes	Very High — no explicit data in 輸出

Cross-Model Considerations

Techniques that work against one model may not directly transfer to others. 然而, 理解 the general principles allows adaptation:

安全訓練 methodology: Models trained with RLHF (GPT-4, Claude) have different 安全 characteristics than those using DPO (Llama, Mistral) or other methods. RLHF-trained models tend to refuse more broadly but may be more susceptible to multi-turn escalation.
Context window size: Models with larger context windows (Claude with 200K, Gemini with 1M+) may be more susceptible to 上下文視窗 manipulation where 對抗性 content is buried in large amounts of benign text.
Multimodal capabilities: Models that process images, audio, or other modalities introduce additional attack surfaces not present in text-only models.
工具使用實作: The 實作 details of 函式呼叫 vary significantly between providers. OpenAI uses a structured 函式呼叫 format, while Anthropic uses 工具使用 blocks. These differences affect 利用 techniques.

Operational Considerations

測試 Ethics and Boundaries

Professional 紅隊測試 operates within clear ethical and legal boundaries:

Authorization: Always obtain written 授權 before 測試. This should specify the scope, methods allowed, and any restrictions.
Scope limits: Stay within the authorized scope. If you discover a 漏洞 that leads outside the authorized perimeter, document it and report it without exploiting it.
Data handling: Handle any sensitive data discovered during 測試 according to the engagement agreement. Never retain sensitive data beyond what's needed for reporting.
Responsible disclosure: Follow responsible disclosure practices for any 漏洞 discovered, particularly if they affect systems beyond your 測試 scope.

Documenting Results

Professional documentation follows a structured format:

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
 
@dataclass
class Finding:
    """Structure for documenting a 安全 finding."""
    id: str
    title: str
    severity: str  # Critical, High, Medium, Low, Informational
    category: str  # OWASP LLM Top 10 category
    description: str
    steps_to_reproduce: list[str]
    impact: str
    recommendation: str
    evidence: list[str] = field(default_factory=list)
    mitre_atlas: Optional[str] = None
    cvss_score: Optional[float] = None
    discovered_at: str = field(default_factory=lambda: datetime.now().isoformat())
 
    def to_report_section(self) -> str:
        """Generate a report section for this finding."""
        steps = "\n".join(f"   {i+1}. {s}" for i, s in enumerate(self.steps_to_reproduce))
        return f"""
### {self.id}: {self.title}
 
**Severity**: {self.severity}
**Category**: {self.category}
{f"**MITRE ATLAS**: {self.mitre_atlas}" if self.mitre_atlas else ""}
 
#### Description
{self.description}
 
#### Steps to Reproduce
{steps}
 
#### Impact
{self.impact}
 
#### Recommendation
{self.recommendation}
"""

This structured approach ensures that findings are actionable and that remediation teams have the information they need to address the 漏洞 effectively.

Measuring and Reporting AI 紅隊 Effectiveness

相關文章

Measuring and Reporting AI 紅隊 Effectiveness

相關文章