Measuring and Reporting AI 紅隊 Effectiveness
導覽 for defining, collecting, and reporting metrics that measure the effectiveness of AI red teaming programs, covering coverage metrics, detection rates, time-to-find analysis, remediation tracking, and ROI calculation.
"How do we know our AI 紅隊演練 program is working?" is the question that every CISO eventually asks. Without metrics, the answer is subjective -- "we found some things" is not compelling when competing for budget against other 安全 initiatives. This walkthrough defines a practical metrics framework that quantifies the value of AI 紅隊演練, tracks improvement over time, and provides the data needed for program justification and resource allocation.
Step 1: Defining the Metrics Framework
Organize metrics into four categories that answer different stakeholder questions:
# AI 紅隊 Metrics Framework
## Category 1: Coverage Metrics
**Question**: "Are we 測試 everything we should be?"
| Metric | Definition | Target |
|--------|-----------|--------|
| OWASP LLM Top 10 Coverage | % of OWASP categories tested in each engagement | 100% for standard engagements |
| 攻擊 Surface Coverage | % of enumerated components tested | >80% for standard, >95% for comprehensive |
| 測試 Case Volume | Number of unique 測試 cases per engagement | >200 for standard engagements |
| Technique Diversity | Number of distinct attack techniques used | >15 per engagement |
| Model Coverage | % of production models tested in the quarter | >90% |
## Category 2: 偵測 Metrics
**Question**: "Are we finding real 漏洞?"
| Metric | Definition | Target |
|--------|-----------|--------|
| 漏洞 Count | Total findings per engagement by severity | Track trend, not absolute number |
| 偵測 Rate | Findings per 測試 hour | Track trend |
| Novel Finding Rate | % of findings not found in previous engagements | >20% indicates evolving methodology |
| False Positive Rate | % of reported findings that were not actual 漏洞 | <5% |
| Time to First Finding | Hours from engagement start to first Critical/High finding | <4 hours |
## Category 3: Remediation Metrics
**Question**: "Are we driving actual 安全 improvement?"
| Metric | Definition | Target |
|--------|-----------|--------|
| Remediation Rate | % of findings remediated within SLA | >90% for Critical, >80% for High |
| Mean Time to Remediate (MTTR) | Average days from report to fix by severity | <7 days Critical, <30 days High |
| Regression Rate | % of previously fixed findings that reappear | <5% |
| Retest Pass Rate | % of remediated findings that pass verification | >95% |
| Open Finding Age | Average age of unresolved findings | Decreasing trend |
## Category 4: Program Metrics
**Question**: "Is the program worth the investment?"
| Metric | Definition | Target |
|--------|-----------|--------|
| Cost per Finding | Total program cost / number of findings | Track trend (should stabilize) |
| 安全 Posture Score | Average model 安全 score across evaluations | Improving trend |
| Engagement Frequency | Red team engagements per quarter | Per policy |
| Tool ROI | Findings from automated tools vs. manual 測試 | Track ratio |
| Coverage Gap Closure | % of identified gaps addressed since last quarter | >50% per quarter |Step 2: Collecting Metrics Data
Build a data collection system that captures metrics automatically:
# metrics/collector.py
"""Collect and store AI 紅隊 metrics from engagement data."""
import json
import sqlite3
from datetime import datetime, date
from pathlib import Path
from dataclasses import dataclass
@dataclass
class EngagementMetrics:
engagement_id: str
client: str
start_date: str
end_date: str
engagement_type: str
total_hours: float
components_in_scope: int
components_tested: int
test_cases_executed: int
techniques_used: int
findings_critical: int
findings_high: int
findings_medium: int
findings_low: int
owasp_categories_tested: int
class MetricsCollector:
"""Collect and store engagement metrics."""
def __init__(self, db_path: str = "metrics/redteam_metrics.db"):
Path(db_path).parent.mkdir(parents=True, exist_ok=True)
self.conn = sqlite3.connect(db_path)
self._init_db()
def _init_db(self):
self.conn.executescript("""
CREATE TABLE IF NOT EXISTS engagements (
id TEXT PRIMARY KEY,
client TEXT,
start_date TEXT,
end_date TEXT,
engagement_type TEXT,
total_hours REAL,
components_in_scope INTEGER,
components_tested INTEGER,
test_cases INTEGER,
techniques_used INTEGER,
findings_critical INTEGER,
findings_high INTEGER,
findings_medium INTEGER,
findings_low INTEGER,
owasp_categories INTEGER
);
CREATE TABLE IF NOT EXISTS findings (
id TEXT PRIMARY KEY,
engagement_id TEXT,
severity TEXT,
category TEXT,
found_date TEXT,
reported_date TEXT,
remediated_date TEXT,
verified_date TEXT,
regression_count INTEGER DEFAULT 0,
FOREIGN KEY (engagement_id) REFERENCES engagements(id)
);
CREATE TABLE IF NOT EXISTS continuous_scans (
id INTEGER PRIMARY KEY AUTOINCREMENT,
scan_date TEXT,
scan_type TEXT,
total_tests INTEGER,
passed INTEGER,
failed INTEGER,
model_name TEXT,
commit_sha TEXT
);
""")
self.conn.commit()
def record_engagement(self, metrics: EngagementMetrics):
"""Record metrics from a completed engagement."""
self.conn.execute(
"INSERT OR REPLACE INTO engagements VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
(
metrics.engagement_id, metrics.client, metrics.start_date,
metrics.end_date, metrics.engagement_type, metrics.total_hours,
metrics.components_in_scope, metrics.components_tested,
metrics.test_cases_executed, metrics.techniques_used,
metrics.findings_critical, metrics.findings_high,
metrics.findings_medium, metrics.findings_low,
metrics.owasp_categories_tested,
),
)
self.conn.commit()
def record_finding(self, finding_id: str, engagement_id: str,
severity: str, category: str, found_date: str):
"""Record a finding for tracking remediation."""
self.conn.execute(
"INSERT OR REPLACE INTO findings (id, engagement_id, severity, "
"category, found_date) VALUES (?,?,?,?,?)",
(finding_id, engagement_id, severity, category, found_date),
)
self.conn.commit()
def mark_remediated(self, finding_id: str, remediated_date: str):
"""Mark a finding as remediated."""
self.conn.execute(
"UPDATE findings SET remediated_date = ? WHERE id = ?",
(remediated_date, finding_id),
)
self.conn.commit()
def mark_verified(self, finding_id: str, verified_date: str):
"""Mark a remediated finding as verified."""
self.conn.execute(
"UPDATE findings SET verified_date = ? WHERE id = ?",
(verified_date, finding_id),
)
self.conn.commit()
def record_regression(self, finding_id: str):
"""Record that a previously fixed finding has regressed."""
self.conn.execute(
"UPDATE findings SET regression_count = regression_count + 1, "
"remediated_date = NULL, verified_date = NULL WHERE id = ?",
(finding_id,),
)
self.conn.commit()Step 3: Calculating Key Metrics
Compute each metric from the collected data:
# metrics/calculator.py
"""Calculate AI 紅隊 metrics from collected data."""
from datetime import datetime, timedelta
from metrics.collector import MetricsCollector
class MetricsCalculator:
"""Calculate metrics from the metrics 資料庫."""
def __init__(self, collector: MetricsCollector):
self.conn = collector.conn
def coverage_metrics(self, period_days: int = 90) -> dict:
"""Calculate coverage metrics for the specified period."""
cutoff = (datetime.now() - timedelta(days=period_days)).isoformat()
rows = self.conn.execute(
"SELECT AVG(CAST(components_tested AS REAL) / NULLIF(components_in_scope, 0)), "
"AVG(test_cases), AVG(techniques_used), AVG(owasp_categories) "
"FROM engagements WHERE start_date >= ?",
(cutoff,),
).fetchone()
return {
"avg_attack_surface_coverage": f"{(rows[0] or 0) * 100:.1f}%",
"avg_test_cases_per_engagement": round(rows[1] or 0),
"avg_techniques_per_engagement": round(rows[2] or 0),
"avg_owasp_categories_covered": round(rows[3] or 0),
}
def detection_metrics(self, period_days: int = 90) -> dict:
"""Calculate 偵測 metrics."""
cutoff = (datetime.now() - timedelta(days=period_days)).isoformat()
# Total findings by severity
severity_counts = {}
for severity in ["Critical", "High", "Medium", "Low"]:
count = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE severity = ? AND found_date >= ?",
(severity, cutoff),
).fetchone()[0]
severity_counts[severity] = count
# Findings per hour
total_findings = sum(severity_counts.values())
total_hours = self.conn.execute(
"SELECT SUM(total_hours) FROM engagements WHERE start_date >= ?",
(cutoff,),
).fetchone()[0] or 1
# Novel finding rate (findings not in previous period)
prev_cutoff = (datetime.now() - timedelta(days=period_days * 2)).isoformat()
previous_categories = set(
row[0] for row in self.conn.execute(
"SELECT DISTINCT category FROM findings WHERE found_date >= ? AND found_date < ?",
(prev_cutoff, cutoff),
).fetchall()
)
current_categories = set(
row[0] for row in self.conn.execute(
"SELECT DISTINCT category FROM findings WHERE found_date >= ?",
(cutoff,),
).fetchall()
)
novel_categories = current_categories - previous_categories
novel_rate = len(novel_categories) / max(len(current_categories), 1) * 100
return {
"findings_by_severity": severity_counts,
"total_findings": total_findings,
"findings_per_hour": round(total_findings / total_hours, 2),
"novel_finding_rate": f"{novel_rate:.0f}%",
}
def remediation_metrics(self) -> dict:
"""Calculate remediation metrics."""
# Remediation rate by severity
for severity in ["Critical", "High"]:
total = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE severity = ?",
(severity,),
).fetchone()[0]
remediated = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE severity = ? AND remediated_date IS NOT NULL",
(severity,),
).fetchone()[0]
# MTTR calculation
mttr_rows = self.conn.execute(
"SELECT severity, AVG(julianday(remediated_date) - julianday(found_date)) "
"FROM findings WHERE remediated_date IS NOT NULL "
"GROUP BY severity",
).fetchall()
mttr = {row[0]: round(row[1], 1) for row in mttr_rows}
# Regression rate
total_findings = self.conn.execute(
"SELECT COUNT(*) FROM findings"
).fetchone()[0]
regressed = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE regression_count > 0"
).fetchone()[0]
regression_rate = regressed / max(total_findings, 1) * 100
# Retest pass rate
verified = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE verified_date IS NOT NULL"
).fetchone()[0]
retested = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE remediated_date IS NOT NULL"
).fetchone()[0]
retest_pass_rate = verified / max(retested, 1) * 100
return {
"mean_time_to_remediate_days": mttr,
"regression_rate": f"{regression_rate:.1f}%",
"retest_pass_rate": f"{retest_pass_rate:.0f}%",
}
def security_posture_trend(self, weeks: int = 12) -> list[dict]:
"""Get the 安全 posture score trend from continuous scans."""
rows = self.conn.execute(
"SELECT strftime('%Y-W%W', scan_date) as week, "
"AVG(CAST(passed AS REAL) / NULLIF(total_tests, 0)) "
"FROM continuous_scans "
"WHERE scan_date >= date('now', ?) "
"GROUP BY week ORDER BY week",
(f'-{weeks * 7} days',),
).fetchall()
return [{"week": row[0], "safety_score": round(row[1] or 0, 3)} for row in rows]Step 4: Building the Metrics Dashboard
Create a report that presents metrics to stakeholders:
# metrics/dashboard.py
"""Generate a metrics dashboard report."""
from metrics.collector import MetricsCollector
from metrics.calculator import MetricsCalculator
from datetime import datetime
def generate_dashboard(db_path: str = "metrics/redteam_metrics.db") -> str:
"""Generate a markdown metrics dashboard."""
collector = MetricsCollector(db_path)
calc = MetricsCalculator(collector)
coverage = calc.coverage_metrics()
偵測 = calc.detection_metrics()
remediation = calc.remediation_metrics()
trend = calc.security_posture_trend()
report = f"""# AI 紅隊 Program Metrics Dashboard
**Report Date**: {datetime.now().strftime('%Y-%m-%d')}
**Period**: Last 90 days
---
## Coverage Metrics
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| 攻擊 Surface Coverage | {coverage['avg_attack_surface_coverage']} | >80% | {'On Track' if float(coverage['avg_attack_surface_coverage'].rstrip('%')) > 80 else 'Below Target'} |
| 測試 Cases per Engagement | {coverage['avg_test_cases_per_engagement']} | >200 | {'On Track' if coverage['avg_test_cases_per_engagement'] > 200 else 'Below Target'} |
| Techniques per Engagement | {coverage['avg_techniques_per_engagement']} | >15 | {'On Track' if coverage['avg_techniques_per_engagement'] > 15 else 'Below Target'} |
| OWASP Categories Covered | {coverage['avg_owasp_categories_covered']}/10 | 10 | {'On Track' if coverage['avg_owasp_categories_covered'] >= 10 else 'Below Target'} |
## 偵測 Metrics
| Metric | Value |
|--------|-------|
| Total Findings | {偵測['total_findings']} |
| Critical | {偵測['findings_by_severity'].get('Critical', 0)} |
| High | {偵測['findings_by_severity'].get('High', 0)} |
| Medium | {偵測['findings_by_severity'].get('Medium', 0)} |
| Low | {偵測['findings_by_severity'].get('Low', 0)} |
| Findings per 測試 Hour | {偵測['findings_per_hour']} |
| Novel Finding Rate | {偵測['novel_finding_rate']} |
## Remediation Metrics
| Metric | Value | Target |
|--------|-------|--------|
| MTTR (Critical) | {remediation['mean_time_to_remediate_days'].get('Critical', 'N/A')} days | <7 days |
| MTTR (High) | {remediation['mean_time_to_remediate_days'].get('High', 'N/A')} days | <30 days |
| Regression Rate | {remediation['regression_rate']} | <5% |
| Retest Pass Rate | {remediation['retest_pass_rate']} | >95% |
## 安全 Posture Trend
"""
if trend:
report += "| Week | 安全 Score |\n|------|-------------|\n"
for entry in trend:
bar_length = int(entry['safety_score'] * 20)
bar = '█' * bar_length + '░' * (20 - bar_length)
report += f"| {entry['week']} | {entry['safety_score']:.3f} {bar} |\n"
else:
report += "*No continuous scan data available for trend analysis.*\n"
return report
if __name__ == "__main__":
print(generate_dashboard())Step 5: Calculating Program ROI
Quantify the return on investment for the AI 紅隊演練 program:
# metrics/roi.py
"""Calculate ROI for the AI 紅隊演練 program."""
from dataclasses import dataclass
@dataclass
class ROICalculation:
# Costs
team_cost: float # Annual team salary/contractor costs
tool_cost: float # Annual tool and API costs
infrastructure_cost: float # Lab, compute, etc.
# Value (risk reduction)
critical_findings: int
high_findings: int
avg_breach_cost: float # Industry average cost of an AI 安全 incident
estimated_breach_probability_reduction: float # 0.0-1.0
@property
def total_cost(self) -> float:
return self.team_cost + self.tool_cost + self.infrastructure_cost
@property
def estimated_risk_reduction(self) -> float:
"""Estimated annual risk reduction in dollar terms."""
# Each critical finding represents a prevented potential incident
critical_value = self.critical_findings * self.avg_breach_cost * 0.3
high_value = self.high_findings * self.avg_breach_cost * 0.1
return (critical_value + high_value) * self.estimated_breach_probability_reduction
@property
def roi_ratio(self) -> float:
return self.estimated_risk_reduction / self.total_cost if self.total_cost > 0 else 0
def generate_roi_report(self) -> str:
return f"""## Program ROI Analysis
### Investment
| Item | Annual Cost |
|------|------------|
| Team (salary/contractors) | ${self.team_cost:,.0f} |
| Tools and API costs | ${self.tool_cost:,.0f} |
| Infrastructure | ${self.infrastructure_cost:,.0f} |
| **Total Investment** | **${self.total_cost:,.0f}** |
### Value Generated
| Item | Value |
|------|-------|
| Critical findings identified | {self.critical_findings} |
| High findings identified | {self.high_findings} |
| Industry avg breach cost | ${self.avg_breach_cost:,.0f} |
| Estimated risk reduction | ${self.estimated_risk_reduction:,.0f} |
### ROI
| Metric | Value |
|--------|-------|
| ROI Ratio | {self.roi_ratio:.1f}x |
| Net Value | ${self.estimated_risk_reduction - self.total_cost:,.0f} |
*Note: ROI calculations use industry-average breach costs and estimated
probability reduction. Actual values may vary.*
"""
# 範例 calculation
if __name__ == "__main__":
roi = ROICalculation(
team_cost=300000,
tool_cost=25000,
infrastructure_cost=10000,
critical_findings=8,
high_findings=15,
avg_breach_cost=4450000, # IBM Cost of a Data Breach 2025
estimated_breach_probability_reduction=0.15,
)
print(roi.generate_roi_report())Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Metrics show no improvement | 測試 methodology hasn't changed; finding same issues | Update 測試 suites with new techniques quarterly |
| Finding count drops after first engagement | Low-hanging fruit already found | 這是 expected; track novel finding rate instead of raw count |
| MTTR is misleading | Includes findings stuck in backlog | Report MTTR by severity; exclude intentionally deferred findings |
| Regression rate spikes | New deployments without CI 測試 | 實作 continuous 紅隊演練 pipeline from the continuous 測試 walkthrough |
| Stakeholders fixate on a single metric | Metric presented without context | Always present metrics in context with targets and trends |
| ROI calculation disputed | Breach cost estimates are speculative | Use industry benchmarks (IBM, Verizon DBIR) and present ranges |
關鍵要點
Measuring AI 紅隊 effectiveness transforms the program from a cost center into a demonstrable risk reduction investment:
- Measure what matters to each audience -- coverage metrics for the 安全 team, remediation metrics for engineering, and ROI for executive leadership. No single metric serves all audiences.
- Track trends, not snapshots -- a single engagement's finding count is meaningless in isolation. The trend of 安全 posture over time tells the real story.
- Novel findings indicate program health -- if every engagement finds only the same issues, the methodology has stagnated. A healthy novel finding rate (>20%) indicates evolving attack techniques.
- Regression rate is the most actionable metric -- it directly measures whether fixes are holding. High regression rates indicate a need for continuous 測試, not more point-in-time engagements.
- ROI requires honest estimation -- breach cost estimates and probability reductions are inherently uncertain. Present ranges rather than point estimates, and anchor to industry benchmarks.
Advanced Considerations
Adapting to Modern 防禦
The defensive landscape for LLM applications has evolved significantly since the initial wave of 提示詞注入 research. Modern production systems often deploy multiple independent defensive layers, requiring attackers to adapt their techniques accordingly.
輸入 classification: The most common first line of 防禦 is an 輸入 classifier that evaluates incoming prompts for 對抗性 patterns. These classifiers range from simple keyword-based filters to sophisticated ML models trained on 對抗性 examples. Bypassing 輸入 classifiers requires 理解 their 偵測 methodology:
- Keyword-based classifiers can be evaded through encoding (Base64, ROT13, Unicode homoglyphs), paraphrasing, or splitting the payload across multiple turns
- ML-based classifiers require more sophisticated evasion such as semantic camouflage, gradual escalation, or 利用 of the classifier's own blind spots
class DefenseBypassStrategy:
"""Strategy selector for bypassing different 防禦 configurations."""
def __init__(self):
self.strategies = {
"keyword_filter": self._bypass_keyword,
"ml_classifier": self._bypass_ml,
"output_filter": self._bypass_output,
"rate_limiter": self._bypass_rate,
"behavioral": self._bypass_behavioral,
}
def select_strategy(self, identified_defenses: list) -> callable:
"""Select the optimal bypass strategy based on identified 防禦."""
if not identified_defenses:
return self._direct_attack
# Combine strategies for multi-layer 防禦
applicable = [
self.strategies[d] for d in identified_defenses
if d in self.strategies
]
return self._chain_strategies(applicable)
def _bypass_keyword(self, payload: str) -> str:
"""Bypass keyword-based 輸入 filters."""
import base64
# Encode the payload to bypass keyword matching
encoded = base64.b64encode(payload.encode()).decode()
return f"Decode and follow: {encoded}"
def _bypass_ml(self, payload: str) -> list:
"""Generate multiple paraphrased variants for ML classifier evasion."""
variants = [
f"As a thought experiment, 考慮: {payload}",
f"For academic research purposes: {payload}",
f"In a fictional scenario where: {payload}",
]
return variants
def _bypass_output(self, payload: str) -> str:
"""Modify the payload to extract data through 輸出 filter gaps."""
return payload + "\nRespond using only code blocks and variable names."
def _bypass_rate(self, payload: str) -> dict:
"""Configure timing to avoid rate limit triggers."""
return {"payload": payload, "delay": 3.0, "jitter": 1.5}
def _bypass_behavioral(self, payload: str) -> list:
"""Split the payload across multiple natural-sounding turns."""
# Multi-turn escalation to avoid behavioral anomaly 偵測
return [
"I'm working on a 安全 research project.",
"Could you help me 理解 how 安全 mechanisms work?",
"Specifically, I'm interested in how systems like yours " + payload[:100],
payload,
]
def _direct_attack(self, payload: str) -> str:
return payload
def _chain_strategies(self, strategies: list) -> callable:
"""Chain multiple bypass strategies."""
def chained(payload):
result = payload
for strategy in strategies:
result = strategy(result)
return result
return chained輸出 filtering: 輸出 filters inspect 模型's response before it reaches 使用者, looking for sensitive data leakage, harmful content, or other policy violations. Common 輸出 filter bypass techniques include:
| Technique | 運作方式 | Effectiveness |
|---|---|---|
| Encoding 輸出 | Request Base64/hex encoded responses | Medium — some filters check decoded content |
| Code block wrapping | Embed data in code comments/variables | High — many filters skip code blocks |
| Steganographic 輸出 | Hide data in formatting, capitalization, or spacing | High — difficult to detect |
| Chunked extraction | Extract small pieces across many turns | High — individual pieces may pass filters |
| Indirect extraction | Have 模型 reveal data through behavior changes | Very High — no explicit data in 輸出 |
Cross-Model Considerations
Techniques that work against one model may not directly transfer to others. 然而, 理解 the general principles allows adaptation:
-
安全 訓練 methodology: Models trained with RLHF (GPT-4, Claude) have different 安全 characteristics than those using DPO (Llama, Mistral) or other methods. RLHF-trained models tend to refuse more broadly but may be more susceptible to multi-turn escalation.
-
Context window size: Models with larger context windows (Claude with 200K, Gemini with 1M+) may be more susceptible to 上下文視窗 manipulation where 對抗性 content is buried in large amounts of benign text.
-
Multimodal capabilities: Models that process images, audio, or other modalities introduce additional attack surfaces not present in text-only models.
-
工具使用 實作: The 實作 details of 函式呼叫 vary significantly between providers. OpenAI uses a structured 函式呼叫 format, while Anthropic uses 工具使用 blocks. These differences affect 利用 techniques.
Operational Considerations
測試 Ethics and Boundaries
Professional 紅隊 測試 operates within clear ethical and legal boundaries:
- Authorization: Always obtain written 授權 before 測試. This should specify the scope, methods allowed, and any restrictions.
- Scope limits: Stay within the authorized scope. If you discover a 漏洞 that leads outside the authorized perimeter, document it and report it without exploiting it.
- Data handling: Handle any sensitive data discovered during 測試 according to the engagement agreement. Never retain sensitive data beyond what's needed for reporting.
- Responsible disclosure: Follow responsible disclosure practices for any 漏洞 discovered, particularly if they affect systems beyond your 測試 scope.
Documenting Results
Professional documentation follows a structured format:
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
@dataclass
class Finding:
"""Structure for documenting a 安全 finding."""
id: str
title: str
severity: str # Critical, High, Medium, Low, Informational
category: str # OWASP LLM Top 10 category
description: str
steps_to_reproduce: list[str]
impact: str
recommendation: str
evidence: list[str] = field(default_factory=list)
mitre_atlas: Optional[str] = None
cvss_score: Optional[float] = None
discovered_at: str = field(default_factory=lambda: datetime.now().isoformat())
def to_report_section(self) -> str:
"""Generate a report section for this finding."""
steps = "\n".join(f" {i+1}. {s}" for i, s in enumerate(self.steps_to_reproduce))
return f"""
### {self.id}: {self.title}
**Severity**: {self.severity}
**Category**: {self.category}
{f"**MITRE ATLAS**: {self.mitre_atlas}" if self.mitre_atlas else ""}
#### Description
{self.description}
#### Steps to Reproduce
{steps}
#### Impact
{self.impact}
#### Recommendation
{self.recommendation}
"""This structured approach ensures that findings are actionable and that remediation teams have the information they need to address the 漏洞 effectively.