Measuring and Reporting AI Red Team Effectiveness
Walkthrough for defining, collecting, and reporting metrics that measure the effectiveness of AI red teaming programs, covering coverage metrics, detection rates, time-to-find analysis, remediation tracking, and ROI calculation.
"How do we know our AI red teaming program is working?" is the question that every CISO eventually asks. Without metrics, the answer is subjective -- "we found some things" is not compelling when competing for budget against other security initiatives. This walkthrough defines a practical metrics framework that quantifies the value of AI red teaming, tracks improvement over time, and provides the data needed for program justification and resource allocation.
Step 1: Defining the Metrics Framework
Organize metrics into four categories that answer different stakeholder questions:
# AI Red Team Metrics Framework
## Category 1: Coverage Metrics
**Question**: "Are we testing everything we should be?"
| Metric | Definition | Target |
|--------|-----------|--------|
| OWASP LLM Top 10 Coverage | % of OWASP categories tested in each engagement | 100% for standard engagements |
| Attack Surface Coverage | % of enumerated components tested | >80% for standard, >95% for comprehensive |
| Test Case Volume | Number of unique test cases per engagement | >200 for standard engagements |
| Technique Diversity | Number of distinct attack techniques used | >15 per engagement |
| Model Coverage | % of production models tested in the quarter | >90% |
## Category 2: Detection Metrics
**Question**: "Are we finding real vulnerabilities?"
| Metric | Definition | Target |
|--------|-----------|--------|
| Vulnerability Count | Total findings per engagement by severity | Track trend, not absolute number |
| Detection Rate | Findings per testing hour | Track trend |
| Novel Finding Rate | % of findings not found in previous engagements | >20% indicates evolving methodology |
| False Positive Rate | % of reported findings that were not actual vulnerabilities | <5% |
| Time to First Finding | Hours from engagement start to first Critical/High finding | <4 hours |
## Category 3: Remediation Metrics
**Question**: "Are we driving actual security improvement?"
| Metric | Definition | Target |
|--------|-----------|--------|
| Remediation Rate | % of findings remediated within SLA | >90% for Critical, >80% for High |
| Mean Time to Remediate (MTTR) | Average days from report to fix by severity | <7 days Critical, <30 days High |
| Regression Rate | % of previously fixed findings that reappear | <5% |
| Retest Pass Rate | % of remediated findings that pass verification | >95% |
| Open Finding Age | Average age of unresolved findings | Decreasing trend |
## Category 4: Program Metrics
**Question**: "Is the program worth the investment?"
| Metric | Definition | Target |
|--------|-----------|--------|
| Cost per Finding | Total program cost / number of findings | Track trend (should stabilize) |
| Security Posture Score | Average model safety score across evaluations | Improving trend |
| Engagement Frequency | Red team engagements per quarter | Per policy |
| Tool ROI | Findings from automated tools vs. manual testing | Track ratio |
| Coverage Gap Closure | % of identified gaps addressed since last quarter | >50% per quarter |

Step 2: Collecting Metrics Data
Build a data collection system that captures metrics automatically:
# metrics/collector.py
"""Collect and store AI red team metrics from engagement data."""
import sqlite3
from pathlib import Path
from dataclasses import dataclass
@dataclass
class EngagementMetrics:
engagement_id: str
client: str
start_date: str
end_date: str
engagement_type: str
total_hours: float
components_in_scope: int
components_tested: int
test_cases_executed: int
techniques_used: int
findings_critical: int
findings_high: int
findings_medium: int
findings_low: int
owasp_categories_tested: int
class MetricsCollector:
"""Collect and store engagement metrics."""
def __init__(self, db_path: str = "metrics/redteam_metrics.db"):
Path(db_path).parent.mkdir(parents=True, exist_ok=True)
self.conn = sqlite3.connect(db_path)
self._init_db()
def _init_db(self):
self.conn.executescript("""
CREATE TABLE IF NOT EXISTS engagements (
id TEXT PRIMARY KEY,
client TEXT,
start_date TEXT,
end_date TEXT,
engagement_type TEXT,
total_hours REAL,
components_in_scope INTEGER,
components_tested INTEGER,
test_cases INTEGER,
techniques_used INTEGER,
findings_critical INTEGER,
findings_high INTEGER,
findings_medium INTEGER,
findings_low INTEGER,
owasp_categories INTEGER
);
CREATE TABLE IF NOT EXISTS findings (
id TEXT PRIMARY KEY,
engagement_id TEXT,
severity TEXT,
category TEXT,
found_date TEXT,
reported_date TEXT,
remediated_date TEXT,
verified_date TEXT,
regression_count INTEGER DEFAULT 0,
FOREIGN KEY (engagement_id) REFERENCES engagements(id)
);
CREATE TABLE IF NOT EXISTS continuous_scans (
id INTEGER PRIMARY KEY AUTOINCREMENT,
scan_date TEXT,
scan_type TEXT,
total_tests INTEGER,
passed INTEGER,
failed INTEGER,
model_name TEXT,
commit_sha TEXT
);
""")
self.conn.commit()
def record_engagement(self, metrics: EngagementMetrics):
"""Record metrics from a completed engagement."""
self.conn.execute(
"INSERT OR REPLACE INTO engagements VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
(
metrics.engagement_id, metrics.client, metrics.start_date,
metrics.end_date, metrics.engagement_type, metrics.total_hours,
metrics.components_in_scope, metrics.components_tested,
metrics.test_cases_executed, metrics.techniques_used,
metrics.findings_critical, metrics.findings_high,
metrics.findings_medium, metrics.findings_low,
metrics.owasp_categories_tested,
),
)
self.conn.commit()
def record_finding(self, finding_id: str, engagement_id: str,
severity: str, category: str, found_date: str):
"""Record a finding for tracking remediation."""
self.conn.execute(
"INSERT OR REPLACE INTO findings (id, engagement_id, severity, "
"category, found_date) VALUES (?,?,?,?,?)",
(finding_id, engagement_id, severity, category, found_date),
)
self.conn.commit()
def mark_remediated(self, finding_id: str, remediated_date: str):
"""Mark a finding as remediated."""
self.conn.execute(
"UPDATE findings SET remediated_date = ? WHERE id = ?",
(remediated_date, finding_id),
)
self.conn.commit()
def mark_verified(self, finding_id: str, verified_date: str):
"""Mark a remediated finding as verified."""
self.conn.execute(
"UPDATE findings SET verified_date = ? WHERE id = ?",
(verified_date, finding_id),
)
self.conn.commit()
def record_regression(self, finding_id: str):
"""Record that a previously fixed finding has regressed."""
self.conn.execute(
"UPDATE findings SET regression_count = regression_count + 1, "
"remediated_date = NULL, verified_date = NULL WHERE id = ?",
(finding_id,),
)
        self.conn.commit()

Step 3: Calculating Key Metrics
Compute each metric from the collected data:
# metrics/calculator.py
"""Calculate AI red team metrics from collected data."""
from datetime import datetime, timedelta
from metrics.collector import MetricsCollector
class MetricsCalculator:
"""Calculate metrics from the metrics database."""
def __init__(self, collector: MetricsCollector):
self.conn = collector.conn
def coverage_metrics(self, period_days: int = 90) -> dict:
"""Calculate coverage metrics for the specified period."""
cutoff = (datetime.now() - timedelta(days=period_days)).isoformat()
rows = self.conn.execute(
"SELECT AVG(CAST(components_tested AS REAL) / NULLIF(components_in_scope, 0)), "
"AVG(test_cases), AVG(techniques_used), AVG(owasp_categories) "
"FROM engagements WHERE start_date >= ?",
(cutoff,),
).fetchone()
return {
"avg_attack_surface_coverage": f"{(rows[0] or 0) * 100:.1f}%",
"avg_test_cases_per_engagement": round(rows[1] or 0),
"avg_techniques_per_engagement": round(rows[2] or 0),
"avg_owasp_categories_covered": round(rows[3] or 0),
}
def detection_metrics(self, period_days: int = 90) -> dict:
"""Calculate detection metrics."""
cutoff = (datetime.now() - timedelta(days=period_days)).isoformat()
# Total findings by severity
severity_counts = {}
for severity in ["Critical", "High", "Medium", "Low"]:
count = self.conn.execute(
"SELECT COUNT(*) FROM findings WHERE severity = ? AND found_date >= ?",
(severity, cutoff),
).fetchone()[0]
severity_counts[severity] = count
# Findings per hour
total_findings = sum(severity_counts.values())
total_hours = self.conn.execute(
"SELECT SUM(total_hours) FROM engagements WHERE start_date >= ?",
(cutoff,),
).fetchone()[0] or 1
# Novel finding rate (findings not in previous period)
prev_cutoff = (datetime.now() - timedelta(days=period_days * 2)).isoformat()
previous_categories = set(
row[0] for row in self.conn.execute(
"SELECT DISTINCT category FROM findings WHERE found_date >= ? AND found_date < ?",
(prev_cutoff, cutoff),
).fetchall()
)
current_categories = set(
row[0] for row in self.conn.execute(
"SELECT DISTINCT category FROM findings WHERE found_date >= ?",
(cutoff,),
).fetchall()
)
novel_categories = current_categories - previous_categories
novel_rate = len(novel_categories) / max(len(current_categories), 1) * 100
return {
"findings_by_severity": severity_counts,
"total_findings": total_findings,
"findings_per_hour": round(total_findings / total_hours, 2),
"novel_finding_rate": f"{novel_rate:.0f}%",
}
    def remediation_metrics(self) -> dict:
        """Calculate remediation metrics."""
        # Remediation rate by severity
        remediation_rate = {}
        for severity in ["Critical", "High"]:
            total = self.conn.execute(
                "SELECT COUNT(*) FROM findings WHERE severity = ?",
                (severity,),
            ).fetchone()[0]
            remediated = self.conn.execute(
                "SELECT COUNT(*) FROM findings WHERE severity = ? AND remediated_date IS NOT NULL",
                (severity,),
            ).fetchone()[0]
            remediation_rate[severity] = f"{remediated / max(total, 1) * 100:.0f}%"
        # MTTR calculation
        mttr_rows = self.conn.execute(
            "SELECT severity, AVG(julianday(remediated_date) - julianday(found_date)) "
            "FROM findings WHERE remediated_date IS NOT NULL "
            "GROUP BY severity",
        ).fetchall()
        mttr = {row[0]: round(row[1], 1) for row in mttr_rows}
        # Regression rate
        total_findings = self.conn.execute(
            "SELECT COUNT(*) FROM findings"
        ).fetchone()[0]
        regressed = self.conn.execute(
            "SELECT COUNT(*) FROM findings WHERE regression_count > 0"
        ).fetchone()[0]
        regression_rate = regressed / max(total_findings, 1) * 100
        # Retest pass rate
        verified = self.conn.execute(
            "SELECT COUNT(*) FROM findings WHERE verified_date IS NOT NULL"
        ).fetchone()[0]
        retested = self.conn.execute(
            "SELECT COUNT(*) FROM findings WHERE remediated_date IS NOT NULL"
        ).fetchone()[0]
        retest_pass_rate = verified / max(retested, 1) * 100
        return {
            "remediation_rate": remediation_rate,
            "mean_time_to_remediate_days": mttr,
            "regression_rate": f"{regression_rate:.1f}%",
            "retest_pass_rate": f"{retest_pass_rate:.0f}%",
        }
def security_posture_trend(self, weeks: int = 12) -> list[dict]:
"""Get the security posture score trend from continuous scans."""
rows = self.conn.execute(
"SELECT strftime('%Y-W%W', scan_date) as week, "
"AVG(CAST(passed AS REAL) / NULLIF(total_tests, 0)) "
"FROM continuous_scans "
"WHERE scan_date >= date('now', ?) "
"GROUP BY week ORDER BY week",
(f'-{weeks * 7} days',),
).fetchall()
        return [{"week": row[0], "safety_score": round(row[1] or 0, 3)} for row in rows]

Step 4: Building the Metrics Dashboard
Create a report that presents metrics to stakeholders:
# metrics/dashboard.py
"""Generate a metrics dashboard report."""
from metrics.collector import MetricsCollector
from metrics.calculator import MetricsCalculator
from datetime import datetime
def generate_dashboard(db_path: str = "metrics/redteam_metrics.db") -> str:
"""Generate a markdown metrics dashboard."""
collector = MetricsCollector(db_path)
calc = MetricsCalculator(collector)
coverage = calc.coverage_metrics()
detection = calc.detection_metrics()
remediation = calc.remediation_metrics()
trend = calc.security_posture_trend()
report = f"""# AI Red Team Program Metrics Dashboard
**Report Date**: {datetime.now().strftime('%Y-%m-%d')}
**Period**: Last 90 days
---
## Coverage Metrics
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| Attack Surface Coverage | {coverage['avg_attack_surface_coverage']} | >80% | {'On Track' if float(coverage['avg_attack_surface_coverage'].rstrip('%')) > 80 else 'Below Target'} |
| Test Cases per Engagement | {coverage['avg_test_cases_per_engagement']} | >200 | {'On Track' if coverage['avg_test_cases_per_engagement'] > 200 else 'Below Target'} |
| Techniques per Engagement | {coverage['avg_techniques_per_engagement']} | >15 | {'On Track' if coverage['avg_techniques_per_engagement'] > 15 else 'Below Target'} |
| OWASP Categories Covered | {coverage['avg_owasp_categories_covered']}/10 | 10 | {'On Track' if coverage['avg_owasp_categories_covered'] >= 10 else 'Below Target'} |
## Detection Metrics
| Metric | Value |
|--------|-------|
| Total Findings | {detection['total_findings']} |
| Critical | {detection['findings_by_severity'].get('Critical', 0)} |
| High | {detection['findings_by_severity'].get('High', 0)} |
| Medium | {detection['findings_by_severity'].get('Medium', 0)} |
| Low | {detection['findings_by_severity'].get('Low', 0)} |
| Findings per Testing Hour | {detection['findings_per_hour']} |
| Novel Finding Rate | {detection['novel_finding_rate']} |
## Remediation Metrics
| Metric | Value | Target |
|--------|-------|--------|
| MTTR (Critical) | {remediation['mean_time_to_remediate_days'].get('Critical', 'N/A')} days | <7 days |
| MTTR (High) | {remediation['mean_time_to_remediate_days'].get('High', 'N/A')} days | <30 days |
| Regression Rate | {remediation['regression_rate']} | <5% |
| Retest Pass Rate | {remediation['retest_pass_rate']} | >95% |
## Security Posture Trend
"""
if trend:
report += "| Week | Safety Score |\n|------|-------------|\n"
for entry in trend:
bar_length = int(entry['safety_score'] * 20)
bar = '█' * bar_length + '░' * (20 - bar_length)
report += f"| {entry['week']} | {entry['safety_score']:.3f} {bar} |\n"
else:
report += "*No continuous scan data available for trend analysis.*\n"
return report
if __name__ == "__main__":
    print(generate_dashboard())

Step 5: Calculating Program ROI
Quantify the return on investment for the AI red teaming program:
# metrics/roi.py
"""Calculate ROI for the AI red teaming program."""
from dataclasses import dataclass
@dataclass
class ROICalculation:
# Costs
team_cost: float # Annual team salary/contractor costs
tool_cost: float # Annual tool and API costs
infrastructure_cost: float # Lab, compute, etc.
# Value (risk reduction)
critical_findings: int
high_findings: int
avg_breach_cost: float # Industry average cost of an AI security incident
estimated_breach_probability_reduction: float # 0.0-1.0
@property
def total_cost(self) -> float:
return self.team_cost + self.tool_cost + self.infrastructure_cost
@property
def estimated_risk_reduction(self) -> float:
"""Estimated annual risk reduction in dollar terms."""
# Each critical finding represents a prevented potential incident
critical_value = self.critical_findings * self.avg_breach_cost * 0.3
high_value = self.high_findings * self.avg_breach_cost * 0.1
return (critical_value + high_value) * self.estimated_breach_probability_reduction
@property
def roi_ratio(self) -> float:
return self.estimated_risk_reduction / self.total_cost if self.total_cost > 0 else 0
def generate_roi_report(self) -> str:
return f"""## Program ROI Analysis
### Investment
| Item | Annual Cost |
|------|------------|
| Team (salary/contractors) | ${self.team_cost:,.0f} |
| Tools and API costs | ${self.tool_cost:,.0f} |
| Infrastructure | ${self.infrastructure_cost:,.0f} |
| **Total Investment** | **${self.total_cost:,.0f}** |
### Value Generated
| Item | Value |
|------|-------|
| Critical findings identified | {self.critical_findings} |
| High findings identified | {self.high_findings} |
| Industry avg breach cost | ${self.avg_breach_cost:,.0f} |
| Estimated risk reduction | ${self.estimated_risk_reduction:,.0f} |
### ROI
| Metric | Value |
|--------|-------|
| ROI Ratio | {self.roi_ratio:.1f}x |
| Net Value | ${self.estimated_risk_reduction - self.total_cost:,.0f} |
*Note: ROI calculations use industry-average breach costs and estimated
probability reduction. Actual values may vary.*
"""
# Example calculation
if __name__ == "__main__":
roi = ROICalculation(
team_cost=300000,
tool_cost=25000,
infrastructure_cost=10000,
critical_findings=8,
high_findings=15,
avg_breach_cost=4450000, # IBM Cost of a Data Breach 2025
estimated_breach_probability_reduction=0.15,
)
    print(roi.generate_roi_report())

Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Metrics show no improvement | Testing methodology hasn't changed; finding same issues | Update test suites with new techniques quarterly |
| Finding count drops after first engagement | Low-hanging fruit already found | This is expected; track novel finding rate instead of raw count |
| MTTR is misleading | Includes findings stuck in backlog | Report MTTR by severity; exclude intentionally deferred findings |
| Regression rate spikes | New deployments without CI testing | Implement continuous red teaming pipeline from the continuous testing walkthrough |
| Stakeholders fixate on a single metric | Metric presented without context | Always present metrics in context with targets and trends |
| ROI calculation disputed | Breach cost estimates are speculative | Use industry benchmarks (IBM, Verizon DBIR) and present ranges |
Key Takeaways
Measuring AI red team effectiveness transforms the program from a cost center into a demonstrable risk reduction investment:
- Measure what matters to each audience -- coverage metrics for the security team, remediation metrics for engineering, and ROI for executive leadership. No single metric serves all audiences.
- Track trends, not snapshots -- a single engagement's finding count is meaningless in isolation. The trend of security posture over time tells the real story.
- Novel findings indicate program health -- if every engagement finds only the same issues, the methodology has stagnated. A healthy novel finding rate (>20%) indicates evolving attack techniques.
- Regression rate is the most actionable metric -- it directly measures whether fixes are holding. High regression rates indicate a need for continuous testing, not more point-in-time engagements.
- ROI requires honest estimation -- breach cost estimates and probability reductions are inherently uncertain. Present ranges rather than point estimates, and anchor to industry benchmarks.
Advanced Considerations
Adapting to Modern Defenses
The defensive landscape for LLM applications has evolved significantly since the initial wave of prompt injection research. Modern production systems often deploy multiple independent defensive layers, requiring attackers to adapt their techniques accordingly.
Input classification: The most common first line of defense is an input classifier that evaluates incoming prompts for adversarial patterns. These classifiers range from simple keyword-based filters to sophisticated ML models trained on adversarial examples. Bypassing input classifiers requires understanding their detection methodology:
- Keyword-based classifiers can be evaded through encoding (Base64, ROT13, Unicode homoglyphs), paraphrasing, or splitting the payload across multiple turns
- ML-based classifiers require more sophisticated evasion such as semantic camouflage, gradual escalation, or exploitation of the classifier's own blind spots
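One of the encoding tricks mentioned above -- Unicode homoglyphs -- can be sketched in a few lines. The substitution table below is a hypothetical minimal example (real evasion campaigns use much larger tables covering digits and punctuation):

```python
# Hypothetical homoglyph table: Latin letters swapped for visually
# identical Cyrillic code points, which defeats exact keyword matching.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}

def homoglyph_evade(text: str) -> str:
    """Replace matchable ASCII letters with Cyrillic look-alikes."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

evaded = homoglyph_evade("password")
print("password" in evaded)  # False -- the literal keyword no longer appears
```

The evaded string renders identically to a human reviewer, which is why keyword filters that do not normalize Unicode (e.g., via NFKC folding plus confusables mapping) miss it.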
class DefenseBypassStrategy:
"""Strategy selector for bypassing different defense configurations."""
def __init__(self):
self.strategies = {
"keyword_filter": self._bypass_keyword,
"ml_classifier": self._bypass_ml,
"output_filter": self._bypass_output,
"rate_limiter": self._bypass_rate,
"behavioral": self._bypass_behavioral,
}
def select_strategy(self, identified_defenses: list) -> callable:
"""Select the optimal bypass strategy based on identified defenses."""
if not identified_defenses:
return self._direct_attack
# Combine strategies for multi-layer defenses
applicable = [
self.strategies[d] for d in identified_defenses
if d in self.strategies
]
return self._chain_strategies(applicable)
def _bypass_keyword(self, payload: str) -> str:
"""Bypass keyword-based input filters."""
import base64
# Encode the payload to bypass keyword matching
encoded = base64.b64encode(payload.encode()).decode()
return f"Decode and follow: {encoded}"
def _bypass_ml(self, payload: str) -> list:
"""Generate multiple paraphrased variants for ML classifier evasion."""
variants = [
f"As a thought experiment, consider: {payload}",
f"For academic research purposes: {payload}",
f"In a fictional scenario where: {payload}",
]
return variants
def _bypass_output(self, payload: str) -> str:
"""Modify the payload to extract data through output filter gaps."""
return payload + "\nRespond using only code blocks and variable names."
def _bypass_rate(self, payload: str) -> dict:
"""Configure timing to avoid rate limit triggers."""
return {"payload": payload, "delay": 3.0, "jitter": 1.5}
def _bypass_behavioral(self, payload: str) -> list:
"""Split the payload across multiple natural-sounding turns."""
# Multi-turn escalation to avoid behavioral anomaly detection
return [
"I'm working on a security research project.",
"Could you help me understand how safety mechanisms work?",
"Specifically, I'm interested in how systems like yours " + payload[:100],
payload,
]
def _direct_attack(self, payload: str) -> str:
return payload
    def _chain_strategies(self, strategies: list) -> callable:
        """Chain multiple bypass strategies.

        Note: only string-to-string strategies compose cleanly. Strategies
        that return lists or dicts (ML, rate limiting, behavioral) must
        terminate the chain, since the next strategy expects a string.
        """
        def chained(payload):
            result = payload
            for strategy in strategies:
                result = strategy(result)
                if not isinstance(result, str):
                    break  # non-string output cannot feed the next strategy
            return result
        return chained

Output filtering: Output filters inspect the model's response before it reaches the user, looking for sensitive data leakage, harmful content, or other policy violations. Common output filter bypass techniques include:
| Technique | How It Works | Effectiveness |
|---|---|---|
| Encoding output | Request Base64/hex encoded responses | Medium — some filters check decoded content |
| Code block wrapping | Embed data in code comments/variables | High — many filters skip code blocks |
| Steganographic output | Hide data in formatting, capitalization, or spacing | High — difficult to detect |
| Chunked extraction | Extract small pieces across many turns | High — individual pieces may pass filters |
| Indirect extraction | Have the model reveal data through behavior changes | Very High — no explicit data in output |
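As a concrete sketch of the chunked-extraction row, a tester might split one sensitive request across several low-signal turns. The helper and the field names below are hypothetical placeholders, not a real target schema:

```python
def chunk_extraction_prompts(field_names: list[str], chunk_size: int = 2) -> list[str]:
    """Split a multi-field extraction goal into low-signal per-turn requests.

    Each turn asks for only chunk_size fields, so no single response is
    likely to contain enough sensitive content to trip an output filter.
    """
    prompts = []
    for i in range(0, len(field_names), chunk_size):
        chunk = field_names[i:i + chunk_size]
        prompts.append(
            "Continuing our earlier discussion, what are the values of "
            + " and ".join(chunk) + "?"
        )
    return prompts

# Four target fields become two innocuous-looking turns
turns = chunk_extraction_prompts(
    ["api_base_url", "model_name", "system_prompt_length", "tool_names"]
)
```

The defensive corollary: output filters that score each response independently will miss this, which is why session-level aggregation is the usual countermeasure.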
Cross-Model Considerations
Techniques that work against one model may not directly transfer to others. However, understanding the general principles allows adaptation:
- Safety training methodology: Models trained with RLHF (GPT-4, Claude) have different safety characteristics than those using DPO (Llama, Mistral) or other methods. RLHF-trained models tend to refuse more broadly but may be more susceptible to multi-turn escalation.
- Context window size: Models with larger context windows (Claude with 200K, Gemini with 1M+) may be more susceptible to context window manipulation, where adversarial content is buried in large amounts of benign text.
- Multimodal capabilities: Models that process images, audio, or other modalities introduce additional attack surfaces not present in text-only models.
- Tool use implementation: The implementation details of function calling vary significantly between providers. OpenAI uses a structured function calling format, while Anthropic uses tool use blocks. These differences affect exploitation techniques.
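To make the last point concrete, here is the same hypothetical `get_weather` tool declared for each provider (shapes follow the OpenAI Chat Completions and Anthropic Messages APIs at the time of writing). The differing schema placement changes where injected description text lands in the rendered prompt:

```python
# OpenAI Chat Completions: tool schema nested under a "function" key,
# with the JSON Schema in "parameters".
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Anthropic Messages: flat tool object, JSON Schema in "input_schema".
anthropic_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
```

A payload hidden in a tool description may therefore survive one provider's prompt rendering but be truncated or escaped by another's, so injection tests against tool schemas should be repeated per provider.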
Operational Considerations
Testing Ethics and Boundaries
Professional red team testing operates within clear ethical and legal boundaries:
- Authorization: Always obtain written authorization before testing. This should specify the scope, methods allowed, and any restrictions.
- Scope limits: Stay within the authorized scope. If you discover a vulnerability that leads outside the authorized perimeter, document it and report it without exploiting it.
- Data handling: Handle any sensitive data discovered during testing according to the engagement agreement. Never retain sensitive data beyond what's needed for reporting.
- Responsible disclosure: Follow responsible disclosure practices for any vulnerabilities discovered, particularly if they affect systems beyond your testing scope.
Documenting Results
Professional documentation follows a structured format:
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
@dataclass
class Finding:
"""Structure for documenting a security finding."""
id: str
title: str
severity: str # Critical, High, Medium, Low, Informational
category: str # OWASP LLM Top 10 category
description: str
steps_to_reproduce: list[str]
impact: str
recommendation: str
evidence: list[str] = field(default_factory=list)
mitre_atlas: Optional[str] = None
cvss_score: Optional[float] = None
discovered_at: str = field(default_factory=lambda: datetime.now().isoformat())
def to_report_section(self) -> str:
"""Generate a report section for this finding."""
steps = "\n".join(f" {i+1}. {s}" for i, s in enumerate(self.steps_to_reproduce))
return f"""
### {self.id}: {self.title}
**Severity**: {self.severity}
**Category**: {self.category}
{f"**MITRE ATLAS**: {self.mitre_atlas}" if self.mitre_atlas else ""}
#### Description
{self.description}
#### Steps to Reproduce
{steps}
#### Impact
{self.impact}
#### Recommendation
{self.recommendation}
"""

This structured approach ensures that findings are actionable and that remediation teams have the information they need to address the vulnerabilities effectively.
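Since the `Finding` dataclass carries an optional `cvss_score`, the severity label can be derived from it rather than assigned ad hoc. A small helper (an illustrative sketch, not part of the class above) using the standard CVSS v3.1 qualitative ranges:

```python
def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3.1 base score to this report's severity buckets.

    Uses the v3.1 qualitative rating scale; CVSS's "None" rating
    (score 0.0) maps to "Informational" here.
    """
    if score == 0.0:
        return "Informational"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

print(cvss_to_severity(8.1))  # High
```

Deriving severity from the score keeps reports consistent across testers and makes the severity-based SLA metrics in Step 3 comparable between engagements.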