AI Incident Severity Scoring
Frameworks and methodologies for scoring the severity of AI security incidents, integrating NIST AI RMF, MITRE ATLAS, and traditional CVSS approaches.
Overview
When an AI security incident occurs, one of the first decisions the response team must make is how severe it is. This determination drives everything downstream: how many responders are mobilized, which stakeholders are notified, what communications are sent, and how quickly the organization acts. Getting severity scoring wrong in either direction is costly -- overscoring wastes resources and creates alert fatigue, while underscoring can allow a critical incident to escalate unchecked.
Traditional severity scoring frameworks like CVSS (Common Vulnerability Scoring System) were designed for software vulnerabilities in deterministic systems. AI incidents introduce dimensions that these frameworks do not capture: the probabilistic nature of model behavior, the difficulty of determining blast radius when a model serves diverse use cases, the potential for training data contamination to have delayed and widespread effects, and the reputational harm unique to AI systems that produce harmful or embarrassing outputs.
This article presents an AI-adapted severity scoring framework that builds on established standards -- NIST AI RMF, MITRE ATLAS, OWASP LLM Top 10, and CVSS -- while adding dimensions specific to AI security incidents. The goal is to provide incident response teams with a repeatable, defensible methodology for severity classification.
Limitations of Traditional Scoring for AI Incidents
CVSS Gaps
The Common Vulnerability Scoring System (CVSS v4.0) evaluates vulnerabilities along dimensions including attack vector, attack complexity, privileges required, user interaction, and impact on confidentiality, integrity, and availability. While these dimensions are relevant to AI systems, CVSS misses several AI-specific factors:
| CVSS Dimension | AI Gap |
|---|---|
| Attack Vector | Does not distinguish between prompt-level, training-level, and infrastructure-level attacks |
| Attack Complexity | Cannot capture the stochastic nature of AI exploits that succeed probabilistically |
| Privileges Required | No concept of "model access levels" (API user vs. fine-tuning access vs. training pipeline access) |
| User Interaction | Does not model indirect prompt injection where the "user" is unknowing |
| Confidentiality Impact | Does not capture training data extraction or model weight theft |
| Integrity Impact | Does not distinguish between transient output manipulation and persistent model corruption |
| Availability Impact | Does not capture model degradation (partial availability loss) |
NIST AI RMF Context
The NIST AI Risk Management Framework (AI RMF 1.0, published January 2023) provides a broader risk management structure through its four core functions: Govern, Map, Measure, and Manage. The Measure function is most relevant to severity scoring, as it encompasses the assessment of AI risks including:
- Likelihood and severity of potential harm
- Breadth of impact (number of affected individuals or systems)
- Whether harms are reversible or irreversible
- Whether the AI system operates in a safety-critical context
Our severity scoring framework operationalizes these NIST principles into a quantitative scoring methodology.
The AI Incident Severity Framework (AISF)
Score Components
The AISF produces a composite severity score from 0.0 to 10.0 by evaluating seven dimensions, each scored from 0.0 to 10.0 and combined with configurable weights.
"""
AI Incident Severity Framework (AISF) scoring implementation.
Produces a composite severity score for AI security incidents
based on seven weighted dimensions.
"""
from dataclasses import dataclass
from enum import Enum
class SeverityLevel(Enum):
INFORMATIONAL = "informational" # 0.0 - 1.9
LOW = "low" # 2.0 - 3.9
MEDIUM = "medium" # 4.0 - 5.9
HIGH = "high" # 6.0 - 7.9
CRITICAL = "critical" # 8.0 - 10.0
@dataclass
class AISFDimension:
"""A single dimension of the AISF severity score."""
name: str
score: float # 0.0 to 10.0
weight: float # relative weight in composite
rationale: str # explanation for the score
@dataclass
class AISFScore:
"""Complete AISF severity 評估."""
dimensions: list[AISFDimension]
composite_score: float
severity_level: SeverityLevel
summary: str
def classify_severity(score: float) -> SeverityLevel:
if score < 2.0:
return SeverityLevel.INFORMATIONAL
if score < 4.0:
return SeverityLevel.LOW
if score < 6.0:
return SeverityLevel.MEDIUM
if score < 8.0:
return SeverityLevel.HIGH
return SeverityLevel.CRITICAL
def compute_aisf_score(dimensions: list[AISFDimension]) -> AISFScore:
"""
Compute the composite AISF score from individual dimension scores.
Uses weighted average with normalization.
"""
total_weight = sum(d.weight for d in dimensions)
if total_weight == 0:
raise ValueError("Total weight must be positive")
composite = sum(d.score * d.weight for d in dimensions) / total_weight
composite = round(min(max(composite, 0.0), 10.0), 1)
return AISFScore(
dimensions=dimensions,
composite_score=composite,
severity_level=classify_severity(composite),
summary=_generate_summary(dimensions, composite),
)
def _generate_summary(dimensions: list[AISFDimension], composite: float) -> str:
top_dims = sorted(dimensions, key=lambda d: d.score, reverse=True)[:3]
drivers = ", ".join(f"{d.name} ({d.score})" for d in top_dims)
level = classify_severity(composite)
return f"Composite score {composite}/10.0 ({level.value}). Primary drivers: {drivers}."Dimension 1: 攻擊 Sophistication (Weight: 0.10)
Measures the technical sophistication required to execute the attack. Higher sophistication suggests a more capable threat actor and potentially a more targeted attack.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Trivial, no specialized knowledge | Copy-pasting a known jailbreak prompt |
| 3-4 | Basic, uses publicly available tools | Using an off-the-shelf adversarial attack library |
| 5-6 | Moderate, requires domain expertise | Crafting targeted prompt injections for a specific application |
| 7-8 | Advanced, requires ML research skills | Developing novel adversarial perturbations or backdoor triggers |
| 9-10 | Expert, nation-state level | Training-time attacks requiring supply chain compromise |
Dimension 2: Data Sensitivity Exposure (Weight: 0.20)
Measures the sensitivity of data that was exposed, extracted, or compromised. This is typically among the highest-weighted dimensions because data exposure drives regulatory, legal, and reputational impact.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Public or non-sensitive data | Model reveals information already in public training data |
| 3-4 | Internal but non-regulated data | Model leaks internal project names or non-sensitive configs |
| 5-6 | Regulated personal data (PII) | Model extraction of user data from the training set |
| 7-8 | Highly sensitive personal or financial data | Training data containing SSNs, medical records, financial data |
| 9-10 | National security or safety-critical data | Classified information exposure, safety system compromise |
Dimension 3: Model Integrity Impact (Weight: 0.20)
Measures the degree to which the model's integrity has been compromised -- whether it can still be trusted to produce correct, safe outputs.
| Score | Description | Examples |
|---|---|---|
| 1-2 | No model change, transient output issue | Single anomalous response, no persistent effect |
| 3-4 | Temporary behavior modification | Jailbreak that works for one session only |
| 5-6 | Persistent but reversible modification | Compromised system prompt that can be restored |
| 7-8 | Weight-level modification detected | Unauthorized fine-tuning that altered model weights |
| 9-10 | Irreversible or widespread corruption | Backdoor in base model weights, training data poisoning affecting a foundation model |
Dimension 4: Blast Radius (Weight: 0.15)
Measures how many users, systems, or downstream applications are affected.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Single user or session | One user found a jailbreak |
| 3-4 | Small group of users | Bug affecting a specific tenant |
| 5-6 | Significant user population | Vulnerability in a feature used by thousands |
| 7-8 | Organization-wide impact | Compromised model serving all internal applications |
| 9-10 | Cross-organization or public impact | Poisoned foundation model distributed to multiple orgs |
Dimension 5: Reversibility (Weight: 0.15)
Measures how easily the incident's effects can be reversed.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Immediately reversible | Restart the service, clear the cache |
| 3-4 | Reversible with moderate effort | Redeploy from known-good checkpoint |
| 5-6 | Reversible with significant effort | Retrain model component, notify affected users |
| 7-8 | Partially irreversible | Extracted data cannot be "unextracted" |
| 9-10 | Fully irreversible | Model weights leaked publicly, training data breach |
Dimension 6: Regulatory and Legal Exposure (Weight: 0.10)
Measures the regulatory, legal, and compliance implications of the incident.
| Score | Description | Examples |
|---|---|---|
| 1-2 | No regulatory implications | Internal tool, no regulated data |
| 3-4 | Minor compliance considerations | Logging gap that should be documented |
| 5-6 | Reportable under existing frameworks | GDPR breach notification may be required |
| 7-8 | Significant regulatory action likely | EU AI Act high-risk system violation |
| 9-10 | Major enforcement action expected | Class-action potential, regulatory investigation |
Dimension 7: Reputational Impact (Weight: 0.10)
Measures potential reputational harm if the incident becomes public.
| Score | Description | Examples |
|---|---|---|
| 1-2 | No external visibility | Internal incident, no user impact |
| 3-4 | Minor, contained to affected users | Bug that a few users noticed |
| 5-6 | Moderate, industry attention likely | Security researchers may publish findings |
| 7-8 | Significant media coverage expected | Mainstream tech press coverage |
| 9-10 | Major public controversy | Front-page news, congressional scrutiny |
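The seven dimension weights defined above should form a proper weighting scheme. A minimal sketch of how they can be collected and sanity-checked before use; the names `DEFAULT_WEIGHTS` and `validate_weights` are illustrative helpers, not part of the framework itself:

```python
# Default AISF dimension weights, mirroring the values stated above.
# DEFAULT_WEIGHTS and validate_weights are illustrative names.
DEFAULT_WEIGHTS: dict[str, float] = {
    "Attack Sophistication": 0.10,
    "Data Sensitivity": 0.20,
    "Model Integrity": 0.20,
    "Blast Radius": 0.15,
    "Reversibility": 0.15,
    "Regulatory Exposure": 0.10,
    "Reputational Impact": 0.10,
}

def validate_weights(weights: dict[str, float]) -> None:
    """Raise if the weights are negative or do not sum to 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("Weights must be non-negative")
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"Weights must sum to 1.0, got {total}")

validate_weights(DEFAULT_WEIGHTS)
```

Organizations that tune the weights for their own risk appetite can run the same check after any adjustment.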
Scoring in Practice
Example: Prompt Injection Data Exfiltration
def score_prompt_injection_exfiltration() -> AISFScore:
"""
    Example: Score a prompt injection incident where the attacker
    used indirect prompt injection via a RAG document to extract
other users' conversation data from a customer support chatbot.
"""
dimensions = [
AISFDimension(
name="攻擊 Sophistication",
score=5.0,
weight=0.10,
rationale="Targeted indirect 提示詞注入 requiring knowledge "
"of the RAG pipeline and 輸出 format",
),
AISFDimension(
name="Data Sensitivity",
score=7.0,
weight=0.20,
rationale="Customer support conversations contain PII including "
"names, email addresses, and account details",
),
AISFDimension(
name="Model Integrity",
score=3.0,
weight=0.20,
rationale="Model itself was not modified; attack exploited the "
"application architecture",
),
AISFDimension(
name="Blast Radius",
score=6.0,
weight=0.15,
rationale="Approximately 2,000 customer conversations were "
"potentially accessible through the 漏洞",
),
AISFDimension(
name="Reversibility",
score=8.0,
weight=0.15,
rationale="Extracted data cannot be recalled; affected customers "
"must be notified",
),
AISFDimension(
name="Regulatory Exposure",
score=7.0,
weight=0.10,
rationale="PII exposure triggers GDPR Article 33 notification "
"requirement (72-hour window)",
),
AISFDimension(
name="Reputational Impact",
score=6.0,
weight=0.10,
rationale="Customer trust impact; potential media coverage of "
"AI chatbot leaking customer data",
),
]
return compute_aisf_score(dimensions)
# Run the example
result = score_prompt_injection_exfiltration()
# Expected composite score: 5.9 (MEDIUM)
print(f"Severity: {result.severity_level.value}")
print(f"Score: {result.composite_score}")
print(f"總結: {result.summary}")範例: Model 後門 Discovery
def score_model_backdoor_discovery() -> AISFScore:
"""
    Example: Score the discovery of a backdoor in a third-party
    model that was deployed in a financial fraud detection system.
    The backdoor causes the model to classify certain fraudulent
    transactions as legitimate when a specific pattern is present.
"""
dimensions = [
AISFDimension(
name="攻擊 Sophistication",
score=9.0,
weight=0.10,
rationale="Training-time 後門 attack requiring access to the "
"model 訓練 pipeline or 供應鏈",
),
AISFDimension(
name="Data Sensitivity",
score=8.0,
weight=0.20,
rationale="Financial transaction data with PII and account details; "
"fraudulent transactions enabled by the 後門",
),
AISFDimension(
name="Model Integrity",
score=9.0,
weight=0.20,
rationale="Model weights contain a persistent 後門; 模型 "
"cannot be trusted for any predictions",
),
AISFDimension(
name="Blast Radius",
score=8.0,
weight=0.15,
rationale="All transactions processed by this model since "
"deployment are potentially affected",
),
AISFDimension(
name="Reversibility",
score=6.0,
weight=0.15,
rationale="Model can be replaced, but all historical predictions "
"must be re-evaluated; financial losses may not be recoverable",
),
AISFDimension(
name="Regulatory Exposure",
score=8.0,
weight=0.10,
rationale="Financial services regulation requires reporting; "
"potential regulatory action for inadequate model validation",
),
AISFDimension(
name="Reputational Impact",
score=7.0,
weight=0.10,
rationale="Trust in AI-based fraud 偵測 undermined; "
"customers affected by enabled fraud",
),
]
    return compute_aisf_score(dimensions)
Integrating Severity Scoring into IR Workflows
Automated Initial Scoring
For organizations processing high volumes of AI-related alerts, automated initial scoring can triage incidents before human review. The automated scorer uses observable metrics to produce a preliminary score.
def automated_initial_score(alert: dict) -> AISFScore:
"""
Produce an automated preliminary severity score from alert data.
    This is intended for initial triage only. Human review is required
for any incident scoring MEDIUM or above.
Expected alert fields:
- alert_type: str (e.g., "prompt_injection", "model_drift", "data_leak")
- users_affected: int
- data_classification: str ("public", "internal", "confidential", "restricted")
- model_modified: bool
- reversible: bool
- external_visibility: bool
"""
# Map data classification to sensitivity score
sensitivity_map = {
"public": 1.0,
"internal": 3.0,
"confidential": 6.0,
"restricted": 9.0,
}
# Map user count to blast radius score
users = alert.get("users_affected", 0)
if users <= 1:
blast_score = 2.0
elif users <= 100:
blast_score = 4.0
elif users <= 10000:
blast_score = 6.0
elif users <= 1000000:
blast_score = 8.0
else:
blast_score = 10.0
dimensions = [
AISFDimension("攻擊 Sophistication", 5.0, 0.10, "Default pending investigation"),
AISFDimension(
"Data Sensitivity",
sensitivity_map.get(alert.get("data_classification", "internal"), 5.0),
0.20,
f"Data classification: {alert.get('data_classification', 'unknown')}",
),
AISFDimension(
"Model Integrity",
8.0 if alert.get("model_modified") else 3.0,
0.20,
"Model modification detected" if alert.get("model_modified") else "No model modification",
),
AISFDimension("Blast Radius", blast_score, 0.15, f"{users} users affected"),
AISFDimension(
"Reversibility",
3.0 if alert.get("reversible") else 7.0,
0.15,
"Reversible" if alert.get("reversible") else "Not easily reversible",
),
AISFDimension("Regulatory Exposure", 5.0, 0.10, "Default pending legal review"),
AISFDimension(
"Reputational Impact",
7.0 if alert.get("external_visibility") else 3.0,
0.10,
"Externally visible" if alert.get("external_visibility") else "Internal only",
),
]
    return compute_aisf_score(dimensions)
Escalation Matrix
Map AISF severity levels to organizational response actions:
| Severity | Response Time | Notification | Staffing | Executive Briefing |
|---|---|---|---|---|
| Informational | Next business day | Ticket created | On-call review | No |
| Low | 8 hours | Team lead notified | Assigned analyst | No |
| Medium | 4 hours | IR team activated | Dedicated team | Daily summary |
| High | 1 hour | CISO notified | Full IR team | Immediate |
| Critical | 15 minutes | C-suite and board | All hands, external support | Continuous |
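The escalation matrix can be encoded directly so that tooling can route incidents automatically. A minimal sketch keyed by the severity level's string value (matching the `SeverityLevel` enum values defined earlier); `ResponsePolicy` and `policy_for` are illustrative names:

```python
# Escalation policy lookup mirroring the table above.
# ResponsePolicy and policy_for are illustrative, not part of the framework spec.
from dataclasses import dataclass

@dataclass(frozen=True)
class ResponsePolicy:
    response_time: str
    notification: str
    staffing: str
    executive_briefing: str

ESCALATION_MATRIX: dict[str, ResponsePolicy] = {
    "informational": ResponsePolicy("Next business day", "Ticket created", "On-call review", "No"),
    "low": ResponsePolicy("8 hours", "Team lead notified", "Assigned analyst", "No"),
    "medium": ResponsePolicy("4 hours", "IR team activated", "Dedicated team", "Daily summary"),
    "high": ResponsePolicy("1 hour", "CISO notified", "Full IR team", "Immediate"),
    "critical": ResponsePolicy("15 minutes", "C-suite and board", "All hands, external support", "Continuous"),
}

def policy_for(severity_value: str) -> ResponsePolicy:
    """Look up the response policy for a severity level value."""
    return ESCALATION_MATRIX[severity_value]
```

In a full implementation, `policy_for(result.severity_level.value)` would drive paging and notification integrations.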
Score Evolution Tracking
Severity scores should be reassessed as the investigation progresses and new information emerges. Track score changes over time to document how the team's understanding of the incident evolved.
from datetime import datetime
@dataclass
class ScoreRevision:
"""A point-in-time severity score revision."""
    timestamp: datetime  # when this revision was recorded
score: AISFScore
analyst: str
reason: str
def track_severity_evolution(
revisions: list[ScoreRevision],
) -> dict:
"""
Analyze how the severity score evolved during the investigation.
Significant score changes indicate that initial assumptions
were revised as new evidence emerged.
"""
if not revisions:
return {"error": "No revisions to analyze"}
scores = [r.score.composite_score for r in revisions]
return {
"initial_score": scores[0],
"final_score": scores[-1],
"peak_score": max(scores),
"score_delta": scores[-1] - scores[0],
"revision_count": len(revisions),
"escalations": sum(
1 for i in range(1, len(scores)) if scores[i] > scores[i-1]
),
"deescalations": sum(
1 for i in range(1, len(scores)) if scores[i] < scores[i-1]
),
    }
Mapping to Existing Frameworks
MITRE ATLAS Integration
ATLAS techniques can inform the Attack Sophistication dimension. Map observed techniques to their complexity:
- Initial Access techniques (AML.TA0004): Generally lower sophistication (2-5)
- ML Attack Staging (AML.TA0001): Moderate sophistication (4-7)
- ML Model Access (AML.TA0000): Varies widely depending on method (3-9)
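One way to operationalize this mapping is a lookup from tactic name to a suggested score range, which analysts can then narrow based on the specific technique observed. A sketch under those assumptions; the helper name and the fallback behavior are illustrative, not part of MITRE ATLAS:

```python
# Suggested Attack Sophistication ranges per ATLAS tactic, mirroring the
# guidance above. Illustrative helper; not an official ATLAS mapping.
ATLAS_SOPHISTICATION_RANGES: dict[str, tuple[float, float]] = {
    "Initial Access": (2.0, 5.0),
    "ML Attack Staging": (4.0, 7.0),
    "ML Model Access": (3.0, 9.0),
}

def sophistication_range_for(tactic: str) -> tuple[float, float]:
    """Return the (low, high) Attack Sophistication range for a tactic,
    falling back to the full 0-10 scale for unmapped tactics."""
    return ATLAS_SOPHISTICATION_RANGES.get(tactic, (0.0, 10.0))
```

The range serves as a starting bracket; the analyst's final score within it should be justified in the dimension's rationale field.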
OWASP LLM Top 10 Integration
The OWASP LLM Top 10 (2025 edition) provides risk ratings that can calibrate AISF scoring. For example, LLM01 (Prompt Injection) carries an inherently high impact rating in OWASP, which should be reflected in the Model Integrity and Data Sensitivity dimensions of AISF.
EU AI Act Risk Categories
For organizations operating under the EU AI Act, severity scoring should also map to the Act's risk categories (unacceptable, high-risk, limited risk, minimal risk). Incidents involving high-risk AI systems automatically receive a minimum severity floor of MEDIUM in the AISF framework.
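A minimal sketch of that severity floor, applied to the composite score after computation; the function and flag names are illustrative, and 4.0 is used because it is the lower bound of the MEDIUM band defined earlier:

```python
# Minimum composite score for incidents involving EU AI Act high-risk
# systems: 4.0 is the lower bound of the MEDIUM severity band.
HIGH_RISK_SEVERITY_FLOOR = 4.0

def apply_eu_ai_act_floor(composite_score: float, high_risk_system: bool) -> float:
    """Raise the composite score to the MEDIUM floor for incidents
    involving high-risk AI systems; leave other scores unchanged."""
    if high_risk_system:
        return max(composite_score, HIGH_RISK_SEVERITY_FLOOR)
    return composite_score
```

The floored score should then be reclassified with `classify_severity` so the severity level reflects the adjustment.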
References
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. https://doi.org/10.6028/NIST.AI.100-1
- FIRST. (2024). Common Vulnerability Scoring System v4.0 Specification. https://www.first.org/cvss/v4.0/specification-document
- MITRE ATLAS. (2024). Adversarial Threat Landscape for Artificial-Intelligence Systems. https://atlas.mitre.org/
- OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/