AI Incident Severity Scoring
Frameworks and methodologies for scoring the severity of AI security incidents, integrating NIST AI RMF, MITRE ATLAS, and traditional CVSS approaches.
Overview
When an AI security incident occurs, one of the first decisions the response team must make is how severe it is. This determination drives everything downstream: how many responders are mobilized, which stakeholders are notified, what communications are sent, and how quickly the organization acts. Getting severity scoring wrong in either direction is costly -- overscoring wastes resources and creates alert fatigue, while underscoring can allow a critical incident to escalate unchecked.
Traditional severity scoring frameworks like CVSS (Common Vulnerability Scoring System) were designed for software vulnerabilities in deterministic systems. AI incidents introduce dimensions that these frameworks do not capture: the probabilistic nature of model behavior, the difficulty of determining blast radius when a model serves diverse use cases, the potential for training data contamination to have delayed and widespread effects, and the reputational harm unique to AI systems that produce harmful or embarrassing outputs.
This article presents an AI-adapted severity scoring framework that builds on established standards -- NIST AI RMF, MITRE ATLAS, OWASP LLM Top 10, and CVSS -- while adding dimensions specific to AI security incidents. The goal is to provide incident response teams with a repeatable, defensible methodology for severity classification.
Limitations of Traditional Scoring for AI Incidents
CVSS Gaps
The Common Vulnerability Scoring System (CVSS v4.0) evaluates vulnerabilities along dimensions including attack vector, attack complexity, privileges required, user interaction, and impact on confidentiality, integrity, and availability. While these dimensions are relevant to AI systems, CVSS misses several AI-specific factors:
| CVSS Dimension | AI Gap |
|---|---|
| Attack Vector | Does not distinguish between prompt-level, training-level, and infrastructure-level attacks |
| Attack Complexity | Cannot capture the stochastic nature of AI exploits that succeed probabilistically |
| Privileges Required | No concept of "model access levels" (API user vs. fine-tuning access vs. training pipeline access) |
| User Interaction | Does not model indirect prompt injection, where the "user" is an unwitting intermediary |
| Confidentiality Impact | Does not capture training data extraction or model weight theft |
| Integrity Impact | Does not distinguish between transient output manipulation and persistent model corruption |
| Availability Impact | Does not capture model degradation (partial availability loss) |
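For comparison, a CVSS v4.0 assessment is typically exchanged as a vector string of base metrics. The minimal parser below (an illustrative sketch, not part of any official CVSS tooling) makes the gap concrete: none of the AI-specific factors in the table above has a slot in the vector.

```python
def parse_cvss_v4_vector(vector: str) -> dict[str, str]:
    """Parse a CVSS v4.0 vector string into {metric: value} pairs."""
    prefix, _, metrics = vector.partition("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"Not a CVSS v4.0 vector: {vector!r}")
    return dict(part.split(":", 1) for part in metrics.split("/"))

metrics = parse_cvss_v4_vector(
    "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:N/SI:N/SA:N"
)
# No metric distinguishes prompt-level from training-level access, and none
# captures probabilistic exploit success -- the gaps listed in the table above.
```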
NIST AI RMF Context
The NIST AI Risk Management Framework (AI RMF 1.0, published January 2023) provides a broader risk management structure through its four core functions: Govern, Map, Measure, and Manage. The Measure function is most relevant to severity scoring, as it encompasses the assessment of AI risks including:
- Likelihood and severity of potential harm
- Breadth of impact (number of affected individuals or systems)
- Whether harms are reversible or irreversible
- Whether the AI system operates in a safety-critical context
Our severity scoring framework operationalizes these NIST principles into a quantitative scoring methodology.
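One way to make that operationalization traceable (an illustrative mapping maintained alongside the scoring code, not part of the NIST framework itself) is to record which AISF dimensions carry each Measure consideration:

```python
# Illustrative mapping from NIST AI RMF "Measure" considerations to the
# AISF dimensions defined later in this article.
NIST_MEASURE_TO_AISF: dict[str, list[str]] = {
    "likelihood and severity of harm": ["Data Sensitivity", "Model Integrity"],
    "breadth of impact": ["Blast Radius"],
    "reversibility of harms": ["Reversibility"],
    "safety-critical context": ["Regulatory Exposure", "Reputational Impact"],
}

# Dimensions informed by at least one Measure consideration:
covered = {dim for dims in NIST_MEASURE_TO_AISF.values() for dim in dims}
```

Keeping the mapping explicit makes it easy to show an auditor how each NIST consideration feeds the composite score.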
The AI Incident Severity Framework (AISF)
Score Components
The AISF produces a composite severity score from 0.0 to 10.0 by evaluating seven dimensions, each scored from 0.0 to 10.0 and combined with configurable weights.
"""
AI Incident Severity Framework (AISF) scoring implementation.
Produces a composite severity score for AI security incidents
based on seven weighted dimensions.
"""
from dataclasses import dataclass
from enum import Enum
class SeverityLevel(Enum):
INFORMATIONAL = "informational" # 0.0 - 1.9
LOW = "low" # 2.0 - 3.9
MEDIUM = "medium" # 4.0 - 5.9
HIGH = "high" # 6.0 - 7.9
CRITICAL = "critical" # 8.0 - 10.0
@dataclass
class AISFDimension:
"""A single dimension of the AISF severity score."""
name: str
score: float # 0.0 to 10.0
weight: float # relative weight in composite
rationale: str # explanation for the score
@dataclass
class AISFScore:
"""Complete AISF severity assessment."""
dimensions: list[AISFDimension]
composite_score: float
severity_level: SeverityLevel
summary: str
def classify_severity(score: float) -> SeverityLevel:
if score < 2.0:
return SeverityLevel.INFORMATIONAL
if score < 4.0:
return SeverityLevel.LOW
if score < 6.0:
return SeverityLevel.MEDIUM
if score < 8.0:
return SeverityLevel.HIGH
return SeverityLevel.CRITICAL
def compute_aisf_score(dimensions: list[AISFDimension]) -> AISFScore:
"""
Compute the composite AISF score from individual dimension scores.
Uses weighted average with normalization.
"""
total_weight = sum(d.weight for d in dimensions)
if total_weight == 0:
raise ValueError("Total weight must be positive")
composite = sum(d.score * d.weight for d in dimensions) / total_weight
composite = round(min(max(composite, 0.0), 10.0), 1)
return AISFScore(
dimensions=dimensions,
composite_score=composite,
severity_level=classify_severity(composite),
summary=_generate_summary(dimensions, composite),
)
def _generate_summary(dimensions: list[AISFDimension], composite: float) -> str:
top_dims = sorted(dimensions, key=lambda d: d.score, reverse=True)[:3]
drivers = ", ".join(f"{d.name} ({d.score})" for d in top_dims)
level = classify_severity(composite)
    return f"Composite score {composite}/10.0 ({level.value}). Primary drivers: {drivers}."

Dimension 1: Attack Sophistication (Weight: 0.10)
Measures the technical sophistication required to execute the attack. Higher sophistication suggests a more capable threat actor and potentially a more targeted attack.
| Score | Description | Example |
|---|---|---|
| 1-2 | Trivial, no specialized knowledge | Copy-pasting a known jailbreak prompt |
| 3-4 | Basic, uses publicly available tools | Using an off-the-shelf adversarial attack library |
| 5-6 | Moderate, requires domain expertise | Crafting targeted prompt injections for a specific application |
| 7-8 | Advanced, requires ML research skills | Developing novel adversarial perturbations or backdoor triggers |
| 9-10 | Expert, nation-state level | Training-time attacks requiring supply chain compromise |
Dimension 2: Data Sensitivity Exposure (Weight: 0.20)
Measures the sensitivity of data that was exposed, extracted, or compromised. This is often the highest-weighted dimension because data exposure drives regulatory, legal, and reputational impact.
| Score | Description | Example |
|---|---|---|
| 1-2 | Public or non-sensitive data | Model reveals information already in public training data |
| 3-4 | Internal but non-regulated data | Model leaks internal project names or non-sensitive configs |
| 5-6 | Regulated personal data (PII) | Model extraction of user data from training set |
| 7-8 | Highly sensitive personal or financial data | Training data containing SSNs, medical records, financial data |
| 9-10 | National security or safety-critical data | Classified information exposure, safety system compromise |
Dimension 3: Model Integrity Impact (Weight: 0.20)
Measures the degree to which the model's integrity has been compromised -- whether it can still be trusted to produce correct, safe outputs.
| Score | Description | Example |
|---|---|---|
| 1-2 | No model change, transient output issue | Single anomalous response, no persistent effect |
| 3-4 | Temporary behavior modification | Jailbreak that works for one session only |
| 5-6 | Persistent but reversible modification | Compromised system prompt that can be restored |
| 7-8 | Weight-level modification detected | Unauthorized fine-tuning that altered model weights |
| 9-10 | Irreversible or widespread corruption | Backdoor in base model weights, training data poisoning affecting foundation model |
Dimension 4: Blast Radius (Weight: 0.15)
Measures how many users, systems, or downstream applications are affected.
| Score | Description | Example |
|---|---|---|
| 1-2 | Single user or session | One user found a jailbreak |
| 3-4 | Small group of users | Bug affecting a specific tenant |
| 5-6 | Significant user population | Vulnerability in a feature used by thousands |
| 7-8 | Organization-wide impact | Compromised model serving all internal applications |
| 9-10 | Cross-organization or public impact | Poisoned foundation model distributed to multiple orgs |
Dimension 5: Reversibility (Weight: 0.15)
Measures how easily the incident's effects can be reversed.
| Score | Description | Example |
|---|---|---|
| 1-2 | Immediately reversible | Restart the service, clear the cache |
| 3-4 | Reversible with moderate effort | Redeploy from known-good checkpoint |
| 5-6 | Reversible with significant effort | Retrain model component, notify affected users |
| 7-8 | Partially irreversible | Extracted data cannot be "unextracted" |
| 9-10 | Fully irreversible | Model weights leaked publicly, training data breach |
Dimension 6: Regulatory and Legal Exposure (Weight: 0.10)
Measures the regulatory, legal, and compliance implications of the incident.
| Score | Description | Example |
|---|---|---|
| 1-2 | No regulatory implications | Internal tool, no regulated data |
| 3-4 | Minor compliance considerations | Logging gap that should be documented |
| 5-6 | Reportable under existing frameworks | GDPR breach notification may be required |
| 7-8 | Significant regulatory action likely | EU AI Act high-risk system violation |
| 9-10 | Major enforcement action expected | Class-action potential, regulatory investigation |
Dimension 7: Reputational Impact (Weight: 0.10)
Measures potential reputational harm if the incident becomes public.
| Score | Description | Example |
|---|---|---|
| 1-2 | No external visibility | Internal incident, no user impact |
| 3-4 | Minor, contained to affected users | Bug that a few users noticed |
| 5-6 | Moderate, industry attention likely | Security researchers may publish findings |
| 7-8 | Significant media coverage expected | Mainstream tech press coverage |
| 9-10 | Major public controversy | Front-page news, congressional scrutiny |
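The seven weights defined above are meant to sum to 1.0. A small sanity check (a sketch; the dimension names mirror the section headings above) catches misconfigured weight profiles before they skew composite scores:

```python
# Default AISF weight profile, taken from the dimension definitions above.
DEFAULT_WEIGHTS: dict[str, float] = {
    "Attack Sophistication": 0.10,
    "Data Sensitivity Exposure": 0.20,
    "Model Integrity Impact": 0.20,
    "Blast Radius": 0.15,
    "Reversibility": 0.15,
    "Regulatory and Legal Exposure": 0.10,
    "Reputational Impact": 0.10,
}

def validate_weight_profile(weights: dict[str, float], tolerance: float = 1e-9) -> None:
    """Raise if any weight is negative or the profile does not sum to 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("Weights must be non-negative")
    total = sum(weights.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"Weights sum to {total}, expected 1.0")

validate_weight_profile(DEFAULT_WEIGHTS)  # passes for the defaults above
```

Organizations that rebalance the weights for their own risk posture should run the same check on every custom profile.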
Scoring in Practice
Example: Prompt Injection Data Exfiltration
def score_prompt_injection_exfiltration() -> AISFScore:
"""
Example: Score a prompt injection incident where an attacker
used indirect prompt injection via a RAG document to extract
other users' conversation data from a customer support chatbot.
"""
dimensions = [
AISFDimension(
name="Attack Sophistication",
score=5.0,
weight=0.10,
rationale="Targeted indirect prompt injection requiring knowledge "
"of the RAG pipeline and output format",
),
AISFDimension(
name="Data Sensitivity",
score=7.0,
weight=0.20,
rationale="Customer support conversations contain PII including "
"names, email addresses, and account details",
),
AISFDimension(
name="Model Integrity",
score=3.0,
weight=0.20,
rationale="Model itself was not modified; attack exploited the "
"application architecture",
),
AISFDimension(
name="Blast Radius",
score=6.0,
weight=0.15,
rationale="Approximately 2,000 customer conversations were "
"potentially accessible through the vulnerability",
),
AISFDimension(
name="Reversibility",
score=8.0,
weight=0.15,
rationale="Extracted data cannot be recalled; affected customers "
"must be notified",
),
AISFDimension(
name="Regulatory Exposure",
score=7.0,
weight=0.10,
rationale="PII exposure triggers GDPR Article 33 notification "
"requirement (72-hour window)",
),
AISFDimension(
name="Reputational Impact",
score=6.0,
weight=0.10,
rationale="Customer trust impact; potential media coverage of "
"AI chatbot leaking customer data",
),
]
return compute_aisf_score(dimensions)
# Run the example
result = score_prompt_injection_exfiltration()
# Expected composite score: 5.9 (MEDIUM)
print(f"Severity: {result.severity_level.value}")
print(f"Score: {result.composite_score}")
print(f"Summary: {result.summary}")

Example: Model Backdoor Discovery
def score_model_backdoor_discovery() -> AISFScore:
"""
Example: Score the discovery of a backdoor in a third-party
model that was deployed in a financial fraud detection system.
The backdoor causes the model to classify certain fraudulent
transactions as legitimate when a specific pattern is present.
"""
dimensions = [
AISFDimension(
name="Attack Sophistication",
score=9.0,
weight=0.10,
rationale="Training-time backdoor attack requiring access to the "
"model training pipeline or supply chain",
),
AISFDimension(
name="Data Sensitivity",
score=8.0,
weight=0.20,
rationale="Financial transaction data with PII and account details; "
"fraudulent transactions enabled by the backdoor",
),
AISFDimension(
name="Model Integrity",
score=9.0,
weight=0.20,
rationale="Model weights contain a persistent backdoor; the model "
"cannot be trusted for any predictions",
),
AISFDimension(
name="Blast Radius",
score=8.0,
weight=0.15,
rationale="All transactions processed by this model since "
"deployment are potentially affected",
),
AISFDimension(
name="Reversibility",
score=6.0,
weight=0.15,
rationale="Model can be replaced, but all historical predictions "
"must be re-evaluated; financial losses may not be recoverable",
),
AISFDimension(
name="Regulatory Exposure",
score=8.0,
weight=0.10,
rationale="Financial services regulation requires reporting; "
"potential regulatory action for inadequate model validation",
),
AISFDimension(
name="Reputational Impact",
score=7.0,
weight=0.10,
rationale="Trust in AI-based fraud detection undermined; "
"customers affected by enabled fraud",
),
]
    return compute_aisf_score(dimensions)

# Expected composite score: 7.9 (HIGH)
result = score_model_backdoor_discovery()
print(f"Severity: {result.severity_level.value} ({result.composite_score})")

Integrating Severity Scoring into IR Workflows
Automated Initial Scoring
For organizations processing high volumes of AI-related alerts, automated initial scoring can triage incidents before human review. The automated scorer uses observable metrics to produce a preliminary score.
def automated_initial_score(alert: dict) -> AISFScore:
"""
Produce an automated preliminary severity score from alert data.
This is intended for initial triage only. Human review is required
for any incident scoring MEDIUM or above.
Expected alert fields:
- alert_type: str (e.g., "prompt_injection", "model_drift", "data_leak")
- users_affected: int
- data_classification: str ("public", "internal", "confidential", "restricted")
- model_modified: bool
- reversible: bool
- external_visibility: bool
"""
# Map data classification to sensitivity score
sensitivity_map = {
"public": 1.0,
"internal": 3.0,
"confidential": 6.0,
"restricted": 9.0,
}
# Map user count to blast radius score
users = alert.get("users_affected", 0)
if users <= 1:
blast_score = 2.0
elif users <= 100:
blast_score = 4.0
elif users <= 10000:
blast_score = 6.0
elif users <= 1000000:
blast_score = 8.0
else:
blast_score = 10.0
dimensions = [
AISFDimension("Attack Sophistication", 5.0, 0.10, "Default pending investigation"),
AISFDimension(
"Data Sensitivity",
sensitivity_map.get(alert.get("data_classification", "internal"), 5.0),
0.20,
f"Data classification: {alert.get('data_classification', 'unknown')}",
),
AISFDimension(
"Model Integrity",
8.0 if alert.get("model_modified") else 3.0,
0.20,
"Model modification detected" if alert.get("model_modified") else "No model modification",
),
AISFDimension("Blast Radius", blast_score, 0.15, f"{users} users affected"),
AISFDimension(
"Reversibility",
3.0 if alert.get("reversible") else 7.0,
0.15,
"Reversible" if alert.get("reversible") else "Not easily reversible",
),
AISFDimension("Regulatory Exposure", 5.0, 0.10, "Default pending legal review"),
AISFDimension(
"Reputational Impact",
7.0 if alert.get("external_visibility") else 3.0,
0.10,
"Externally visible" if alert.get("external_visibility") else "Internal only",
),
]
    return compute_aisf_score(dimensions)

Escalation Matrix
Map AISF severity levels to organizational response actions:
| Severity | Response Time | Notification | Staffing | Executive Briefing |
|---|---|---|---|---|
| Informational | Next business day | Ticket created | On-call review | No |
| Low | 8 hours | Team lead notified | Assigned analyst | No |
| Medium | 4 hours | IR team activated | Dedicated team | Daily summary |
| High | 1 hour | CISO notified | Full IR team | Immediate |
| Critical | 15 minutes | C-suite and board | All hands, external support | Continuous |
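The matrix above can live next to the scoring code as a simple lookup so that automation routes notifications consistently (a sketch; the field names are illustrative, and the enum mirrors the `SeverityLevel` defined earlier):

```python
from enum import Enum

class SeverityLevel(Enum):  # mirrors the AISF levels defined earlier
    INFORMATIONAL = "informational"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

# Response actions keyed by severity, mirroring the escalation matrix above.
ESCALATION_MATRIX = {
    SeverityLevel.INFORMATIONAL: {"response_time": "next business day", "notify": "ticket created"},
    SeverityLevel.LOW: {"response_time": "8 hours", "notify": "team lead"},
    SeverityLevel.MEDIUM: {"response_time": "4 hours", "notify": "IR team"},
    SeverityLevel.HIGH: {"response_time": "1 hour", "notify": "CISO"},
    SeverityLevel.CRITICAL: {"response_time": "15 minutes", "notify": "C-suite and board"},
}

def escalation_actions(level: SeverityLevel) -> dict:
    """Look up the response actions for a given severity level."""
    return ESCALATION_MATRIX[level]
```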
Score Evolution Tracking
Severity scores should be reassessed as the investigation progresses and new information emerges. Track score changes over time to document how the team's understanding of the incident evolved.
from datetime import datetime
@dataclass
class ScoreRevision:
"""A point-in-time severity score revision."""
    timestamp: datetime
score: AISFScore
analyst: str
reason: str
def track_severity_evolution(
revisions: list[ScoreRevision],
) -> dict:
"""
Analyze how the severity score evolved during the investigation.
Significant score changes indicate that initial assumptions
were revised as new evidence emerged.
"""
if not revisions:
return {"error": "No revisions to analyze"}
scores = [r.score.composite_score for r in revisions]
return {
"initial_score": scores[0],
"final_score": scores[-1],
"peak_score": max(scores),
"score_delta": scores[-1] - scores[0],
"revision_count": len(revisions),
"escalations": sum(
1 for i in range(1, len(scores)) if scores[i] > scores[i-1]
),
"deescalations": sum(
1 for i in range(1, len(scores)) if scores[i] < scores[i-1]
),
    }

Mapping to Existing Frameworks
MITRE ATLAS Integration
ATLAS techniques can inform the Attack Sophistication dimension. Map observed techniques to their complexity:
- Initial Access techniques (AML.TA0004): Generally lower sophistication (2-5)
- ML Attack Staging (AML.TA0001): Moderate sophistication (4-7)
- ML Model Access (AML.TA0000): Varies widely depending on method (3-9)
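These ranges can be encoded as a lookup that pre-fills the Attack Sophistication dimension during triage (a sketch; the ranges are the illustrative ones listed above, keyed by tactic name, not official ATLAS metadata):

```python
# Illustrative Attack Sophistication ranges per observed ATLAS tactic.
ATLAS_SOPHISTICATION_RANGES: dict[str, tuple[float, float]] = {
    "Initial Access": (2.0, 5.0),
    "ML Attack Staging": (4.0, 7.0),
    "ML Model Access": (3.0, 9.0),
}

def sophistication_estimate(tactics: list[str]) -> float:
    """Midpoint of the widest applicable range; 5.0 when no tactic matches."""
    ranges = [ATLAS_SOPHISTICATION_RANGES[t] for t in tactics
              if t in ATLAS_SOPHISTICATION_RANGES]
    if not ranges:
        return 5.0  # neutral default pending investigation
    low = min(r[0] for r in ranges)
    high = max(r[1] for r in ranges)
    return round((low + high) / 2, 1)
```

An analyst should still override the estimate once the attack chain is understood; the lookup only standardizes the starting point.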
OWASP LLM Top 10 Integration
The OWASP Top 10 for LLM Applications (2025 edition) provides risk descriptions that can calibrate AISF scoring. For example, LLM01 (Prompt Injection) has remained the top-ranked risk across editions of the list, and incidents mapped to it should score accordingly on the Model Integrity and Data Sensitivity dimensions of AISF.
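One lightweight way to use the list (an illustrative calibration, not an official OWASP mapping; entry names and floor values here are assumptions chosen for the sketch) is to treat each matched entry as a floor on the relevant AISF dimensions:

```python
# Illustrative minimum dimension scores when an incident maps to an
# OWASP LLM Top 10 entry. Floors are example values, not OWASP ratings.
OWASP_DIMENSION_FLOORS: dict[str, dict[str, float]] = {
    "LLM01: Prompt Injection": {"Model Integrity": 4.0, "Data Sensitivity": 4.0},
    "LLM02: Sensitive Information Disclosure": {"Data Sensitivity": 6.0},
    "LLM04: Data and Model Poisoning": {"Model Integrity": 7.0},
}

def apply_owasp_floors(scores: dict[str, float], entry: str) -> dict[str, float]:
    """Return a copy of `scores` with each floored dimension raised to its floor."""
    floors = OWASP_DIMENSION_FLOORS.get(entry, {})
    return {dim: max(score, floors.get(dim, 0.0)) for dim, score in scores.items()}
```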
EU AI Act Risk Categories
For organizations operating under the EU AI Act, severity scoring should also map to the Act's risk categories (unacceptable, high-risk, limited risk, minimal risk). Incidents involving high-risk AI systems automatically receive a minimum severity floor of MEDIUM in the AISF framework.
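That floor can be enforced as a final post-processing step (a sketch; the 4.0 threshold is the lower bound of the MEDIUM band in the AISF classification above):

```python
MEDIUM_FLOOR = 4.0  # lower bound of the MEDIUM band in the AISF classification

def apply_eu_ai_act_floor(composite_score: float, is_high_risk_system: bool) -> float:
    """Raise the composite score to at least the MEDIUM floor for incidents
    involving EU AI Act high-risk systems; leave other scores unchanged."""
    if is_high_risk_system:
        return max(composite_score, MEDIUM_FLOOR)
    return composite_score
```

Applying the floor after the weighted average keeps the dimension scores honest while still guaranteeing the regulatory minimum response.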
References
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. https://doi.org/10.6028/NIST.AI.100-1
- FIRST. (2024). Common Vulnerability Scoring System v4.0 Specification. https://www.first.org/cvss/v4.0/specification-document
- MITRE ATLAS. (2024). Adversarial Threat Landscape for Artificial Intelligence Systems. https://atlas.mitre.org/
- OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/