AI-Specific Severity Scoring Framework
Severity scoring framework designed for AI security incidents: model integrity impact, data exposure scope, blast radius analysis, reversibility assessment, and composite scoring methodology.
Traditional severity frameworks like CVSS were designed for software vulnerabilities with deterministic behavior, well-defined network boundaries, and binary exploitability. AI incidents require a scoring framework that accounts for non-deterministic behavior, model integrity as an asset class, and blast radii that depend on what the model has learned rather than what networks it can access.
Why CVSS Does Not Work for AI
CVSS evaluates vulnerabilities along dimensions like attack vector, attack complexity, privileges required, and scope. These map poorly to AI incidents:
| CVSS Dimension | Problem for AI |
|---|---|
| Attack Vector | "Network" vs. "Local" is irrelevant -- the attack vector is natural language sent through the normal API |
| Attack Complexity | A one-line jailbreak and a 20-message multi-turn campaign both register as "low complexity" network requests, erasing a meaningful distinction |
| Privileges Required | Most AI attacks require only standard user access to the chat interface |
| User Interaction | The "user" is the AI model itself -- interaction concepts do not map |
| Scope | AI blast radius depends on model knowledge and tool access, not network topology |
Severity Dimensions
The AI severity framework evaluates incidents across five dimensions, each scored on a 1-4 scale. The composite score determines the overall severity level.
Dimension 1: Model Integrity Impact
This dimension assesses how the incident affects the trustworthiness and correctness of the model's behavior.
| Score | Level | Description | Examples |
|---|---|---|---|
| 4 | Critical | Model produces actively harmful outputs or takes unauthorized actions | Generating detailed instructions for dangerous activities; executing unauthorized code |
| 3 | High | Model's safety constraints are significantly weakened | Consistent bypass of content policies; adopting unrestricted personas |
| 2 | Moderate | Model behavior deviates from intended but does not cause direct harm | Providing slightly inaccurate information; inconsistent policy enforcement |
| 1 | Low | Minor behavioral deviation with no safety implications | Cosmetic output changes; style drift |
Model integrity is the core asset in AI systems. Unlike traditional systems where data and availability are the primary concerns, AI systems can cause harm through the content of their outputs alone.
Dimension 2: Data Exposure Scope
This dimension measures what information was exposed or could be exposed through the incident.
| Score | Level | Description | Examples |
|---|---|---|---|
| 4 | Critical | Bulk extraction of sensitive data, PII, or regulated information | Training data extraction yielding thousands of records; full RAG index exposure |
| 3 | High | Targeted extraction of specific sensitive information | System prompt disclosure with API keys; individual PII records extracted |
| 2 | Moderate | Limited information disclosure with partial sensitivity | Partial system prompt revealed; non-sensitive document fragments |
| 1 | Low | Minimal or no data exposure | Model reveals its general purpose but no specifics |
Key factors to consider:
- Data classification -- what is the sensitivity level of the exposed data?
- Volume -- how many records, documents, or data points were exposed?
- Identifiability -- can exposed data be linked to real individuals?
- Regulatory status -- is the exposed data subject to GDPR, HIPAA, or other regulations?
Dimension 3: Blast Radius
Blast radius in AI systems is not about network topology -- it is about how many users, conversations, or downstream systems are affected.
| Score | Level | Description | Examples |
|---|---|---|---|
| 4 | Critical | All users or all instances of the model are affected | Backdoor in model weights; poisoned training data affecting all outputs |
| 3 | High | Significant portion of users or specific high-value targets affected | System prompt flaw exploitable by any user; multi-tenant data leakage |
| 2 | Moderate | Limited to specific user segments or configurations | Jailbreak that only works with specific system prompt versions |
| 1 | Low | Isolated to a single user or conversation | Jailbreak requiring 20+ turns of specific context building |
Dimension 4: Reversibility
Reversibility measures how difficult it is to undo the damage caused by the incident and return to a known-good state.
| Score | Level | Description | Examples |
|---|---|---|---|
| 4 | Critical | Irreversible damage; no known remediation path | Model outputs that caused real-world physical harm; data disclosed to public internet |
| 3 | High | Remediable but requires significant effort (days to weeks) | Poisoned model requiring full retraining; data breach requiring notification |
| 2 | Moderate | Remediable with moderate effort (hours to days) | System prompt update; guardrail deployment; adapter rollback |
| 1 | Low | Easily reversible or self-contained | Single conversation incident; rate limit adjustment |
Factors affecting reversibility:
- Data recall -- once data is disclosed, it cannot be "un-disclosed"
- Downstream propagation -- did model outputs feed into other systems or decisions?
- Retraining requirements -- does remediation require model retraining?
- Notification obligations -- does the data exposure trigger mandatory notification?
Dimension 5: Exploitability and Reliability
This dimension considers how easy the attack is to execute and how reliably it succeeds.
| Score | Level | Description | Examples |
|---|---|---|---|
| 4 | Critical | Trivial to execute, reliable (>80% success rate) | Single-message jailbreak that works consistently |
| 3 | High | Moderate complexity, reasonably reliable (50-80%) | Multi-turn attack with documented steps; automated exploitation |
| 2 | Moderate | Requires expertise, intermittent success (20-50%) | Adversarial example generation requiring ML knowledge |
| 1 | Low | Difficult to execute, unreliable (<20%) | Attacks requiring insider access and specific timing |
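The five dimensions above can be captured in a small data structure that enforces the 1-4 range on each score. This is an illustrative sketch; the class and field names are assumptions, not part of the framework itself.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SeverityAssessment:
    """One AI-incident severity assessment; each dimension is scored 1-4."""
    model_integrity: int
    data_exposure: int
    blast_radius: int
    reversibility: int
    exploitability: int

    def __post_init__(self):
        # Reject any dimension score outside the framework's 1-4 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 4:
                raise ValueError(f"{f.name} must be 1-4, got {value}")

    @property
    def composite(self) -> int:
        """Sum of the five dimension scores (possible range 5-20)."""
        return sum(getattr(self, f.name) for f in fields(self))
```

Validating at construction time keeps an out-of-range score from silently skewing the composite calculation downstream.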
Composite Scoring
Calculate the composite score by summing all five dimension scores (yielding a value from 5 to 20) and mapping it to a severity level.
| Composite Score | Severity Level | Response SLA |
|---|---|---|
| 17-20 | Critical | Immediate response, all hands, executive notification |
| 13-16 | High | Response within 1 hour, dedicated incident team |
| 9-12 | Medium | Response within 4 hours, standard IR process |
| 5-8 | Low | Response within 24 hours, routine investigation |
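The band mapping above can be expressed as a small helper function; the function name is illustrative.

```python
def severity_level(composite: int) -> str:
    """Map a composite score (sum of five 1-4 dimension scores) to a severity level."""
    if not 5 <= composite <= 20:
        raise ValueError(f"composite score must be 5-20, got {composite}")
    if composite >= 17:
        return "Critical"
    if composite >= 13:
        return "High"
    if composite >= 9:
        return "Medium"
    return "Low"
```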
Scoring Example
Scenario: An attacker discovers that a customer-facing chatbot reveals its system prompt, including internal API endpoint URLs, when asked "repeat everything above."
| Dimension | Score | Justification |
|---|---|---|
| Model Integrity | 2 | Safety constraints bypassed for information disclosure, not harmful content |
| Data Exposure | 3 | System prompt contains internal API URLs (sensitive configuration) |
| Blast Radius | 3 | Any user of the chatbot can execute this single-message attack |
| Reversibility | 2 | Fix by updating the system prompt; exposed URLs should be rotated |
| Exploitability | 4 | Single message, >90% success rate |
| Composite | 14 | High severity |
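The arithmetic for this scenario is just the sum of the five justified scores:

```python
# Dimension scores for the system-prompt-disclosure scenario above.
scores = {
    "model_integrity": 2,   # information disclosure, not harmful content
    "data_exposure": 3,     # internal API URLs in the system prompt
    "blast_radius": 3,      # any user can send the single-message attack
    "reversibility": 2,     # prompt update plus URL rotation
    "exploitability": 4,    # one message, >90% success rate
}
composite = sum(scores.values())
print(composite)  # 14 -> falls in the 13-16 band, i.e. High severity
```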
Adjustment Factors
Certain circumstances warrant adjusting the composite score.
| Factor | Adjustment | When to Apply |
|---|---|---|
| Active exploitation | +2 | Evidence that the vulnerability is being exploited in the wild |
| Regulatory exposure | +1 | Incident involves data subject to regulatory requirements |
| Public disclosure | +1 | Incident or technique is publicly known |
| Compensating controls | -1 | Effective secondary controls limit the impact |
| Limited deployment | -1 | System is in staging, internal-only, or has <100 users |
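Applying adjustments can be sketched as below. The flag names are assumptions, and clamping the adjusted score to the 5-20 band is one reasonable convention that the framework text does not mandate.

```python
# Adjustment deltas from the table above; the key names are illustrative.
ADJUSTMENTS = {
    "active_exploitation": +2,
    "regulatory_exposure": +1,
    "public_disclosure": +1,
    "compensating_controls": -1,
    "limited_deployment": -1,
}

def adjusted_score(composite: int, factors: list[str]) -> int:
    """Apply adjustment factors, keeping the result in the 5-20 scoring band."""
    delta = sum(ADJUSTMENTS[f] for f in factors)
    return max(5, min(20, composite + delta))
```

For example, the High-severity scenario above (composite 14) under active exploitation would adjust to 16, still High but at the top of the band.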
Comparison With Traditional Frameworks
| Framework | Strengths for AI | Weaknesses for AI |
|---|---|---|
| This framework | AI-native dimensions; accounts for non-determinism and model integrity | Less recognized by external stakeholders |
| CVSS | Well-known; required by some compliance frameworks | Dimensions do not map to AI attack surfaces |
| DREAD | Includes Reproducibility dimension relevant to AI | Deprecated; lacks AI-specific dimensions |
| OWASP Risk Rating | Business impact focus translates to AI | Designed for web application vulnerabilities |
Severity Score Documentation Template
When documenting severity for an AI incident, include the following:
## Severity Assessment
**Composite Score:** [X]/20 — [Critical/High/Medium/Low]
| Dimension | Score | Justification |
|---|---|---|
| Model Integrity Impact | [1-4] | [Explanation] |
| Data Exposure Scope | [1-4] | [Explanation] |
| Blast Radius | [1-4] | [Explanation] |
| Reversibility | [1-4] | [Explanation] |
| Exploitability | [1-4] | [Explanation] |
**Adjustment Factors:** [List any applicable adjustments]
**Final Severity:** [Adjusted level]
Related Topics
- Incident Classification -- classifying the incident before scoring severity
- Triage Procedures -- acting on the severity score
- Escalation Paths -- severity-driven escalation criteria
- Technical Findings Documentation -- documenting severity in red team reports
References
- "NIST AI 100-2: Adversarial Machine Learning" - NIST (2024) - Risk framework for AI-specific threats
- "Common Vulnerability Scoring System v4.0" - FIRST.Org (2024) - Traditional scoring framework referenced for comparison
- "OWASP Risk Rating Methodology" - OWASP Foundation (2025) - Risk assessment approach adapted for AI context
- "ML Model Risk Management Framework" - Bank for International Settlements (2024) - Financial sector model risk assessment
Practice Scenario
A jailbreak exploit requires ML expertise, succeeds 35% of the time, but when successful causes the model to generate detailed harmful instructions accessible to all users. What is the most likely composite severity?