AI Incident Severity Scoring
Frameworks and methodologies for scoring the severity of AI security incidents, integrating NIST AI RMF, MITRE ATLAS, and traditional CVSS approaches.
Overview
When an AI security incident occurs, one of the first decisions the response team must make is how severe it is. This determination drives everything downstream: how many responders are mobilized, which stakeholders are notified, what communications are sent, and how quickly the organization acts. Getting severity scoring wrong in either direction is costly -- overscoring wastes resources and creates alert fatigue, while underscoring can allow a critical incident to escalate unchecked.
Traditional severity scoring frameworks like CVSS (Common Vulnerability Scoring System) were designed for software vulnerabilities in deterministic systems. AI incidents introduce dimensions that these frameworks do not capture: the probabilistic nature of model behavior, the difficulty of determining blast radius when a model serves diverse use cases, the potential for training data contamination to have delayed and widespread effects, and the reputational harm unique to AI systems that produce harmful or embarrassing outputs.
This article presents an AI-adapted severity scoring framework that builds on established standards -- NIST AI RMF, MITRE ATLAS, OWASP LLM Top 10, and CVSS -- while adding dimensions specific to AI security incidents. The goal is to provide incident response teams with a repeatable, defensible methodology for severity classification.
Limitations of Traditional Scoring for AI Incidents
CVSS Gaps
The Common Vulnerability Scoring System (CVSS v4.0) evaluates vulnerabilities along dimensions including attack vector, attack complexity, privileges required, user interaction, and impact on confidentiality, integrity, and availability. While these dimensions are relevant to AI systems, CVSS misses several AI-specific factors:
| CVSS Dimension | AI Gap |
|---|---|
| Attack Vector | Does not distinguish between prompt-level, training-level, and infrastructure-level attacks |
| Attack Complexity | Cannot capture the stochastic nature of AI exploits that succeed probabilistically |
| Privileges Required | No concept of "model access levels" (API user vs. fine-tuning access vs. training pipeline access) |
| User Interaction | Does not model indirect prompt injection, where the "user" is an unwitting intermediary |
| Confidentiality Impact | Does not capture training data extraction or model weight theft |
| Integrity Impact | Does not distinguish between transient output manipulation and persistent model corruption |
| Availability Impact | Does not capture model degradation (partial availability loss) |
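For comparison, a CVSS v4.0 assessment is typically exchanged as a vector string of base metrics. The minimal parser below (an illustrative sketch, not part of any official CVSS tooling) makes the gap concrete: none of the AI-specific factors in the table above has a slot in the vector.

```python
def parse_cvss_v4_vector(vector: str) -> dict[str, str]:
    """Parse a CVSS v4.0 vector string into {metric: value} pairs."""
    prefix, _, metrics = vector.partition("/")
    if prefix != "CVSS:4.0":
        raise ValueError(f"Not a CVSS v4.0 vector: {vector!r}")
    return dict(part.split(":", 1) for part in metrics.split("/"))

metrics = parse_cvss_v4_vector(
    "CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:N/SI:N/SA:N"
)
# No metric distinguishes prompt-level from training-level access, and none
# captures probabilistic exploit success -- the gaps listed in the table above.
```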
NIST AI RMF Context
The NIST AI Risk Management Framework (AI RMF 1.0, published January 2023) provides a broader risk management structure through its four core functions: Govern, Map, Measure, and Manage. The Measure function is most relevant to severity scoring, as it encompasses the assessment of AI risks including:
- Likelihood and severity of potential harm
- Breadth of impact (number of affected individuals or systems)
- Whether harms are reversible or irreversible
- Whether the AI system operates in a safety-critical context
Our severity scoring framework operationalizes these NIST principles into a quantitative scoring methodology.
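One way to make that operationalization traceable (an illustrative mapping maintained alongside the scoring code, not part of the NIST framework itself) is to record which AISF dimensions carry each Measure consideration:

```python
# Illustrative mapping from NIST AI RMF "Measure" considerations to the
# AISF dimensions defined later in this article.
NIST_MEASURE_TO_AISF: dict[str, list[str]] = {
    "likelihood and severity of harm": ["Data Sensitivity", "Model Integrity"],
    "breadth of impact": ["Blast Radius"],
    "reversibility of harms": ["Reversibility"],
    "safety-critical context": ["Regulatory Exposure", "Reputational Impact"],
}

# Dimensions informed by at least one Measure consideration:
covered = {dim for dims in NIST_MEASURE_TO_AISF.values() for dim in dims}
```

Keeping the mapping explicit makes it easy to show an auditor how each NIST consideration feeds the composite score.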
The AI Incident Severity Framework (AISF)
Score Components
The AISF produces a composite severity score from 0.0 to 10.0 by evaluating seven dimensions, each scored from 0.0 to 10.0 and combined with configurable weights.
"""
AI Incident Severity Framework (AISF) scoring implementation.
Produces a composite severity score for AI security incidents
based on seven weighted dimensions.
"""
from dataclasses import dataclass
from enum import Enum
class SeverityLevel(Enum):
INFORMATIONAL = "informational" # 0.0 - 1.9
LOW = "low" # 2.0 - 3.9
MEDIUM = "medium" # 4.0 - 5.9
HIGH = "high" # 6.0 - 7.9
CRITICAL = "critical" # 8.0 - 10.0
@dataclass
class AISFDimension:
"""A single dimension of the AISF severity score."""
name: str
score: float # 0.0 to 10.0
weight: float # relative weight in composite
rationale: str # explanation for the score
@dataclass
class AISFScore:
"""Complete AISF severity assessment."""
dimensions: list[AISFDimension]
composite_score: float
severity_level: SeverityLevel
summary: str
def classify_severity(score: float) -> SeverityLevel:
if score < 2.0:
return SeverityLevel.INFORMATIONAL
if score < 4.0:
return SeverityLevel.LOW
if score < 6.0:
return SeverityLevel.MEDIUM
if score < 8.0:
return SeverityLevel.HIGH
return SeverityLevel.CRITICAL
def compute_aisf_score(dimensions: list[AISFDimension]) -> AISFScore:
"""
Compute the composite AISF score from individual dimension scores.
Uses weighted average with normalization.
"""
total_weight = sum(d.weight for d in dimensions)
if total_weight == 0:
raise ValueError("Total weight must be positive")
composite = sum(d.score * d.weight for d in dimensions) / total_weight
composite = round(min(max(composite, 0.0), 10.0), 1)
return AISFScore(
dimensions=dimensions,
composite_score=composite,
severity_level=classify_severity(composite),
summary=_generate_summary(dimensions, composite),
)
def _generate_summary(dimensions: list[AISFDimension], composite: float) -> str:
top_dims = sorted(dimensions, key=lambda d: d.score, reverse=True)[:3]
drivers = ", ".join(f"{d.name} ({d.score})" for d in top_dims)
level = classify_severity(composite)
    return f"Composite score {composite}/10.0 ({level.value}). Primary drivers: {drivers}."

Dimension 1: Attack Sophistication (Weight: 0.10)
Measures the technical sophistication required to execute the attack. Higher sophistication suggests a more capable threat actor and potentially a more targeted attack.
| Score | Description | Example |
|---|---|---|
| 1-2 | Trivial, no specialized knowledge | Copy-pasting a known jailbreak prompt |
| 3-4 | Basic, uses publicly available tools | Using an off-the-shelf adversarial attack library |
| 5-6 | Moderate, requires domain expertise | Crafting targeted prompt injections for a specific application |
| 7-8 | Advanced, requires ML research skills | Developing novel adversarial perturbations or backdoor triggers |
| 9-10 | Expert, nation-state level | Training-time attacks requiring supply chain compromise |
Dimension 2: Data Sensitivity Exposure (Weight: 0.20)
Measures the sensitivity of data that was exposed, extracted, or compromised. This is often the highest-weighted dimension because data exposure drives regulatory, legal, and reputational impact.
| Score | Description | Example |
|---|---|---|
| 1-2 | Public or non-sensitive data | Model reveals information already in public training data |
| 3-4 | Internal but non-regulated data | Model leaks internal project names or non-sensitive configs |
| 5-6 | Regulated personal data (PII) | Model extraction of user data from training set |
| 7-8 | Highly sensitive personal or financial data | Training data containing SSNs, medical records, financial data |
| 9-10 | National security or safety-critical data | Classified information exposure, safety system compromise |
Dimension 3: Model Integrity Impact (Weight: 0.20)
Measures the degree to which the model's integrity has been compromised -- whether it can still be trusted to produce correct, safe outputs.
| Score | Description | Example |
|---|---|---|
| 1-2 | No model change, transient output issue | Single anomalous response, no persistent effect |
| 3-4 | Temporary behavior modification | Jailbreak that works for one session only |
| 5-6 | Persistent but reversible modification | Compromised system prompt that can be restored |
| 7-8 | Weight-level modification detected | Unauthorized fine-tuning that altered model weights |
| 9-10 | Irreversible or widespread corruption | Backdoor in base model weights, training data poisoning affecting foundation model |
Dimension 4: Blast Radius (Weight: 0.15)
Measures how many users, systems, or downstream applications are affected.
| Score | Description | Example |
|---|---|---|
| 1-2 | Single user or session | One user found a jailbreak |
| 3-4 | Small group of users | Bug affecting a specific tenant |
| 5-6 | Significant user population | Vulnerability in a feature used by thousands |
| 7-8 | Organization-wide impact | Compromised model serving all internal applications |
| 9-10 | Cross-organization or public impact | Poisoned foundation model distributed to multiple orgs |
Dimension 5: Reversibility (Weight: 0.15)
Measures how easily the incident's effects can be reversed.
| Score | Description | Example |
|---|---|---|
| 1-2 | Immediately reversible | Restart the service, clear the cache |
| 3-4 | Reversible with moderate effort | Redeploy from known-good checkpoint |
| 5-6 | Reversible with significant effort | Retrain model component, notify affected users |
| 7-8 | Partially irreversible | Extracted data cannot be "unextracted" |
| 9-10 | Fully irreversible | Model weights leaked publicly, training data breach |
Dimension 6: Regulatory and Legal Exposure (Weight: 0.10)
Measures the regulatory, legal, and compliance implications of the incident.
| Score | Description | Example |
|---|---|---|
| 1-2 | No regulatory implications | Internal tool, no regulated data |
| 3-4 | Minor compliance considerations | Logging gap that should be documented |
| 5-6 | Reportable under existing frameworks | GDPR breach notification may be required |
| 7-8 | Significant regulatory action likely | EU AI Act high-risk system violation |
| 9-10 | Major enforcement action expected | Class-action potential, regulatory investigation |
Dimension 7: Reputational Impact (Weight: 0.10)
Measures potential reputational harm if the incident becomes public.
| Score | Description | Example |
|---|---|---|
| 1-2 | No external visibility | Internal incident, no user impact |
| 3-4 | Minor, contained to affected users | Bug that a few users noticed |
| 5-6 | Moderate, industry attention likely | Security researchers may publish findings |
| 7-8 | Significant media coverage expected | Mainstream tech press coverage |
| 9-10 | Major public controversy | Front-page news, congressional scrutiny |
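The seven weights defined above are meant to sum to 1.0. A small sanity check (a sketch; the dimension names mirror the section headings above) catches misconfigured weight profiles before they skew composite scores:

```python
# Default AISF weight profile, taken from the dimension definitions above.
DEFAULT_WEIGHTS: dict[str, float] = {
    "Attack Sophistication": 0.10,
    "Data Sensitivity Exposure": 0.20,
    "Model Integrity Impact": 0.20,
    "Blast Radius": 0.15,
    "Reversibility": 0.15,
    "Regulatory and Legal Exposure": 0.10,
    "Reputational Impact": 0.10,
}

def validate_weight_profile(weights: dict[str, float], tolerance: float = 1e-9) -> None:
    """Raise if any weight is negative or the profile does not sum to 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("Weights must be non-negative")
    total = sum(weights.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"Weights sum to {total}, expected 1.0")

validate_weight_profile(DEFAULT_WEIGHTS)  # passes for the defaults above
```

Organizations that rebalance the weights for their own risk posture should run the same check on every custom profile.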
Scoring in Practice
Example: Prompt Injection Data Exfiltration
def score_prompt_injection_exfiltration() -> AISFScore:
"""
Example: Score a prompt injection incident where an attacker
used indirect prompt injection via a RAG document to extract
other users' conversation data from a customer support chatbot.
"""
dimensions = [
AISFDimension(
name="Attack Sophistication",
score=5.0,
weight=0.10,
rationale="Targeted indirect prompt injection requiring knowledge "
"of the RAG pipeline and output format",
),
AISFDimension(
name="Data Sensitivity",
score=7.0,
weight=0.20,
rationale="Customer support conversations contain PII including "
"names, email addresses, and account details",
),
AISFDimension(
name="Model Integrity",
score=3.0,
weight=0.20,
rationale="Model itself was not modified; attack exploited the "
"application architecture",
),
AISFDimension(
name="Blast Radius",
score=6.0,
weight=0.15,
rationale="Approximately 2,000 customer conversations were "
"potentially accessible through the vulnerability",
),
AISFDimension(
name="Reversibility",
score=8.0,
weight=0.15,
rationale="Extracted data cannot be recalled; affected customers "
"must be notified",
),
AISFDimension(
name="Regulatory Exposure",
score=7.0,
weight=0.10,
rationale="PII exposure triggers GDPR Article 33 notification "
"requirement (72-hour window)",
),
AISFDimension(
name="Reputational Impact",
score=6.0,
weight=0.10,
rationale="Customer trust impact; potential media coverage of "
"AI chatbot leaking customer data",
),
]
return compute_aisf_score(dimensions)
# Run the example
result = score_prompt_injection_exfiltration()
# Expected composite score: 5.9 (MEDIUM)
print(f"Severity: {result.severity_level.value}")
print(f"Score: {result.composite_score}")
print(f"Summary: {result.summary}")

Example: Model Backdoor Discovery
def score_model_backdoor_discovery() -> AISFScore:
"""
Example: Score the discovery of a backdoor in a third-party
model that was deployed in a financial fraud detection system.
The backdoor causes the model to classify certain fraudulent
transactions as legitimate when a specific pattern is present.
"""
dimensions = [
AISFDimension(
name="Attack Sophistication",
score=9.0,
weight=0.10,
rationale="Training-time backdoor attack requiring access to the "
"model training pipeline or supply chain",
),
AISFDimension(
name="Data Sensitivity",
score=8.0,
weight=0.20,
rationale="Financial transaction data with PII and account details; "
"fraudulent transactions enabled by the backdoor",
),
AISFDimension(
name="Model Integrity",
score=9.0,
weight=0.20,
rationale="Model weights contain a persistent backdoor; the model "
"cannot be trusted for any predictions",
),
AISFDimension(
name="Blast Radius",
score=8.0,
weight=0.15,
rationale="All transactions processed by this model since "
"deployment are potentially affected",
),
AISFDimension(
name="Reversibility",
score=6.0,
weight=0.15,
rationale="Model can be replaced, but all historical predictions "
"must be re-evaluated; financial losses may not be recoverable",
),
AISFDimension(
name="Regulatory Exposure",
score=8.0,
weight=0.10,
rationale="Financial services regulation requires reporting; "
"potential regulatory action for inadequate model validation",
),
AISFDimension(
name="Reputational Impact",
score=7.0,
weight=0.10,
rationale="Trust in AI-based fraud detection undermined; "
"customers affected by enabled fraud",
),
]
    return compute_aisf_score(dimensions)

# Expected composite score: 7.9 (HIGH)
result = score_model_backdoor_discovery()
print(f"Severity: {result.severity_level.value} ({result.composite_score})")

Integrating Severity Scoring into IR Workflows
Automated Initial Scoring
For organizations processing high volumes of AI-related alerts, automated initial scoring can triage incidents before human review. The automated scorer uses observable metrics to produce a preliminary score.
def automated_initial_score(alert: dict) -> AISFScore:
"""
Produce an automated preliminary severity score from alert data.
This is intended for initial triage only. Human review is required
for any incident scoring MEDIUM or above.
Expected alert fields:
- alert_type: str (e.g., "prompt_injection", "model_drift", "data_leak")
- users_affected: int
- data_classification: str ("public", "internal", "confidential", "restricted")
- model_modified: bool
- reversible: bool
- external_visibility: bool
"""
# Map data classification to sensitivity score
sensitivity_map = {
"public": 1.0,
"internal": 3.0,
"confidential": 6.0,
"restricted": 9.0,
}
# Map user count to blast radius score
users = alert.get("users_affected", 0)
if users <= 1:
blast_score = 2.0
elif users <= 100:
blast_score = 4.0
elif users <= 10000:
blast_score = 6.0
elif users <= 1000000:
blast_score = 8.0
else:
blast_score = 10.0
dimensions = [
AISFDimension("Attack Sophistication", 5.0, 0.10, "Default pending investigation"),
AISFDimension(
"Data Sensitivity",
sensitivity_map.get(alert.get("data_classification", "internal"), 5.0),
0.20,
f"Data classification: {alert.get('data_classification', 'unknown')}",
),
AISFDimension(
"Model Integrity",
8.0 if alert.get("model_modified") else 3.0,
0.20,
"Model modification detected" if alert.get("model_modified") else "No model modification",
),
AISFDimension("Blast Radius", blast_score, 0.15, f"{users} users affected"),
AISFDimension(
"Reversibility",
3.0 if alert.get("reversible") else 7.0,
0.15,
"Reversible" if alert.get("reversible") else "Not easily reversible",
),
AISFDimension("Regulatory Exposure", 5.0, 0.10, "Default pending legal review"),
AISFDimension(
"Reputational Impact",
7.0 if alert.get("external_visibility") else 3.0,
0.10,
"Externally visible" if alert.get("external_visibility") else "Internal only",
),
]
    return compute_aisf_score(dimensions)

Escalation Matrix
Map AISF severity levels to organizational response actions:
| Severity | Response Time | Notification | Staffing | Executive Briefing |
|---|---|---|---|---|
| Informational | Next business day | Ticket created | On-call review | No |
| Low | 8 hours | Team lead notified | Assigned analyst | No |
| Medium | 4 hours | IR team activated | Dedicated team | Daily summary |
| High | 1 hour | CISO notified | Full IR team | Immediate |
| Critical | 15 minutes | C-suite and board | All hands, external support | Continuous |
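The matrix above can live next to the scoring code as a simple lookup so that automation routes notifications consistently (a sketch; the field names are illustrative, and the enum mirrors the `SeverityLevel` defined earlier):

```python
from enum import Enum

class SeverityLevel(Enum):  # mirrors the AISF levels defined earlier
    INFORMATIONAL = "informational"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

# Response actions keyed by severity, mirroring the escalation matrix above.
ESCALATION_MATRIX = {
    SeverityLevel.INFORMATIONAL: {"response_time": "next business day", "notify": "ticket created"},
    SeverityLevel.LOW: {"response_time": "8 hours", "notify": "team lead"},
    SeverityLevel.MEDIUM: {"response_time": "4 hours", "notify": "IR team"},
    SeverityLevel.HIGH: {"response_time": "1 hour", "notify": "CISO"},
    SeverityLevel.CRITICAL: {"response_time": "15 minutes", "notify": "C-suite and board"},
}

def escalation_actions(level: SeverityLevel) -> dict:
    """Look up the response actions for a given severity level."""
    return ESCALATION_MATRIX[level]
```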
Score Evolution Tracking
Severity scores should be reassessed as the investigation progresses and new information emerges. Track score changes over time to document how the team's understanding of the incident evolved.
from datetime import datetime
@dataclass
class ScoreRevision:
"""A point-in-time severity score revision."""
    timestamp: datetime
score: AISFScore
analyst: str
reason: str
def track_severity_evolution(
revisions: list[ScoreRevision],
) -> dict:
"""
Analyze how the severity score evolved during the investigation.
Significant score changes indicate that initial assumptions
were revised as new evidence emerged.
"""
if not revisions:
return {"error": "No revisions to analyze"}
scores = [r.score.composite_score for r in revisions]
return {
"initial_score": scores[0],
"final_score": scores[-1],
"peak_score": max(scores),
"score_delta": scores[-1] - scores[0],
"revision_count": len(revisions),
"escalations": sum(
1 for i in range(1, len(scores)) if scores[i] > scores[i-1]
),
"deescalations": sum(
1 for i in range(1, len(scores)) if scores[i] < scores[i-1]
),
    }

Mapping to Existing Frameworks
MITRE ATLAS Integration
ATLAS techniques can inform the Attack Sophistication dimension. Map observed techniques to their complexity:
- Initial Access techniques (AML.TA0004): Generally lower sophistication (2-5)
- ML Attack Staging (AML.TA0001): Moderate sophistication (4-7)
- ML Model Access (AML.TA0000): Varies widely depending on method (3-9)
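These ranges can be encoded as a lookup that pre-fills the Attack Sophistication dimension during triage (a sketch; the ranges are the illustrative ones listed above, keyed by tactic name, not official ATLAS metadata):

```python
# Illustrative Attack Sophistication ranges per observed ATLAS tactic.
ATLAS_SOPHISTICATION_RANGES: dict[str, tuple[float, float]] = {
    "Initial Access": (2.0, 5.0),
    "ML Attack Staging": (4.0, 7.0),
    "ML Model Access": (3.0, 9.0),
}

def sophistication_estimate(tactics: list[str]) -> float:
    """Midpoint of the widest applicable range; 5.0 when no tactic matches."""
    ranges = [ATLAS_SOPHISTICATION_RANGES[t] for t in tactics
              if t in ATLAS_SOPHISTICATION_RANGES]
    if not ranges:
        return 5.0  # neutral default pending investigation
    low = min(r[0] for r in ranges)
    high = max(r[1] for r in ranges)
    return round((low + high) / 2, 1)
```

An analyst should still override the estimate once the attack chain is understood; the lookup only standardizes the starting point.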
OWASP LLM Top 10 Integration
The OWASP Top 10 for LLM Applications (2025 edition) provides risk descriptions that can calibrate AISF scoring. For example, LLM01 (Prompt Injection) has remained the top-ranked risk across editions of the list, and incidents mapped to it should score accordingly on the Model Integrity and Data Sensitivity dimensions of AISF.
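One lightweight way to use the list (an illustrative calibration, not an official OWASP mapping; entry names and floor values here are assumptions chosen for the sketch) is to treat each matched entry as a floor on the relevant AISF dimensions:

```python
# Illustrative minimum dimension scores when an incident maps to an
# OWASP LLM Top 10 entry. Floors are example values, not OWASP ratings.
OWASP_DIMENSION_FLOORS: dict[str, dict[str, float]] = {
    "LLM01: Prompt Injection": {"Model Integrity": 4.0, "Data Sensitivity": 4.0},
    "LLM02: Sensitive Information Disclosure": {"Data Sensitivity": 6.0},
    "LLM04: Data and Model Poisoning": {"Model Integrity": 7.0},
}

def apply_owasp_floors(scores: dict[str, float], entry: str) -> dict[str, float]:
    """Return a copy of `scores` with each floored dimension raised to its floor."""
    floors = OWASP_DIMENSION_FLOORS.get(entry, {})
    return {dim: max(score, floors.get(dim, 0.0)) for dim, score in scores.items()}
```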
EU AI Act Risk Categories
For organizations operating under the EU AI Act, severity scoring should also map to the Act's risk categories (unacceptable, high-risk, limited risk, minimal risk). Incidents involving high-risk AI systems automatically receive a minimum severity floor of MEDIUM in the AISF framework.
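That floor can be enforced as a final post-processing step (a sketch; the 4.0 threshold is the lower bound of the MEDIUM band in the AISF classification above):

```python
MEDIUM_FLOOR = 4.0  # lower bound of the MEDIUM band in the AISF classification

def apply_eu_ai_act_floor(composite_score: float, is_high_risk_system: bool) -> float:
    """Raise the composite score to at least the MEDIUM floor for incidents
    involving EU AI Act high-risk systems; leave other scores unchanged."""
    if is_high_risk_system:
        return max(composite_score, MEDIUM_FLOOR)
    return composite_score
```

Applying the floor after the weighted average keeps the dimension scores honest while still guaranteeing the regulatory minimum response.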
References
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. https://doi.org/10.6028/NIST.AI.100-1
- FIRST. (2024). Common Vulnerability Scoring System v4.0 Specification. https://www.first.org/cvss/v4.0/specification-document
- MITRE ATLAS. (2024). Adversarial Threat Landscape for Artificial Intelligence Systems. https://atlas.mitre.org/
- OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/