AI Incident Severity Scoring
Frameworks and methodologies for scoring the severity of AI security incidents, integrating NIST AI RMF, MITRE ATLAS, and traditional CVSS approaches.
Overview
When an AI security incident occurs, one of the first decisions the response team must make is how severe it is. This determination drives everything downstream: how many responders are mobilized, which stakeholders are notified, what communications are sent, and how quickly the organization acts. Getting severity scoring wrong in either direction is costly -- overscoring wastes resources and creates alert fatigue, while underscoring can allow a critical incident to escalate unchecked.
Traditional severity scoring frameworks like CVSS (Common Vulnerability Scoring System) were designed for software vulnerabilities in deterministic systems. AI incidents introduce dimensions that these frameworks do not capture: the probabilistic nature of model behavior, the difficulty of determining blast radius when a model serves diverse use cases, the potential for training data contamination to have delayed and widespread effects, and the reputational harm unique to AI systems that produce harmful or embarrassing outputs.
This article presents an AI-adapted severity scoring framework that builds on established standards -- NIST AI RMF, MITRE ATLAS, OWASP LLM Top 10, and CVSS -- while adding dimensions specific to AI security incidents. The goal is to provide incident response teams with a repeatable, defensible methodology for severity classification.
Limitations of Traditional Scoring for AI Incidents
CVSS Gaps
The Common Vulnerability Scoring System (CVSS v4.0) evaluates vulnerabilities along dimensions including attack vector, attack complexity, privileges required, user interaction, and impact on confidentiality, integrity, and availability. While these dimensions are relevant to AI systems, CVSS misses several AI-specific factors:
| CVSS Dimension | AI Gap |
|---|---|
| Attack Vector | Does not distinguish between prompt-level, training-level, and infrastructure-level attacks |
| Attack Complexity | Cannot capture the stochastic nature of AI exploits that succeed probabilistically |
| Privileges Required | No concept of "model access levels" (API user vs. fine-tuning access vs. training pipeline access) |
| User Interaction | Does not model indirect prompt injection where the "user" is unknowing |
| Confidentiality Impact | Does not capture training data extraction or model weight theft |
| Integrity Impact | Does not distinguish between transient output manipulation and persistent model corruption |
| Availability Impact | Does not capture model degradation (partial availability loss) |
NIST AI RMF Context
The NIST AI Risk Management Framework (AI RMF 1.0, published January 2023) provides a broader risk management structure through its four core functions: Govern, Map, Measure, and Manage. The Measure function is most relevant to severity scoring, as it encompasses the assessment of AI risks including:
- Likelihood and severity of potential harm
- Breadth of impact (number of affected individuals or systems)
- Whether harms are reversible or irreversible
- Whether the AI system operates in a safety-critical context
Our severity scoring framework operationalizes these NIST principles into a quantitative scoring methodology.
The AI Incident Severity Framework (AISF)
Score Components
The AISF produces a composite severity score from 0.0 to 10.0 by evaluating seven dimensions, each scored from 0.0 to 10.0 and combined with configurable weights.
"""
AI Incident Severity Framework (AISF) scoring implementation.
Produces a composite severity score for AI security incidents
based on seven weighted dimensions.
"""
from dataclasses import dataclass
from enum import Enum
class SeverityLevel(Enum):
INFORMATIONAL = "informational" # 0.0 - 1.9
LOW = "low" # 2.0 - 3.9
MEDIUM = "medium" # 4.0 - 5.9
HIGH = "high" # 6.0 - 7.9
CRITICAL = "critical" # 8.0 - 10.0
@dataclass
class AISFDimension:
"""A single dimension of the AISF severity score."""
name: str
score: float # 0.0 to 10.0
weight: float # relative weight in composite
rationale: str # explanation for the score
@dataclass
class AISFScore:
"""Complete AISF severity 評估."""
dimensions: list[AISFDimension]
composite_score: float
severity_level: SeverityLevel
summary: str
def classify_severity(score: float) -> SeverityLevel:
if score < 2.0:
return SeverityLevel.INFORMATIONAL
if score < 4.0:
return SeverityLevel.LOW
if score < 6.0:
return SeverityLevel.MEDIUM
if score < 8.0:
return SeverityLevel.HIGH
return SeverityLevel.CRITICAL
def compute_aisf_score(dimensions: list[AISFDimension]) -> AISFScore:
"""
Compute the composite AISF score from individual dimension scores.
Uses weighted average with normalization.
"""
total_weight = sum(d.weight for d in dimensions)
if total_weight == 0:
raise ValueError("Total weight must be positive")
composite = sum(d.score * d.weight for d in dimensions) / total_weight
composite = round(min(max(composite, 0.0), 10.0), 1)
return AISFScore(
dimensions=dimensions,
composite_score=composite,
severity_level=classify_severity(composite),
summary=_generate_summary(dimensions, composite),
)
def _generate_summary(dimensions: list[AISFDimension], composite: float) -> str:
top_dims = sorted(dimensions, key=lambda d: d.score, reverse=True)[:3]
drivers = ", ".join(f"{d.name} ({d.score})" for d in top_dims)
level = classify_severity(composite)
return f"Composite score {composite}/10.0 ({level.value}). Primary drivers: {drivers}."Dimension 1: 攻擊 Sophistication (Weight: 0.10)
Measures the technical sophistication required to execute the attack. Higher sophistication suggests a more capable threat actor and potentially a more targeted attack.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Trivial, no specialized knowledge | Copy-pasting a known jailbreak prompt |
| 3-4 | Basic, uses publicly available tools | Using an off-the-shelf adversarial attack library |
| 5-6 | Moderate, requires domain expertise | Crafting targeted prompt injections for a specific application |
| 7-8 | Advanced, requires ML research skills | Developing novel adversarial perturbations or backdoor triggers |
| 9-10 | Expert, nation-state level | Training-time attacks requiring supply chain compromise |
Dimension 2: Data Sensitivity Exposure (Weight: 0.20)
Measures the sensitivity of data that was exposed, extracted, or compromised. This is typically among the highest-weighted dimensions because data exposure drives regulatory, legal, and reputational impact.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Public or non-sensitive data | Model reveals information already in public training data |
| 3-4 | Internal but non-regulated data | Model leaks internal project names or non-sensitive configs |
| 5-6 | Regulated personal data (PII) | Model extraction of user data from the training set |
| 7-8 | Highly sensitive personal or financial data | Training data containing SSNs, medical records, financial data |
| 9-10 | National security or safety-critical data | Classified information exposure, safety system compromise |
Dimension 3: Model Integrity Impact (Weight: 0.20)
Measures the degree to which the model's integrity has been compromised -- whether it can still be trusted to produce correct, safe outputs.
| Score | Description | Examples |
|---|---|---|
| 1-2 | No model change, transient output issue | Single anomalous response, no persistent effect |
| 3-4 | Temporary behavior modification | Jailbreak that works for one session only |
| 5-6 | Persistent but reversible modification | Compromised system prompt that can be restored |
| 7-8 | Weight-level modification detected | Unauthorized fine-tuning that altered model weights |
| 9-10 | Irreversible or widespread corruption | Backdoor in base model weights, training data poisoning affecting a foundation model |
Dimension 4: Blast Radius (Weight: 0.15)
Measures how many users, systems, or downstream applications are affected.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Single user or session | One user found a jailbreak |
| 3-4 | Small group of users | Bug affecting a specific tenant |
| 5-6 | Significant user population | Vulnerability in a feature used by thousands |
| 7-8 | Organization-wide impact | Compromised model serving all internal applications |
| 9-10 | Cross-organization or public impact | Poisoned foundation model distributed to multiple orgs |
Dimension 5: Reversibility (Weight: 0.15)
Measures how easily the incident's effects can be reversed.
| Score | Description | Examples |
|---|---|---|
| 1-2 | Immediately reversible | Restart the service, clear the cache |
| 3-4 | Reversible with moderate effort | Redeploy from known-good checkpoint |
| 5-6 | Reversible with significant effort | Retrain model component, notify affected users |
| 7-8 | Partially irreversible | Extracted data cannot be "unextracted" |
| 9-10 | Fully irreversible | Model weights leaked publicly, training data breach |
Dimension 6: Regulatory and Legal Exposure (Weight: 0.10)
Measures the regulatory, legal, and compliance implications of the incident.
| Score | Description | Examples |
|---|---|---|
| 1-2 | No regulatory implications | Internal tool, no regulated data |
| 3-4 | Minor compliance considerations | Logging gap that should be documented |
| 5-6 | Reportable under existing frameworks | GDPR breach notification may be required |
| 7-8 | Significant regulatory action likely | EU AI Act high-risk system violation |
| 9-10 | Major enforcement action expected | Class-action potential, regulatory investigation |
Dimension 7: Reputational Impact (Weight: 0.10)
Measures potential reputational harm if the incident becomes public.
| Score | Description | Examples |
|---|---|---|
| 1-2 | No external visibility | Internal incident, no user impact |
| 3-4 | Minor, contained to affected users | Bug that a few users noticed |
| 5-6 | Moderate, industry attention likely | Security researchers may publish findings |
| 7-8 | Significant media coverage expected | Mainstream tech press coverage |
| 9-10 | Major public controversy | Front-page news, congressional scrutiny |
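The seven dimension weights defined above should form a proper weighting scheme. A minimal sketch of how they can be collected and sanity-checked before use; the names `DEFAULT_WEIGHTS` and `validate_weights` are illustrative helpers, not part of the framework itself:

```python
# Default AISF dimension weights, mirroring the values stated above.
# DEFAULT_WEIGHTS and validate_weights are illustrative names.
DEFAULT_WEIGHTS: dict[str, float] = {
    "Attack Sophistication": 0.10,
    "Data Sensitivity": 0.20,
    "Model Integrity": 0.20,
    "Blast Radius": 0.15,
    "Reversibility": 0.15,
    "Regulatory Exposure": 0.10,
    "Reputational Impact": 0.10,
}

def validate_weights(weights: dict[str, float]) -> None:
    """Raise if the weights are negative or do not sum to 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("Weights must be non-negative")
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"Weights must sum to 1.0, got {total}")

validate_weights(DEFAULT_WEIGHTS)
```

Organizations that tune the weights for their own risk appetite can run the same check after any adjustment.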
Scoring in Practice
Example: Prompt Injection Data Exfiltration
def score_prompt_injection_exfiltration() -> AISFScore:
"""
    Example: Score a prompt injection incident where the attacker
    used indirect prompt injection via a RAG document to extract
other users' conversation data from a customer support chatbot.
"""
dimensions = [
AISFDimension(
name="攻擊 Sophistication",
score=5.0,
weight=0.10,
rationale="Targeted indirect 提示詞注入 requiring knowledge "
"of the RAG pipeline and 輸出 format",
),
AISFDimension(
name="Data Sensitivity",
score=7.0,
weight=0.20,
rationale="Customer support conversations contain PII including "
"names, email addresses, and account details",
),
AISFDimension(
name="Model Integrity",
score=3.0,
weight=0.20,
rationale="Model itself was not modified; attack exploited the "
"application architecture",
),
AISFDimension(
name="Blast Radius",
score=6.0,
weight=0.15,
rationale="Approximately 2,000 customer conversations were "
"potentially accessible through the 漏洞",
),
AISFDimension(
name="Reversibility",
score=8.0,
weight=0.15,
rationale="Extracted data cannot be recalled; affected customers "
"must be notified",
),
AISFDimension(
name="Regulatory Exposure",
score=7.0,
weight=0.10,
rationale="PII exposure triggers GDPR Article 33 notification "
"requirement (72-hour window)",
),
AISFDimension(
name="Reputational Impact",
score=6.0,
weight=0.10,
rationale="Customer trust impact; potential media coverage of "
"AI chatbot leaking customer data",
),
]
return compute_aisf_score(dimensions)
# Run the example
result = score_prompt_injection_exfiltration()
# Expected composite score: 5.9 (MEDIUM)
print(f"Severity: {result.severity_level.value}")
print(f"Score: {result.composite_score}")
print(f"總結: {result.summary}")範例: Model 後門 Discovery
def score_model_backdoor_discovery() -> AISFScore:
"""
    Example: Score the discovery of a backdoor in a third-party
    model that was deployed in a financial fraud detection system.
    The backdoor causes the model to classify certain fraudulent
    transactions as legitimate when a specific pattern is present.
"""
dimensions = [
AISFDimension(
name="攻擊 Sophistication",
score=9.0,
weight=0.10,
rationale="Training-time 後門 attack requiring access to the "
"model 訓練 pipeline or 供應鏈",
),
AISFDimension(
name="Data Sensitivity",
score=8.0,
weight=0.20,
rationale="Financial transaction data with PII and account details; "
"fraudulent transactions enabled by the 後門",
),
AISFDimension(
name="Model Integrity",
score=9.0,
weight=0.20,
rationale="Model weights contain a persistent 後門; 模型 "
"cannot be trusted for any predictions",
),
AISFDimension(
name="Blast Radius",
score=8.0,
weight=0.15,
rationale="All transactions processed by this model since "
"deployment are potentially affected",
),
AISFDimension(
name="Reversibility",
score=6.0,
weight=0.15,
rationale="Model can be replaced, but all historical predictions "
"must be re-evaluated; financial losses may not be recoverable",
),
AISFDimension(
name="Regulatory Exposure",
score=8.0,
weight=0.10,
rationale="Financial services regulation requires reporting; "
"potential regulatory action for inadequate model validation",
),
AISFDimension(
name="Reputational Impact",
score=7.0,
weight=0.10,
rationale="Trust in AI-based fraud 偵測 undermined; "
"customers affected by enabled fraud",
),
]
    return compute_aisf_score(dimensions)
Integrating Severity Scoring into IR Workflows
Automated Initial Scoring
For organizations processing high volumes of AI-related alerts, automated initial scoring can triage incidents before human review. The automated scorer uses observable metrics to produce a preliminary score.
def automated_initial_score(alert: dict) -> AISFScore:
"""
Produce an automated preliminary severity score from alert data.
    This is intended for initial triage only. Human review is required
for any incident scoring MEDIUM or above.
Expected alert fields:
- alert_type: str (e.g., "prompt_injection", "model_drift", "data_leak")
- users_affected: int
- data_classification: str ("public", "internal", "confidential", "restricted")
- model_modified: bool
- reversible: bool
- external_visibility: bool
"""
# Map data classification to sensitivity score
sensitivity_map = {
"public": 1.0,
"internal": 3.0,
"confidential": 6.0,
"restricted": 9.0,
}
# Map user count to blast radius score
users = alert.get("users_affected", 0)
if users <= 1:
blast_score = 2.0
elif users <= 100:
blast_score = 4.0
elif users <= 10000:
blast_score = 6.0
elif users <= 1000000:
blast_score = 8.0
else:
blast_score = 10.0
dimensions = [
AISFDimension("攻擊 Sophistication", 5.0, 0.10, "Default pending investigation"),
AISFDimension(
"Data Sensitivity",
sensitivity_map.get(alert.get("data_classification", "internal"), 5.0),
0.20,
f"Data classification: {alert.get('data_classification', 'unknown')}",
),
AISFDimension(
"Model Integrity",
8.0 if alert.get("model_modified") else 3.0,
0.20,
"Model modification detected" if alert.get("model_modified") else "No model modification",
),
AISFDimension("Blast Radius", blast_score, 0.15, f"{users} users affected"),
AISFDimension(
"Reversibility",
3.0 if alert.get("reversible") else 7.0,
0.15,
"Reversible" if alert.get("reversible") else "Not easily reversible",
),
AISFDimension("Regulatory Exposure", 5.0, 0.10, "Default pending legal review"),
AISFDimension(
"Reputational Impact",
7.0 if alert.get("external_visibility") else 3.0,
0.10,
"Externally visible" if alert.get("external_visibility") else "Internal only",
),
]
    return compute_aisf_score(dimensions)
Escalation Matrix
Map AISF severity levels to organizational response actions:
| Severity | Response Time | Notification | Staffing | Executive Briefing |
|---|---|---|---|---|
| Informational | Next business day | Ticket created | On-call review | No |
| Low | 8 hours | Team lead notified | Assigned analyst | No |
| Medium | 4 hours | IR team activated | Dedicated team | Daily summary |
| High | 1 hour | CISO notified | Full IR team | Immediate |
| Critical | 15 minutes | C-suite and board | All hands, external support | Continuous |
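The escalation matrix can be encoded directly so that tooling can route incidents automatically. A minimal sketch keyed by the severity level's string value (matching the `SeverityLevel` enum values defined earlier); `ResponsePolicy` and `policy_for` are illustrative names:

```python
# Escalation policy lookup mirroring the table above.
# ResponsePolicy and policy_for are illustrative, not part of the framework spec.
from dataclasses import dataclass

@dataclass(frozen=True)
class ResponsePolicy:
    response_time: str
    notification: str
    staffing: str
    executive_briefing: str

ESCALATION_MATRIX: dict[str, ResponsePolicy] = {
    "informational": ResponsePolicy("Next business day", "Ticket created", "On-call review", "No"),
    "low": ResponsePolicy("8 hours", "Team lead notified", "Assigned analyst", "No"),
    "medium": ResponsePolicy("4 hours", "IR team activated", "Dedicated team", "Daily summary"),
    "high": ResponsePolicy("1 hour", "CISO notified", "Full IR team", "Immediate"),
    "critical": ResponsePolicy("15 minutes", "C-suite and board", "All hands, external support", "Continuous"),
}

def policy_for(severity_value: str) -> ResponsePolicy:
    """Look up the response policy for a severity level value."""
    return ESCALATION_MATRIX[severity_value]
```

In a full implementation, `policy_for(result.severity_level.value)` would drive paging and notification integrations.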
Score Evolution Tracking
Severity scores should be reassessed as the investigation progresses and new information emerges. Track score changes over time to document how the team's understanding of the incident evolved.
from datetime import datetime
@dataclass
class ScoreRevision:
"""A point-in-time severity score revision."""
    timestamp: datetime  # when this revision was recorded
score: AISFScore
analyst: str
reason: str
def track_severity_evolution(
revisions: list[ScoreRevision],
) -> dict:
"""
Analyze how the severity score evolved during the investigation.
Significant score changes indicate that initial assumptions
were revised as new evidence emerged.
"""
if not revisions:
return {"error": "No revisions to analyze"}
scores = [r.score.composite_score for r in revisions]
return {
"initial_score": scores[0],
"final_score": scores[-1],
"peak_score": max(scores),
"score_delta": scores[-1] - scores[0],
"revision_count": len(revisions),
"escalations": sum(
1 for i in range(1, len(scores)) if scores[i] > scores[i-1]
),
"deescalations": sum(
1 for i in range(1, len(scores)) if scores[i] < scores[i-1]
),
    }
Mapping to Existing Frameworks
MITRE ATLAS Integration
ATLAS techniques can inform the Attack Sophistication dimension. Map observed techniques to their complexity:
- Initial Access techniques (AML.TA0004): Generally lower sophistication (2-5)
- ML Attack Staging (AML.TA0001): Moderate sophistication (4-7)
- ML Model Access (AML.TA0000): Varies widely depending on method (3-9)
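One way to operationalize this mapping is a lookup from tactic name to a suggested score range, which analysts can then narrow based on the specific technique observed. A sketch under those assumptions; the helper name and the fallback behavior are illustrative, not part of MITRE ATLAS:

```python
# Suggested Attack Sophistication ranges per ATLAS tactic, mirroring the
# guidance above. Illustrative helper; not an official ATLAS mapping.
ATLAS_SOPHISTICATION_RANGES: dict[str, tuple[float, float]] = {
    "Initial Access": (2.0, 5.0),
    "ML Attack Staging": (4.0, 7.0),
    "ML Model Access": (3.0, 9.0),
}

def sophistication_range_for(tactic: str) -> tuple[float, float]:
    """Return the (low, high) Attack Sophistication range for a tactic,
    falling back to the full 0-10 scale for unmapped tactics."""
    return ATLAS_SOPHISTICATION_RANGES.get(tactic, (0.0, 10.0))
```

The range serves as a starting bracket; the analyst's final score within it should be justified in the dimension's rationale field.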
OWASP LLM Top 10 Integration
The OWASP LLM Top 10 (2025 edition) provides risk ratings that can calibrate AISF scoring. For example, LLM01 (Prompt Injection) carries an inherently high impact rating in OWASP, which should be reflected in the Model Integrity and Data Sensitivity dimensions of AISF.
EU AI Act Risk Categories
For organizations operating under the EU AI Act, severity scoring should also map to the Act's risk categories (unacceptable, high-risk, limited risk, minimal risk). Incidents involving high-risk AI systems automatically receive a minimum severity floor of MEDIUM in the AISF framework.
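A minimal sketch of that severity floor, applied to the composite score after computation; the function and flag names are illustrative, and 4.0 is used because it is the lower bound of the MEDIUM band defined earlier:

```python
# Minimum composite score for incidents involving EU AI Act high-risk
# systems: 4.0 is the lower bound of the MEDIUM severity band.
HIGH_RISK_SEVERITY_FLOOR = 4.0

def apply_eu_ai_act_floor(composite_score: float, high_risk_system: bool) -> float:
    """Raise the composite score to the MEDIUM floor for incidents
    involving high-risk AI systems; leave other scores unchanged."""
    if high_risk_system:
        return max(composite_score, HIGH_RISK_SEVERITY_FLOOR)
    return composite_score
```

The floored score should then be reclassified with `classify_severity` so the severity level reflects the adjustment.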
References
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. https://doi.org/10.6028/NIST.AI.100-1
- FIRST. (2024). Common Vulnerability Scoring System v4.0 Specification. https://www.first.org/cvss/v4.0/specification-document
- MITRE ATLAS. (2024). Adversarial Threat Landscape for Artificial-Intelligence Systems. https://atlas.mitre.org/
- OWASP. (2025). OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/