Risk Scoring Frameworks for AI Vulnerabilities
A guide to applying risk scoring frameworks to AI and LLM vulnerabilities, covering CVSS adaptation for AI, custom AI risk scoring matrices, severity classification, business impact assessment, and integration with existing vulnerability management processes.
Scoring AI vulnerabilities is harder than scoring traditional software vulnerabilities. CVSS, the industry standard for vulnerability scoring, was designed for deterministic software bugs -- buffer overflows, SQL injection, authentication bypasses. These have clear, reproducible exploit paths and well-understood impacts. AI vulnerabilities are different: they are probabilistic (the same attack may succeed 30% of the time), their impact depends heavily on the application context, and many attack categories (like jailbreaks) have no direct CVSS analog. This walkthrough provides a practical framework for scoring AI vulnerabilities consistently and communicating risk effectively to stakeholders.
Step 1: Understand the Limitations of CVSS for AI
Before building a new framework, understand what CVSS gets wrong for AI vulnerabilities:
# CVSS Limitations for AI Vulnerabilities
| CVSS Dimension | Traditional Vulnerabilities | AI Vulnerabilities | Gap |
|----------------|-----------------------------|--------------------|-----|
| Attack Vector | Network, Adjacent, Local, Physical | Often Network (API) | Minor -- most AI is API-accessible |
| Attack Complexity | Low or High | Variable and probabilistic | **Major** -- AI attacks may work 5% or 95% of the time |
| Privileges Required | None, Low, High | Usually None (public-facing) | Minor |
| User Interaction | Required or None | Usually None | Minor |
| Scope | Changed or Unchanged | **Often Changed** (tool access) | Moderate -- scope changes are common |
| Confidentiality | None, Low, High | **Difficult to classify** | **Major** -- is leaking the system prompt "Low" or "High"? |
| Integrity | None, Low, High | **Difficult to classify** | **Major** -- generating misinformation is integrity impact but hard to quantify |
| Availability | None, Low, High | Usually Low | Minor |

The two major gaps are:
- Probabilistic exploitation: CVSS assumes an exploit either works or it does not. AI attacks have success rates ranging from 1% to 99%. A prompt injection that works 50% of the time is very different from one that works 5% of the time.
- Context-dependent impact: The impact of a jailbreak depends entirely on what the model can do. A jailbroken chatbot that only generates text has a different risk profile than a jailbroken agent that can execute database queries.
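To make the probabilistic-exploitation point concrete, here is a minimal sketch (the helper name is hypothetical, not part of the framework below) of how per-attempt success rates compound when an attacker simply retries:

```python
# Hypothetical helper: probability that an attacker succeeds at least once
# across n independent attempts, given a per-attempt success rate p.
def prob_any_success(p: float, attempts: int) -> float:
    return 1 - (1 - p) ** attempts

# A "5%" prompt injection is nearly certain to land within 100 tries,
# which is why a low per-attempt rate is still a reportable finding.
print(round(prob_any_success(0.05, 100), 3))  # 0.994
print(round(prob_any_success(0.50, 10), 3))   # 0.999
```

This is why the framework below scores reliability in bands rather than treating any sub-100% attack as unexploitable.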
Step 2: The AI Vulnerability Risk Scoring Framework (AI-VRS)
Build a scoring framework specifically designed for AI vulnerabilities. The AI-VRS adds dimensions that CVSS lacks while remaining compatible with existing vulnerability management workflows:
# scoring/ai_vrs.py
"""AI Vulnerability Risk Scoring Framework (AI-VRS)."""
from dataclasses import dataclass
from enum import Enum

class ExploitReliability(Enum):
    """How reliably can the vulnerability be exploited?"""
    DETERMINISTIC = 4  # Works every time (>95% success rate)
    RELIABLE = 3       # Works most of the time (60-95%)
    MODERATE = 2       # Works sometimes (20-60%)
    UNRELIABLE = 1     # Rarely works (<20%)

class SkillRequired(Enum):
    """What skill level is needed to exploit the vulnerability?"""
    NONE = 4          # Anyone can do it (copy-paste prompt)
    BASIC = 3         # Basic prompt engineering knowledge
    INTERMEDIATE = 2  # Red team experience, tool familiarity
    ADVANCED = 1      # Novel research, custom tooling needed

class DataExposureScope(Enum):
    """What data can be accessed through exploitation?"""
    CRITICAL_SECRETS = 4  # API keys, credentials, PII databases
    INTERNAL_CONFIG = 3   # System prompts, model configuration
    SENSITIVE_DATA = 2    # User data, business logic details
    MINIMAL = 1           # Non-sensitive information

class ActionScope(Enum):
    """What actions can an attacker take through exploitation?"""
    CRITICAL_ACTIONS = 4     # Financial transactions, data deletion, privilege escalation
    SIGNIFICANT_ACTIONS = 3  # Data modification, unauthorized reads, tool abuse
    LIMITED_ACTIONS = 2      # Content policy bypass, misinformation generation
    NO_ACTIONS = 1           # Information-only, no actionable impact

class DetectionDifficulty(Enum):
    """How hard is it to detect exploitation attempts?"""
    VERY_HARD = 4  # Looks like normal usage, no anomaly signals
    HARD = 3       # Subtle anomalies, requires AI-specific monitoring
    MODERATE = 2   # Detectable with standard security monitoring
    EASY = 1       # Obvious malicious patterns, easily flagged

class RemediationComplexity(Enum):
    """How difficult is it to fix the vulnerability?"""
    FUNDAMENTAL = 4  # Requires architecture change or model replacement
    SIGNIFICANT = 3  # Requires new defense layer or major config change
    MODERATE = 2     # Configuration change or filter addition
    SIMPLE = 1       # Parameter adjustment or rule addition

@dataclass
class AIVulnerabilityScore:
    """Complete AI vulnerability risk score."""
    vulnerability_id: str
    vulnerability_name: str
    description: str
    # Scoring dimensions
    exploit_reliability: ExploitReliability
    skill_required: SkillRequired
    data_exposure_scope: DataExposureScope
    action_scope: ActionScope
    detection_difficulty: DetectionDifficulty
    remediation_complexity: RemediationComplexity
    # Context
    affected_component: str
    attack_category: str  # OWASP LLM Top 10 mapping
    business_context: str

    @property
    def exploitability_score(self) -> float:
        """How easy is it to exploit? (0-10 scale)"""
        raw = (
            self.exploit_reliability.value +
            self.skill_required.value +
            self.detection_difficulty.value
        )
        return round(raw / 12 * 10, 1)  # Normalize to 0-10

    @property
    def impact_score(self) -> float:
        """What is the impact of exploitation? (0-10 scale)"""
        raw = (
            self.data_exposure_scope.value +
            self.action_scope.value +
            self.remediation_complexity.value
        )
        return round(raw / 12 * 10, 1)  # Normalize to 0-10

    @property
    def overall_score(self) -> float:
        """Combined risk score (0-10 scale)."""
        return round(
            self.exploitability_score * 0.45 + self.impact_score * 0.55,
            1,
        )

    @property
    def severity(self) -> str:
        """Map overall score to severity label."""
        score = self.overall_score
        if score >= 8.0:
            return "Critical"
        elif score >= 6.0:
            return "High"
        elif score >= 4.0:
            return "Medium"
        elif score >= 2.0:
            return "Low"
        return "Informational"

Step 3: Scoring Real Vulnerabilities
Apply the framework to common AI vulnerability types:
# scoring/examples.py
"""Score examples for common AI vulnerability types."""
from scoring.ai_vrs import (
    AIVulnerabilityScore,
    ExploitReliability,
    SkillRequired,
    DataExposureScope,
    ActionScope,
    DetectionDifficulty,
    RemediationComplexity,
)

SCORED_VULNERABILITIES = [
    AIVulnerabilityScore(
        vulnerability_id="AI-001",
        vulnerability_name="Direct Prompt Injection -- System Prompt Extraction",
        description=(
            "The application's system prompt, containing API keys and internal "
            "configuration, can be extracted through direct prompt injection. "
            "The attack succeeds approximately 70% of the time with basic payloads."
        ),
        exploit_reliability=ExploitReliability.RELIABLE,
        skill_required=SkillRequired.BASIC,
        data_exposure_scope=DataExposureScope.CRITICAL_SECRETS,
        action_scope=ActionScope.NO_ACTIONS,
        detection_difficulty=DetectionDifficulty.HARD,
        remediation_complexity=RemediationComplexity.MODERATE,
        affected_component="LLM Chat Endpoint",
        attack_category="LLM01: Prompt Injection",
        business_context="Exposed API key provides access to the production database",
    ),
    AIVulnerabilityScore(
        vulnerability_id="AI-002",
        vulnerability_name="Tool Abuse via Prompt Injection",
        description=(
            "Attackers can manipulate the model into invoking the refund "
            "processing tool with arbitrary parameters through prompt injection. "
            "The attack requires moderate prompt engineering skill and succeeds "
            "approximately 40% of the time."
        ),
        exploit_reliability=ExploitReliability.MODERATE,
        skill_required=SkillRequired.INTERMEDIATE,
        data_exposure_scope=DataExposureScope.SENSITIVE_DATA,
        action_scope=ActionScope.CRITICAL_ACTIONS,
        detection_difficulty=DetectionDifficulty.HARD,
        remediation_complexity=RemediationComplexity.SIGNIFICANT,
        affected_component="Agent Tool Calling",
        attack_category="LLM01: Prompt Injection",
        business_context="Could process unauthorized refunds up to $10,000 per transaction",
    ),
    AIVulnerabilityScore(
        vulnerability_id="AI-003",
        vulnerability_name="RAG Poisoning via User-Uploaded Documents",
        description=(
            "Users can upload documents to the knowledge base that contain "
            "adversarial instructions. When these documents are retrieved as "
            "context, the model follows the embedded instructions."
        ),
        exploit_reliability=ExploitReliability.RELIABLE,
        skill_required=SkillRequired.INTERMEDIATE,
        data_exposure_scope=DataExposureScope.SENSITIVE_DATA,
        action_scope=ActionScope.SIGNIFICANT_ACTIONS,
        detection_difficulty=DetectionDifficulty.VERY_HARD,
        remediation_complexity=RemediationComplexity.SIGNIFICANT,
        affected_component="RAG Pipeline",
        attack_category="LLM01: Prompt Injection",
        business_context="Could cause the model to provide incorrect information to all users",
    ),
    AIVulnerabilityScore(
        vulnerability_id="AI-004",
        vulnerability_name="Jailbreak -- Safety Filter Bypass",
        description=(
            "The model's safety filters can be bypassed using role-play and "
            "fictional framing techniques, allowing generation of content that "
            "violates the application's content policy."
        ),
        exploit_reliability=ExploitReliability.MODERATE,
        skill_required=SkillRequired.BASIC,
        data_exposure_scope=DataExposureScope.MINIMAL,
        action_scope=ActionScope.LIMITED_ACTIONS,
        detection_difficulty=DetectionDifficulty.MODERATE,
        remediation_complexity=RemediationComplexity.FUNDAMENTAL,
        affected_component="LLM Chat Endpoint",
        attack_category="LLM01: Prompt Injection",
        business_context="Reputational risk if users share bypass outputs publicly",
    ),
]

def print_scorecard():
    """Print a formatted scorecard for all scored vulnerabilities."""
    print(f"{'ID':<10} {'Name':<45} {'Exploit':<10} {'Impact':<10} {'Overall':<10} {'Severity':<10}")
    print("-" * 95)
    for vuln in SCORED_VULNERABILITIES:
        print(
            f"{vuln.vulnerability_id:<10} "
            f"{vuln.vulnerability_name[:43]:<45} "
            f"{vuln.exploitability_score:<10.1f} "
            f"{vuln.impact_score:<10.1f} "
            f"{vuln.overall_score:<10.1f} "
            f"{vuln.severity:<10}"
        )

if __name__ == "__main__":
    print_scorecard()

Step 4: Business Impact Assessment
Technical severity alone does not determine priority. Overlay business impact to produce actionable prioritization:
# Business Impact Assessment Matrix
| Finding | Technical Severity | Business Impact Factors | Adjusted Priority |
|---------|-------------------|------------------------|-------------------|
| AI-001: System Prompt Extraction (API key) | High | Key provides production DB access; breach notification required | **Critical** (escalated) |
| AI-002: Tool Abuse (refunds) | High | Direct financial loss; $10K per transaction; rate limited to 100/day | **Critical** (confirmed) |
| AI-003: RAG Poisoning | High | Affects all users; misinformation risk in healthcare context | **Critical** (escalated) |
| AI-004: Jailbreak | Medium | Reputational risk only; no data exposure; content moderation team can respond | **Medium** (confirmed) |
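The escalation logic behind an "Adjusted Priority" column like this can be sketched as a simple rule. The flag names and the one-level escalation policy below are illustrative choices, not part of the framework itself:

```python
# Hypothetical escalation sketch: bump technical severity one level when
# any high-stakes business factor applies. Flag names are illustrative.
SEVERITY_ORDER = ["Informational", "Low", "Medium", "High", "Critical"]

def adjust_priority(technical_severity: str, business_flags: set) -> str:
    escalators = {"regulatory", "financial_loss", "all_users"}
    level = SEVERITY_ORDER.index(technical_severity)
    if business_flags & escalators:
        # Escalate one level, capped at Critical.
        level = min(level + 1, len(SEVERITY_ORDER) - 1)
    return SEVERITY_ORDER[level]

# AI-001 is technically High, but the exposed key triggers breach notification.
print(adjust_priority("High", {"regulatory"}))  # Critical
print(adjust_priority("Medium", set()))         # Medium
```

Encoding the rule, even a simple one, keeps escalation decisions auditable instead of ad hoc.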
## Business Impact Factors to Consider:
1. **Financial Impact**: Direct monetary loss, regulatory fines, legal liability
2. **Data Sensitivity**: PII, PHI, financial records, trade secrets
3. **Blast Radius**: Number of users affected, exposure duration
4. **Regulatory Requirements**: HIPAA, GDPR, PCI-DSS, AI-specific regulations
5. **Reputational Impact**: Public-facing system, media attention risk
6. **Operational Impact**: System downtime, incident response cost
7. **Legal Liability**: Duty of care, contractual obligations
Step 5: Generating the Risk Report
Compile scores into a format that both technical and non-technical stakeholders can use:
# scoring/report_generator.py
"""Generate a risk scoring report from scored vulnerabilities."""
from datetime import datetime

def generate_risk_report(vulnerabilities: list, engagement_name: str) -> str:
    """Generate a markdown risk report."""
    # Sort by overall score descending
    sorted_vulns = sorted(vulnerabilities, key=lambda v: v.overall_score, reverse=True)
    report = f"""# AI Vulnerability Risk Assessment Report

**Engagement**: {engagement_name}
**Date**: {datetime.now().strftime('%Y-%m-%d')}
**Scoring Framework**: AI-VRS (AI Vulnerability Risk Scoring)

## Executive Summary

This assessment identified **{len(vulnerabilities)} vulnerabilities** in the AI application.

| Severity | Count |
|----------|-------|
| Critical | {sum(1 for v in sorted_vulns if v.severity == 'Critical')} |
| High | {sum(1 for v in sorted_vulns if v.severity == 'High')} |
| Medium | {sum(1 for v in sorted_vulns if v.severity == 'Medium')} |
| Low | {sum(1 for v in sorted_vulns if v.severity == 'Low')} |

## Risk Scoring Overview

Each vulnerability is scored across six dimensions:

| Dimension | What It Measures | Scale |
|-----------|-----------------|-------|
| Exploit Reliability | How consistently the attack succeeds | 1 (unreliable) - 4 (deterministic) |
| Skill Required | Attacker expertise needed | 1 (advanced) - 4 (none) |
| Data Exposure | What data is at risk | 1 (minimal) - 4 (critical secrets) |
| Action Scope | What actions attackers can take | 1 (none) - 4 (critical actions) |
| Detection Difficulty | How hard to detect exploitation | 1 (easy) - 4 (very hard) |
| Remediation Complexity | How hard to fix | 1 (simple) - 4 (fundamental) |

## Detailed Findings

"""
    for vuln in sorted_vulns:
        report += f"""### {vuln.vulnerability_id}: {vuln.vulnerability_name}

**Severity**: {vuln.severity} ({vuln.overall_score}/10)
**Exploitability**: {vuln.exploitability_score}/10 | **Impact**: {vuln.impact_score}/10
**OWASP Category**: {vuln.attack_category}
**Affected Component**: {vuln.affected_component}

**Description**: {vuln.description}

**Business Context**: {vuln.business_context}

**Scoring Breakdown**:

| Dimension | Value | Score |
|-----------|-------|-------|
| Exploit Reliability | {vuln.exploit_reliability.name} | {vuln.exploit_reliability.value}/4 |
| Skill Required | {vuln.skill_required.name} | {vuln.skill_required.value}/4 |
| Data Exposure | {vuln.data_exposure_scope.name} | {vuln.data_exposure_scope.value}/4 |
| Action Scope | {vuln.action_scope.name} | {vuln.action_scope.value}/4 |
| Detection Difficulty | {vuln.detection_difficulty.name} | {vuln.detection_difficulty.value}/4 |
| Remediation Complexity | {vuln.remediation_complexity.name} | {vuln.remediation_complexity.value}/4 |

---

"""
    return report

Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Scores inconsistent across testers | Subjective dimension interpretation | Calibrate with scored examples, have team review edge cases |
| Client disputes severity | Business context not considered | Always overlay a business impact assessment on technical scores |
| All findings scored as "High" | Scoring dimensions not granular enough | Use all six dimensions, avoid rounding up |
| Stakeholders ignore low-severity findings | Report buried in Critical findings | Separate executive summary from detailed findings |
| Scores not comparable to CVSS | Different scale and methodology | Provide a CVSS-equivalent mapping table in the report appendix |
| Remediation not prioritized | Scoring alone does not determine fix order | Add a remediation priority column that considers fix effort vs. risk reduction |
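For the CVSS-equivalent mapping suggested in the last row, a minimal sketch using the CVSS v3.1 qualitative severity bands (the helper name is hypothetical):

```python
# Illustrative appendix helper: map an AI-VRS overall score (0-10) onto the
# CVSS v3.1 qualitative severity bands, so stakeholders who triage by CVSS
# can slot AI findings into existing workflows.
def cvss_equivalent_band(score: float) -> str:
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_equivalent_band(8.5))  # High
print(cvss_equivalent_band(9.2))  # Critical
```

Note that the bands deliberately differ from the AI-VRS thresholds: an AI-VRS 8.5 ("Critical") falls in the CVSS "High" band, which is why the report appendix should show both labels side by side rather than silently substituting one for the other.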
Key Takeaways
Effective risk scoring for AI vulnerabilities requires going beyond traditional CVSS:
- Probabilistic exploitation must be quantified -- report the success rate and score accordingly. A 5% success rate is still a valid finding but scores differently than 95%.
- Context determines impact -- the same vulnerability (e.g., prompt injection) has vastly different impacts depending on what the model can access and do. Score impact based on the specific application, not the generic vulnerability type.
- Detection difficulty matters -- an easily detected attack is less risky than one that looks like normal usage. Include detection difficulty in the score.
- Business impact overrides technical scores -- always overlay business context. A technically "Medium" finding that violates a regulatory requirement becomes a business-critical priority.
- Consistency requires calibration -- use scored examples (like those in Step 3) to calibrate your team. Review edge cases together to build consensus on scoring interpretation.
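One way to calibrate the most subjective dimension is to derive it from measured trial data rather than gut feel. A sketch (the helper name is hypothetical) that maps an empirical success rate onto the ExploitReliability bands defined in Step 2:

```python
# Calibration sketch: derive the ExploitReliability tier from measured
# trials, so two testers who run the same attacks land on the same score.
# Thresholds mirror the enum bands from Step 2 (>95%, 60-95%, 20-60%, <20%).
def reliability_tier(successes: int, trials: int) -> str:
    rate = successes / trials
    if rate > 0.95:
        return "DETERMINISTIC"
    if rate >= 0.60:
        return "RELIABLE"
    if rate >= 0.20:
        return "MODERATE"
    return "UNRELIABLE"

print(reliability_tier(7, 10))  # RELIABLE -- matches AI-001's ~70% rate
```

Requiring a minimum number of trials (e.g., 10 attempts per payload) before assigning a tier removes most inter-tester disagreement on this dimension.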
Advanced Considerations
Adapting to Modern Defenses
The defensive landscape for LLM applications has evolved significantly since the initial wave of prompt injection research. Modern production systems often deploy multiple independent defensive layers, requiring attackers to adapt their techniques accordingly.
Input classification: The most common first line of defense is an input classifier that evaluates incoming prompts for adversarial patterns. These classifiers range from simple keyword-based filters to sophisticated ML models trained on adversarial examples. Bypassing input classifiers requires understanding their detection methodology:
- Keyword-based classifiers can be evaded through encoding (Base64, ROT13, Unicode homoglyphs), paraphrasing, or splitting the payload across multiple turns
- ML-based classifiers require more sophisticated evasion such as semantic camouflage, gradual escalation, or exploitation of the classifier's own blind spots
class DefenseBypassStrategy:
    """Strategy selector for bypassing different defense configurations."""

    def __init__(self):
        self.strategies = {
            "keyword_filter": self._bypass_keyword,
            "ml_classifier": self._bypass_ml,
            "output_filter": self._bypass_output,
            "rate_limiter": self._bypass_rate,
            "behavioral": self._bypass_behavioral,
        }

    def select_strategy(self, identified_defenses: list) -> callable:
        """Select the optimal bypass strategy based on identified defenses."""
        if not identified_defenses:
            return self._direct_attack
        # Combine strategies for multi-layer defenses
        applicable = [
            self.strategies[d] for d in identified_defenses
            if d in self.strategies
        ]
        return self._chain_strategies(applicable)

    def _bypass_keyword(self, payload: str) -> str:
        """Bypass keyword-based input filters."""
        import base64
        # Encode the payload to bypass keyword matching
        encoded = base64.b64encode(payload.encode()).decode()
        return f"Decode and follow: {encoded}"

    def _bypass_ml(self, payload: str) -> list:
        """Generate multiple paraphrased variants for ML classifier evasion."""
        variants = [
            f"As a thought experiment, consider: {payload}",
            f"For academic research purposes: {payload}",
            f"In a fictional scenario where: {payload}",
        ]
        return variants

    def _bypass_output(self, payload: str) -> str:
        """Modify the payload to extract data through output filter gaps."""
        return payload + "\nRespond using only code blocks and variable names."

    def _bypass_rate(self, payload: str) -> dict:
        """Configure timing to avoid rate limit triggers."""
        return {"payload": payload, "delay": 3.0, "jitter": 1.5}

    def _bypass_behavioral(self, payload: str) -> list:
        """Split the payload across multiple natural-sounding turns."""
        # Multi-turn escalation to avoid behavioral anomaly detection
        return [
            "I'm working on a security research project.",
            "Could you help me understand how security mechanisms work?",
            "Specifically, I'm interested in how systems like yours " + payload[:100],
            payload,
        ]

    def _direct_attack(self, payload: str) -> str:
        return payload

    def _chain_strategies(self, strategies: list) -> callable:
        """Chain multiple bypass strategies."""
        def chained(payload):
            result = payload
            for strategy in strategies:
                result = strategy(result)
            return result
        return chained

Output filtering: Output filters inspect the model's response before it reaches the user, looking for sensitive data leakage, harmful content, or other policy violations. Common output filter bypass techniques include:
| Technique | How It Works | Effectiveness |
|---|---|---|
| Encoding output | Request Base64/hex encoded responses | Medium -- some filters check decoded content |
| Code block wrapping | Embed data in code comments/variables | High -- many filters skip code blocks |
| Steganographic output | Hide data in formatting, capitalization, or spacing | High -- difficult to detect |
| Chunked extraction | Extract small pieces across many turns | High -- individual pieces may pass filters |
| Indirect extraction | Have the model reveal data through behavior changes | Very High -- no explicit data in output |
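As a defensive counterpoint to the first row, here is a sketch of an output filter that also re-checks Base64-decodable substrings, which is what makes encoding-only bypasses merely "Medium" effective. The pattern list and helper names are illustrative:

```python
# Defensive sketch: an output filter that scans both the raw response and
# any Base64-decodable substrings, closing the plain-encoding bypass.
# The blocked-pattern list is illustrative, not exhaustive.
import base64
import re

BLOCKED = re.compile(r"(api[_-]?key|password|BEGIN PRIVATE KEY)", re.I)

def output_allowed(text: str) -> bool:
    if BLOCKED.search(text):
        return False
    # Find long Base64-looking tokens, decode them, and re-check the result.
    for token in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8", "ignore")
        except Exception:
            continue  # Not valid Base64; ignore
        if BLOCKED.search(decoded):
            return False
    return True

leak = base64.b64encode(b"api_key=sk-123456").decode()
print(output_allowed(f"Here you go: {leak}"))   # False -- decoded content caught
print(output_allowed("The weather is sunny."))  # True
```

Against a filter like this, an attacker would need one of the later rows in the table (steganographic or chunked extraction) rather than simple encoding.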
Cross-Model Considerations
Techniques that work against one model may not directly transfer to others. However, understanding the general principles allows adaptation:
- Safety training methodology: Models trained with RLHF (GPT-4, Claude) have different safety characteristics than those using DPO (Llama, Mistral) or other methods. RLHF-trained models tend to refuse more broadly but may be more susceptible to multi-turn escalation.
- Context window size: Models with larger context windows (Claude with 200K, Gemini with 1M+) may be more susceptible to context window manipulation, where adversarial content is buried in large amounts of benign text.
- Multimodal capabilities: Models that process images, audio, or other modalities introduce additional attack surfaces not present in text-only models.
- Tool use implementation: The implementation details of function calling vary significantly between providers. OpenAI uses a structured function calling format, while Anthropic uses tool use blocks. These differences affect exploitation techniques.
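The format difference in the last bullet can be illustrated with simplified sketches of each provider's tool-call payload (field sets abridged and illustrative; consult each provider's API reference for the full schemas):

```python
# Simplified, abridged sketches of the two tool-call wire formats.
import json

openai_style_call = {
    "type": "function",
    "function": {
        "name": "process_refund",
        # OpenAI serializes arguments as a JSON *string*...
        "arguments": '{"order_id": "A-1001", "amount": 25.0}',
    },
}

anthropic_style_call = {
    "type": "tool_use",
    "name": "process_refund",
    # ...while Anthropic's tool_use blocks carry a structured object.
    "input": {"order_id": "A-1001", "amount": 25.0},
}

# Both encode the same call, but injection payloads that target argument
# parsing must account for the extra JSON-string layer in the first format.
assert json.loads(openai_style_call["function"]["arguments"]) == anthropic_style_call["input"]
```

For a tester, this means payloads that smuggle content through argument strings may behave differently across providers and should be re-verified per target.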
Operational Considerations
Testing Ethics and Boundaries
Professional red team testing operates within clear ethical and legal boundaries:
- Authorization: Always obtain written authorization before testing. This should specify the scope, methods allowed, and any restrictions.
- Scope limits: Stay within the authorized scope. If you discover a vulnerability that leads outside the authorized perimeter, document it and report it without exploiting it.
- Data handling: Handle any sensitive data discovered during testing according to the engagement agreement. Never retain sensitive data beyond what's needed for reporting.
- Responsible disclosure: Follow responsible disclosure practices for any vulnerabilities discovered, particularly if they affect systems beyond your testing scope.
Documenting Results
Professional documentation follows a structured format:
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Finding:
    """Structure for documenting a security finding."""
    id: str
    title: str
    severity: str  # Critical, High, Medium, Low, Informational
    category: str  # OWASP LLM Top 10 category
    description: str
    steps_to_reproduce: list[str]
    impact: str
    recommendation: str
    evidence: list[str] = field(default_factory=list)
    mitre_atlas: Optional[str] = None
    cvss_score: Optional[float] = None
    discovered_at: str = field(default_factory=lambda: datetime.now().isoformat())

    def to_report_section(self) -> str:
        """Generate a report section for this finding."""
        steps = "\n".join(f"  {i+1}. {s}" for i, s in enumerate(self.steps_to_reproduce))
        return f"""
### {self.id}: {self.title}

**Severity**: {self.severity}
**Category**: {self.category}
{f"**MITRE ATLAS**: {self.mitre_atlas}" if self.mitre_atlas else ""}

#### Description
{self.description}

#### Steps to Reproduce
{steps}

#### Impact
{self.impact}

#### Recommendation
{self.recommendation}
"""

This structured approach ensures that findings are actionable and that remediation teams have the information they need to address the vulnerability effectively.