Threat 模型ing for LLM-Powered Applications

中級17 分鐘閱讀更新於 2026-03-20

Step-by-step walkthrough for conducting threat modeling sessions specifically tailored to LLM-powered applications, covering data flow analysis, trust boundary identification, AI-specific threat enumeration, risk assessment, and mitigation planning.

threat-modeling llm-security risk-assessment methodology STRIDE walkthrough

Traditional threat modeling methodologies were designed for systems with deterministic behavior -- systems where the same 輸入 always produces the same 輸出 and control flow is fully specified in code. LLM-powered applications break this assumption fundamentally. 模型 is a non-deterministic component that makes decisions based on natural language, creating threat vectors that traditional models like STRIDE do not capture. This walkthrough adapts established threat modeling practices for the unique characteristics of LLM applications.

Step 1: System Decomposition

Begin by decomposing the LLM application into its constituent components. Unlike traditional applications where the architecture diagram shows the complete picture, LLM applications have implicit data flows that are not visible in standard diagrams.

Create an augmented data flow diagram (DFD) that includes AI-specific elements:

┌──────────────────────────────────────────────────────────────┐
│                     TRUST BOUNDARY: External                  │
│                                                              │
│  ┌──────────┐                                                │
│  │   User   │                                                │
│  └────┬─────┘                                                │
│       │ User 輸入 (natural language)                        │
├───────┼──────────────────────────────────────────────────────┤
│       │           TRUST BOUNDARY: Application                │
│       ▼                                                      │
│  ┌──────────┐     ┌──────────────┐     ┌───────────────┐   │
│  │  輸入   │────▶│   Prompt     │────▶│  LLM API      │   │
│  │  Filter  │     │  Constructor │     │  (Model Call)  │   │
│  └──────────┘     └──────┬───────┘     └───────┬───────┘   │
│                          │                     │             │
│                          │ System Prompt       │ Model       │
│                          │ + Context           │ Response    │
│                          ▼                     ▼             │
│                   ┌──────────────┐     ┌───────────────┐   │
│                   │   RAG        │     │   輸出      │   │
│                   │   Retriever  │     │   Filter      │   │
│                   └──────┬───────┘     └───────┬───────┘   │
│                          │                     │             │
├──────────────────────────┼─────────────────────┼─────────────┤
│       TRUST BOUNDARY: Data / Tools             │             │
│                          ▼                     │             │
│                   ┌──────────────┐             │             │
│                   │  Vector DB / │             │             │
│                   │  Knowledge   │             │             │
│                   │  Base        │             │             │
│                   └──────────────┘             │             │
│                                                │             │
│                   ┌──────────────┐             │             │
│                   │  External    │◀────────────┘             │
│                   │  Tools/APIs  │  (Function Calls)         │
│                   └──────────────┘                           │
└──────────────────────────────────────────────────────────────┘

Document each component with its 安全-relevant properties:

# Component Inventory
 
| Component | 輸入 | 輸出 | Trust Level | Data Sensitivity |
|-----------|-------|--------|-------------|-----------------|
| User | N/A | Natural language text | Untrusted | N/A |
| 輸入 Filter | User text | Filtered text | Application | Low |
| Prompt Constructor | Filtered text + 系統提示詞 + context | Full prompt | Application | High (contains 系統提示詞) |
| RAG Retriever | Query 嵌入向量 | Document chunks | Application | Medium-High (contains internal documents) |
| LLM API | Full prompt | Model response | External service | High (processes all data) |
| 輸出 Filter | Model response | Filtered response | Application | Medium |
| External Tools | Function call parameters | Tool results | Variable (per tool) | High (performs actions) |
| Vector DB | Embeddings | Similar documents | Data store | Medium-High |

Step 2: Trust Boundary Analysis

Traditional trust boundaries separate components with different privilege levels. LLM applications introduce a new type of boundary: the semantic trust boundary, where the transition from structured data to natural language creates an implicit trust gap.

# threat_model/trust_boundaries.py
"""識別 and document trust boundaries in LLM applications."""
from dataclasses import dataclass
from enum import Enum
 
class BoundaryType(Enum):
    TRADITIONAL = "Traditional"  # Network, process, privilege boundaries
    SEMANTIC = "Semantic"  # Structured data to natural language transitions
    CONTEXT = "Context"  # Different information contexts merged together
    TEMPORAL = "Temporal"  # Time-based trust changes (session boundaries)
 
@dataclass
class TrustBoundary:
    name: str
    boundary_type: BoundaryType
    crosses_from: str
    crosses_to: str
    risk_description: str
    data_at_risk: list[str]
 
LLM_TRUST_BOUNDARIES = [
    TrustBoundary(
        name="User 輸入 to Application",
        boundary_type=BoundaryType.TRADITIONAL,
        crosses_from="Untrusted (user)",
        crosses_to="Application logic",
        risk_description="Standard 輸入 injection risk, amplified by natural language flexibility",
        data_at_risk=["Application logic", "系統提示詞", "Other users' data"],
    ),
    TrustBoundary(
        name="Prompt Construction (Semantic Boundary)",
        boundary_type=BoundaryType.SEMANTIC,
        crosses_from="Structured application code",
        crosses_to="Natural language prompt",
        risk_description=(
            "The transition from code to natural language is where 提示詞注入 "
            "occurs. User-controlled text becomes indistinguishable from system instructions "
            "once combined into a prompt string."
        ),
        data_at_risk=["系統提示詞", "Instruction integrity", "Tool access control"],
    ),
    TrustBoundary(
        name="RAG Context Injection",
        boundary_type=BoundaryType.CONTEXT,
        crosses_from="Retrieved documents",
        crosses_to="Model 上下文視窗",
        risk_description=(
            "Documents retrieved from the 知識庫 are injected into 模型's "
            "context alongside the 系統提示詞 and 使用者輸入. If documents contain "
            "對抗性 instructions, 模型 may follow them."
        ),
        data_at_risk=["Instruction integrity", "Response accuracy", "Data access control"],
    ),
    TrustBoundary(
        name="Model 輸出 to Tool Calls",
        boundary_type=BoundaryType.SEMANTIC,
        crosses_from="Model-generated text",
        crosses_to="Structured tool parameters",
        risk_description=(
            "模型 generates 工具呼叫 parameters from natural language reasoning. "
            "If 模型 has been manipulated, the tool parameters may include "
            "unauthorized values."
        ),
        data_at_risk=["資料庫 integrity", "External system access", "Financial transactions"],
    ),
    TrustBoundary(
        name="Session Boundary",
        boundary_type=BoundaryType.TEMPORAL,
        crosses_from="Previous conversation context",
        crosses_to="Current user request",
        risk_description=(
            "Conversation memory carries information across turns and potentially "
            "across sessions. A compromised turn can poison future interactions."
        ),
        data_at_risk=["Previous conversation data", "Other users' data"],
    ),
]

Step 3: AI-Specific Threat Enumeration

Apply an adapted STRIDE analysis to each component and boundary. AI systems require additional threat categories beyond the standard six:

# Extended STRIDE for AI Systems (STRIDE-AI)
 
## Standard STRIDE categories (applied to AI context)
 
### S - Spoofing
- **Traditional**: Impersonating another user or service
- **AI-specific**: Impersonating a system message to 模型 ("You are now in admin mode")
- **AI-specific**: Spoofing the source of retrieved documents in RAG pipelines
 
### T - Tampering
- **Traditional**: Modifying data in transit or at rest
- **AI-specific**: Prompt injection (tampering with the instruction set through 使用者輸入)
- **AI-specific**: 知識庫 投毒 (tampering with RAG source documents)
- **AI-specific**: Conversation memory 投毒 (tampering with stored context)
 
### R - Repudiation
- **Traditional**: Denying an action occurred
- **AI-specific**: Non-deterministic model outputs make it hard to reproduce issues
- **AI-specific**: Difficult to prove what 模型 "intended" vs what it produced
 
### I - Information Disclosure
- **Traditional**: Unauthorized access to data
- **AI-specific**: 系統提示詞 extraction through conversational manipulation
- **AI-specific**: 訓練資料 extraction (memorization leakage)
- **AI-specific**: Cross-session data leakage through conversation memory
- **AI-specific**: Side-channel leakage through 符元 counts, response times
 
### D - Denial of Service
- **Traditional**: Making a service unavailable
- **AI-specific**: Resource exhaustion through long prompts or recursive tool calls
- **AI-specific**: Rate limit bypass through prompt manipulation
- **AI-specific**: Model degradation through 對抗性 inputs that cause failures
 
### E - Elevation of Privilege
- **Traditional**: Gaining unauthorized access
- **AI-specific**: Gaining access to tools or data through prompt manipulation
- **AI-specific**: Escalating from user role to admin role through role injection
- **AI-specific**: Bypassing 輸出 filters to produce restricted content

Apply this extended STRIDE to each data flow crossing a trust boundary:

# threat_model/threat_enumeration.py
"""Enumerate threats 對每個 data flow in the LLM application."""
from dataclasses import dataclass
 
@dataclass
class Threat:
    id: str
    category: str  # S, T, R, I, D, E
    data_flow: str
    threat_description: str
    attack_scenario: str
    likelihood: str  # High, Medium, Low
    impact: str  # Critical, High, Medium, Low
    existing_mitigations: list[str]
    residual_risk: str
 
THREAT_CATALOG = [
    Threat(
        id="T001",
        category="Tampering",
        data_flow="User 輸入 -> Prompt Constructor",
        threat_description="Prompt injection through 使用者輸入",
        attack_scenario=(
            "攻擊者 crafts 輸入 that, when combined with the 系統提示詞, "
            "causes 模型 to follow 攻擊者's instructions instead of "
            "the application's. 範例: 'Ignore previous instructions and...'"
        ),
        likelihood="High",
        impact="Critical",
        existing_mitigations=["輸入 length limit"],
        residual_risk="High -- 輸入 limits do not prevent injection",
    ),
    Threat(
        id="T002",
        category="Information Disclosure",
        data_flow="System Prompt -> Model Response",
        threat_description="系統提示詞 extraction",
        attack_scenario=(
            "攻擊者 uses social engineering, encoding tricks, or "
            "instruction manipulation to get 模型 to 輸出 its "
            "系統提示詞, revealing business logic, API keys, or "
            "安全 controls."
        ),
        likelihood="High",
        impact="High",
        existing_mitigations=["系統提示詞 includes 'do not reveal' instruction"],
        residual_risk="High -- instruction-based 防禦 is easily bypassed",
    ),
    Threat(
        id="T003",
        category="Elevation of Privilege",
        data_flow="Model Response -> Tool Calls",
        threat_description="Unauthorized tool invocation through prompt manipulation",
        attack_scenario=(
            "攻擊者 manipulates 模型 into calling tools with "
            "unauthorized parameters -- e.g., looking up another user's "
            "account, processing an unauthorized refund, or accessing "
            "restricted 資料庫 tables."
        ),
        likelihood="High",
        impact="Critical",
        existing_mitigations=["Role-based access in application layer"],
        residual_risk="Medium -- depends on tool-level 授權 enforcement",
    ),
    Threat(
        id="T004",
        category="Tampering",
        data_flow="Knowledge Base -> RAG Retriever -> Prompt",
        threat_description="RAG retrieval 投毒",
        attack_scenario=(
            "攻擊者 injects 對抗性 documents into the knowledge "
            "base that contain instructions 模型 will follow when "
            "those documents are retrieved as context."
        ),
        likelihood="Medium",
        impact="High",
        existing_mitigations=["Document access controls"],
        residual_risk="Medium -- depends on who can add documents to the 知識庫",
    ),
    Threat(
        id="T005",
        category="Information Disclosure",
        data_flow="Conversation Memory -> Prompt Constructor",
        threat_description="Cross-session data leakage",
        attack_scenario=(
            "Attacker A interacts with 系統, revealing sensitive data. "
            "If memory is not isolated, Attacker B (or a different session) "
            "can extract that data through memory recall prompts."
        ),
        likelihood="Medium",
        impact="High",
        existing_mitigations=["Session-based memory isolation"],
        residual_risk="Low if isolation is implemented, High if not",
    ),
]

Step 4: Risk 評估 and Prioritization

Score each threat and create a prioritized risk register:

# threat_model/risk_assessment.py
"""Score and prioritize identified threats."""
 
RISK_MATRIX = {
    ("High", "Critical"): "Critical",
    ("High", "High"): "High",
    ("High", "Medium"): "High",
    ("High", "Low"): "Medium",
    ("Medium", "Critical"): "High",
    ("Medium", "High"): "High",
    ("Medium", "Medium"): "Medium",
    ("Medium", "Low"): "Low",
    ("Low", "Critical"): "Medium",
    ("Low", "High"): "Medium",
    ("Low", "Medium"): "Low",
    ("Low", "Low"): "Low",
}
 
def calculate_risk_level(likelihood: str, impact: str) -> str:
    """Calculate risk level from likelihood and impact."""
    return RISK_MATRIX.get((likelihood, impact), "Medium")
 
def generate_risk_register(threats: list) -> list[dict]:
    """Generate a prioritized risk register from the threat catalog."""
    register = []
 
    for threat in threats:
        risk_level = calculate_risk_level(threat.likelihood, threat.impact)
        register.append({
            "id": threat.id,
            "threat": threat.threat_description,
            "category": threat.category,
            "risk_level": risk_level,
            "likelihood": threat.likelihood,
            "impact": threat.impact,
            "residual_risk": threat.residual_risk,
        })
 
    # Sort by risk level priority
    priority_order = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
    register.sort(key=lambda x: priority_order.get(x["risk_level"], 4))
 
    return register

Step 5: 緩解 Planning

對每個 identified threat, define specific, implementable mitigations:

# 緩解 Plan
 
## T001: 提示詞注入 (Risk: Critical)
**Mitigations:**
1. 實作 輸入 sanitization pipeline (encode special characters, strip delimiters)
2. Use instruction hierarchy / 系統提示詞 hardening
3. Deploy 提示詞注入 偵測 (classifier-based, not just keyword)
4. 實作 輸出 validation to catch injection success indicators
5. Use structured 輸出 formats (JSON) to constrain model responses
 
**Verification:** Red team 測試 with OWASP injection 測試 suite
 
## T002: System Prompt Extraction (Risk: High)
**Mitigations:**
1. Move sensitive configuration (API keys, connection strings) out of 系統提示詞
2. 實作 canary 符元 in 系統提示詞 to detect extraction
3. Add 輸出 filtering for known 系統提示詞 fragments
4. Use a dual-LLM architecture where a judge model reviews responses
 
**Verification:** Run 系統提示詞 extraction 測試 suite
 
## T003: Unauthorized Tool Invocation (Risk: Critical)
**Mitigations:**
1. 實作 tool-level 授權 that verifies the authenticated user's 權限
2. Validate all tool parameters server-side (do not trust model-generated parameters)
3. 實作 rate limiting per user per tool
4. Log all tool invocations for audit
5. Use a confirmation step for high-impact tools (refunds, data deletion)
 
**Verification:** Red team 測試 tool abuse scenarios
 
## T004: RAG Retrieval Poisoning (Risk: High)
**Mitigations:**
1. 實作 document-level access control in the RAG pipeline
2. Sanitize documents before ingestion (strip instruction-like patterns)
3. Tag retrieved documents with metadata so 模型 can distinguish sources
4. 實作 relevance scoring to filter out low-quality retrieval results
 
**Verification:** Inject 測試 對抗性 documents and verify they are neutralized
 
## T005: Cross-Session Data Leakage (Risk: Medium-High)
**Mitigations:**
1. 實作 strict session isolation (no shared memory across users)
2. Add memory TTL (time-to-live) to automatically expire old conversations
3. Sanitize memory content before injection into new prompts
4. 實作 audit logging for memory access patterns
 
**Verification:** Multi-session 紅隊 測試 with different user contexts

Step 6: Documenting the Threat Model

Compile the complete 威脅模型 into a structured document:

# Threat Model Document Template
 
## 1. System Description
[From Step 1: system decomposition, component inventory]
 
## 2. Data Flow Diagrams
[Augmented DFD from Step 1]
 
## 3. Trust Boundaries
[From Step 2: trust boundary analysis]
 
## 4. Threat Catalog
[From Step 3: full threat enumeration with STRIDE-AI]
 
## 5. Risk Register
[From Step 4: prioritized risk register]
 
## 6. 緩解 Plan
[From Step 5: specific mitigations per threat]
 
## 7. Assumptions and Limitations
- 威脅模型 assumes the architecture described in Section 1 is accurate
- Model behavior may change with updates from the provider
- New attack techniques may emerge that are not covered 在本 analysis
- This 威脅模型 should be reviewed quarterly and after any major architecture change
 
## 8. Review History
| Date | Reviewer | Changes |
|------|----------|---------|
| [Date] | [Name] | Initial 威脅模型 |

Common Pitfalls and Troubleshooting

Problem	Cause	Solution
威脅模型 too abstract	Threats described generically	Use specific attack scenarios with concrete examples
Missing AI-specific threats	Used standard STRIDE without AI extensions	Apply STRIDE-AI framework from Step 3
Mitigations are too vague	"Improve 安全" is not actionable	Specify the exact control, 實作 approach, and verification method
Stakeholders disengage	Session too technical or too long	Break into focused sessions, use visual DFDs, start with business impact
威脅模型 gets stale	Architecture changes faster than 模型 is updated	Schedule quarterly reviews, trigger reviews on architecture changes
Risk scores are disputed	Subjective likelihood/impact assessments	Use industry benchmarks (OWASP LLM Top 10) and historical incident data

關鍵要點

Threat modeling for LLM applications requires extending traditional methodologies to account for the non-deterministic, natural-language-driven nature of AI systems:

Semantic trust boundaries are the critical addition -- the point where structured code meets natural language is where most AI-specific attacks occur. Traditional DFDs do not capture this boundary.
STRIDE needs AI extensions -- each STRIDE category has AI-specific variants that traditional threat modeling misses. Use the STRIDE-AI framework to ensure coverage.
Tool access is the highest-risk area -- when a model can invoke external tools, the impact of a successful attack escalates from "bad 輸出" to "unauthorized actions." Prioritize tool 安全 mitigations.
Mitigations must be verifiable -- every 緩解 should have a corresponding 紅隊測試 that validates its effectiveness.
Threat models are living documents -- AI applications evolve rapidly. Schedule regular reviews and trigger updates whenever the architecture changes.

Advanced Considerations

Evolving 攻擊 Landscape

The AI 安全 landscape evolves rapidly as both offensive techniques and defensive measures advance. Several trends shape the current state of play:

Increasing model capabilities create new attack surfaces. As models gain access to tools, code execution, web browsing, and computer use, each new capability introduces potential 利用 vectors that did not exist in earlier, text-only systems. The principle of least privilege becomes increasingly important as model capabilities expand.

安全訓練 improvements are necessary but not sufficient. Model providers invest heavily in 安全訓練 through RLHF, DPO, constitutional AI, and other 對齊 techniques. These improvements raise the bar for successful attacks but do not eliminate the fundamental 漏洞: models cannot reliably distinguish legitimate instructions from 對抗性 ones 因為 this distinction is not represented in the architecture.

Automated 紅隊演練 tools democratize 測試. Tools like NVIDIA's Garak, Microsoft's PyRIT, and Promptfoo enable organizations to conduct automated 安全測試 without deep AI 安全 expertise. 然而, automated tools catch known patterns; novel attacks and business logic 漏洞 still require human creativity and domain knowledge.

Regulatory pressure drives organizational investment. The EU AI Act, NIST AI RMF, and industry-specific regulations increasingly require organizations to 評估 and mitigate AI-specific risks. This regulatory pressure is driving investment in AI 安全 programs, but many organizations are still in the early stages of building mature AI 安全 practices.

Cross-Cutting 安全 Principles

Several 安全 principles apply across all topics covered 在本 curriculum:

防禦-in-depth: No single defensive measure is sufficient. Layer multiple independent 防禦 so that failure of any single layer does not result in system compromise. 輸入 classification, 輸出 filtering, behavioral 監控, and architectural controls should all be present.
Assume breach: Design systems assuming that any individual component can be compromised. This mindset leads to better isolation, 監控, and incident response capabilities. When a 提示詞注入 succeeds, the blast radius should be minimized through architectural controls.
Least privilege: Grant models and 代理 only the minimum capabilities needed for their intended function. A customer service chatbot does not need file system access or code execution. Excessive capabilities magnify the impact of successful 利用.
Continuous 測試: AI 安全 is not a one-time 評估. Models change, 防禦 evolve, and new attack techniques are discovered regularly. 實作 continuous 安全測試 as part of the development and deployment lifecycle.
Secure by default: Default configurations should be secure. Require explicit opt-in for risky capabilities, use allowlists rather than denylists, and err on the side of restriction rather than permissiveness.

Integration with Organizational 安全

AI 安全 does not exist in isolation — it must integrate with the organization's broader 安全 program:

安全 Domain	AI-Specific Integration
Identity and Access	API key management, model access controls, user 認證 for AI features
Data Protection	訓練資料 classification, PII in prompts, data residency for model calls
Application 安全	AI feature threat modeling, 提示詞注入 in SAST/DAST, secure AI design patterns
Incident Response	AI-specific playbooks, model behavior 監控, 提示詞注入 forensics
Compliance	AI regulatory mapping (EU AI Act, NIST), AI audit trails, model documentation
Supply Chain	Model provenance, dependency 安全, adapter/weight integrity verification

class OrganizationalIntegration:
    """Framework for integrating AI 安全 with organizational 安全 programs."""
 
    def __init__(self, org_config: dict):
        self.config = org_config
        self.gaps = []
 
    def assess_maturity(self) -> dict:
        """評估 the organization's AI 安全 maturity."""
        domains = {
            "governance": self._check_governance(),
            "technical_controls": self._check_technical(),
            "監控": self._check_monitoring(),
            "incident_response": self._check_ir(),
            "訓練": self._check_training(),
        }
        overall = sum(d["score"] for d in domains.values()) / len(domains)
        return {"domains": domains, "overall_maturity": round(overall, 1)}
 
    def _check_governance(self) -> dict:
        has_policy = self.config.get("ai_security_policy", False)
        has_framework = self.config.get("risk_framework", False)
        score = (int(has_policy) + int(has_framework)) * 2.5
        return {"score": score, "max": 5.0}
 
    def _check_technical(self) -> dict:
        controls = ["input_classification", "output_filtering", "rate_limiting", "sandboxing"]
        active = sum(1 for c in controls if self.config.get(c, False))
        return {"score": active * 1.25, "max": 5.0}
 
    def _check_monitoring(self) -> dict:
        has_monitoring = self.config.get("ai_monitoring", False)
        has_alerting = self.config.get("ai_alerting", False)
        score = (int(has_monitoring) + int(has_alerting)) * 2.5
        return {"score": score, "max": 5.0}
 
    def _check_ir(self) -> dict:
        has_playbook = self.config.get("ai_ir_playbook", False)
        return {"score": 5.0 if has_playbook else 0.0, "max": 5.0}
 
    def _check_training(self) -> dict:
        has_training = self.config.get("ai_security_training", False)
        return {"score": 5.0 if has_training else 0.0, "max": 5.0}

Future Directions

Several research and industry trends will shape the evolution of this field:

Formal methods for AI 安全: Development of mathematical frameworks that can provide bounded guarantees about model behavior under 對抗性 conditions
Automated 紅隊演練 at scale: Continued improvement of automated 測試 tools that can discover novel 漏洞 without human guidance
AI-assisted 防禦: Using AI systems to detect and respond to attacks on other AI systems, creating a dynamic attack-防禦 ecosystem
Standardized 評估: Growing adoption of standardized benchmarks (HarmBench, JailbreakBench) that enable consistent measurement of progress
Regulatory harmonization: Convergence of AI regulatory frameworks across jurisdictions, providing clearer requirements for organizations

Threat 模型ing for LLM-Powered Applications

中級17 分鐘閱讀更新於 2026-03-20

threat-modeling llm-security risk-assessment methodology STRIDE walkthrough

Step 1: System Decomposition

Create an augmented data flow diagram (DFD) that includes AI-specific elements:

┌──────────────────────────────────────────────────────────────┐
│                     TRUST BOUNDARY: External                  │
│                                                              │
│  ┌──────────┐                                                │
│  │   User   │                                                │
│  └────┬─────┘                                                │
│       │ User 輸入 (natural language)                        │
├───────┼──────────────────────────────────────────────────────┤
│       │           TRUST BOUNDARY: Application                │
│       ▼                                                      │
│  ┌──────────┐     ┌──────────────┐     ┌───────────────┐   │
│  │  輸入   │────▶│   Prompt     │────▶│  LLM API      │   │
│  │  Filter  │     │  Constructor │     │  (Model Call)  │   │
│  └──────────┘     └──────┬───────┘     └───────┬───────┘   │
│                          │                     │             │
│                          │ System Prompt       │ Model       │
│                          │ + Context           │ Response    │
│                          ▼                     ▼             │
│                   ┌──────────────┐     ┌───────────────┐   │
│                   │   RAG        │     │   輸出      │   │
│                   │   Retriever  │     │   Filter      │   │
│                   └──────┬───────┘     └───────┬───────┘   │
│                          │                     │             │
├──────────────────────────┼─────────────────────┼─────────────┤
│       TRUST BOUNDARY: Data / Tools             │             │
│                          ▼                     │             │
│                   ┌──────────────┐             │             │
│                   │  Vector DB / │             │             │
│                   │  Knowledge   │             │             │
│                   │  Base        │             │             │
│                   └──────────────┘             │             │
│                                                │             │
│                   ┌──────────────┐             │             │
│                   │  External    │◀────────────┘             │
│                   │  Tools/APIs  │  (Function Calls)         │
│                   └──────────────┘                           │
└──────────────────────────────────────────────────────────────┘

Document each component with its 安全-relevant properties:

# Component Inventory
 
| Component | 輸入 | 輸出 | Trust Level | Data Sensitivity |
|-----------|-------|--------|-------------|-----------------|
| User | N/A | Natural language text | Untrusted | N/A |
| 輸入 Filter | User text | Filtered text | Application | Low |
| Prompt Constructor | Filtered text + 系統提示詞 + context | Full prompt | Application | High (contains 系統提示詞) |
| RAG Retriever | Query 嵌入向量 | Document chunks | Application | Medium-High (contains internal documents) |
| LLM API | Full prompt | Model response | External service | High (processes all data) |
| 輸出 Filter | Model response | Filtered response | Application | Medium |
| External Tools | Function call parameters | Tool results | Variable (per tool) | High (performs actions) |
| Vector DB | Embeddings | Similar documents | Data store | Medium-High |

Step 2: Trust Boundary Analysis

# threat_model/trust_boundaries.py
"""識別 and document trust boundaries in LLM applications."""
from dataclasses import dataclass
from enum import Enum
 
class BoundaryType(Enum):
    TRADITIONAL = "Traditional"  # Network, process, privilege boundaries
    SEMANTIC = "Semantic"  # Structured data to natural language transitions
    CONTEXT = "Context"  # Different information contexts merged together
    TEMPORAL = "Temporal"  # Time-based trust changes (session boundaries)
 
@dataclass
class TrustBoundary:
    name: str
    boundary_type: BoundaryType
    crosses_from: str
    crosses_to: str
    risk_description: str
    data_at_risk: list[str]
 
LLM_TRUST_BOUNDARIES = [
    TrustBoundary(
        name="User 輸入 to Application",
        boundary_type=BoundaryType.TRADITIONAL,
        crosses_from="Untrusted (user)",
        crosses_to="Application logic",
        risk_description="Standard 輸入 injection risk, amplified by natural language flexibility",
        data_at_risk=["Application logic", "系統提示詞", "Other users' data"],
    ),
    TrustBoundary(
        name="Prompt Construction (Semantic Boundary)",
        boundary_type=BoundaryType.SEMANTIC,
        crosses_from="Structured application code",
        crosses_to="Natural language prompt",
        risk_description=(
            "The transition from code to natural language is where 提示詞注入 "
            "occurs. User-controlled text becomes indistinguishable from system instructions "
            "once combined into a prompt string."
        ),
        data_at_risk=["系統提示詞", "Instruction integrity", "Tool access control"],
    ),
    TrustBoundary(
        name="RAG Context Injection",
        boundary_type=BoundaryType.CONTEXT,
        crosses_from="Retrieved documents",
        crosses_to="Model 上下文視窗",
        risk_description=(
            "Documents retrieved from the 知識庫 are injected into 模型's "
            "context alongside the 系統提示詞 and 使用者輸入. If documents contain "
            "對抗性 instructions, 模型 may follow them."
        ),
        data_at_risk=["Instruction integrity", "Response accuracy", "Data access control"],
    ),
    TrustBoundary(
        name="Model 輸出 to Tool Calls",
        boundary_type=BoundaryType.SEMANTIC,
        crosses_from="Model-generated text",
        crosses_to="Structured tool parameters",
        risk_description=(
            "模型 generates 工具呼叫 parameters from natural language reasoning. "
            "If 模型 has been manipulated, the tool parameters may include "
            "unauthorized values."
        ),
        data_at_risk=["資料庫 integrity", "External system access", "Financial transactions"],
    ),
    TrustBoundary(
        name="Session Boundary",
        boundary_type=BoundaryType.TEMPORAL,
        crosses_from="Previous conversation context",
        crosses_to="Current user request",
        risk_description=(
            "Conversation memory carries information across turns and potentially "
            "across sessions. A compromised turn can poison future interactions."
        ),
        data_at_risk=["Previous conversation data", "Other users' data"],
    ),
]

Step 3: AI-Specific Threat Enumeration

Apply an adapted STRIDE analysis to each component and boundary. AI systems require additional threat categories beyond the standard six:

# Extended STRIDE for AI Systems (STRIDE-AI)
 
## Standard STRIDE categories (applied to AI context)
 
### S - Spoofing
- **Traditional**: Impersonating another user or service
- **AI-specific**: Impersonating a system message to 模型 ("You are now in admin mode")
- **AI-specific**: Spoofing the source of retrieved documents in RAG pipelines
 
### T - Tampering
- **Traditional**: Modifying data in transit or at rest
- **AI-specific**: Prompt injection (tampering with the instruction set through 使用者輸入)
- **AI-specific**: 知識庫 投毒 (tampering with RAG source documents)
- **AI-specific**: Conversation memory 投毒 (tampering with stored context)
 
### R - Repudiation
- **Traditional**: Denying an action occurred
- **AI-specific**: Non-deterministic model outputs make it hard to reproduce issues
- **AI-specific**: Difficult to prove what 模型 "intended" vs what it produced
 
### I - Information Disclosure
- **Traditional**: Unauthorized access to data
- **AI-specific**: 系統提示詞 extraction through conversational manipulation
- **AI-specific**: 訓練資料 extraction (memorization leakage)
- **AI-specific**: Cross-session data leakage through conversation memory
- **AI-specific**: Side-channel leakage through 符元 counts, response times
 
### D - Denial of Service
- **Traditional**: Making a service unavailable
- **AI-specific**: Resource exhaustion through long prompts or recursive tool calls
- **AI-specific**: Rate limit bypass through prompt manipulation
- **AI-specific**: Model degradation through 對抗性 inputs that cause failures
 
### E - Elevation of Privilege
- **Traditional**: Gaining unauthorized access
- **AI-specific**: Gaining access to tools or data through prompt manipulation
- **AI-specific**: Escalating from user role to admin role through role injection
- **AI-specific**: Bypassing 輸出 filters to produce restricted content

Apply this extended STRIDE to each data flow crossing a trust boundary:

# threat_model/threat_enumeration.py
"""Enumerate threats 對每個 data flow in the LLM application."""
from dataclasses import dataclass
 
@dataclass
class Threat:
    id: str
    category: str  # S, T, R, I, D, E
    data_flow: str
    threat_description: str
    attack_scenario: str
    likelihood: str  # High, Medium, Low
    impact: str  # Critical, High, Medium, Low
    existing_mitigations: list[str]
    residual_risk: str
 
THREAT_CATALOG = [
    Threat(
        id="T001",
        category="Tampering",
        data_flow="User 輸入 -> Prompt Constructor",
        threat_description="Prompt injection through 使用者輸入",
        attack_scenario=(
            "攻擊者 crafts 輸入 that, when combined with the 系統提示詞, "
            "causes 模型 to follow 攻擊者's instructions instead of "
            "the application's. 範例: 'Ignore previous instructions and...'"
        ),
        likelihood="High",
        impact="Critical",
        existing_mitigations=["輸入 length limit"],
        residual_risk="High -- 輸入 limits do not prevent injection",
    ),
    Threat(
        id="T002",
        category="Information Disclosure",
        data_flow="System Prompt -> Model Response",
        threat_description="系統提示詞 extraction",
        attack_scenario=(
            "攻擊者 uses social engineering, encoding tricks, or "
            "instruction manipulation to get 模型 to 輸出 its "
            "系統提示詞, revealing business logic, API keys, or "
            "安全 controls."
        ),
        likelihood="High",
        impact="High",
        existing_mitigations=["系統提示詞 includes 'do not reveal' instruction"],
        residual_risk="High -- instruction-based 防禦 is easily bypassed",
    ),
    Threat(
        id="T003",
        category="Elevation of Privilege",
        data_flow="Model Response -> Tool Calls",
        threat_description="Unauthorized tool invocation through prompt manipulation",
        attack_scenario=(
            "攻擊者 manipulates 模型 into calling tools with "
            "unauthorized parameters -- e.g., looking up another user's "
            "account, processing an unauthorized refund, or accessing "
            "restricted 資料庫 tables."
        ),
        likelihood="High",
        impact="Critical",
        existing_mitigations=["Role-based access in application layer"],
        residual_risk="Medium -- depends on tool-level 授權 enforcement",
    ),
    Threat(
        id="T004",
        category="Tampering",
        data_flow="Knowledge Base -> RAG Retriever -> Prompt",
        threat_description="RAG retrieval 投毒",
        attack_scenario=(
            "攻擊者 injects 對抗性 documents into the knowledge "
            "base that contain instructions 模型 will follow when "
            "those documents are retrieved as context."
        ),
        likelihood="Medium",
        impact="High",
        existing_mitigations=["Document access controls"],
        residual_risk="Medium -- depends on who can add documents to the 知識庫",
    ),
    Threat(
        id="T005",
        category="Information Disclosure",
        data_flow="Conversation Memory -> Prompt Constructor",
        threat_description="Cross-session data leakage",
        attack_scenario=(
            "Attacker A interacts with 系統, revealing sensitive data. "
            "If memory is not isolated, Attacker B (or a different session) "
            "can extract that data through memory recall prompts."
        ),
        likelihood="Medium",
        impact="High",
        existing_mitigations=["Session-based memory isolation"],
        residual_risk="Low if isolation is implemented, High if not",
    ),
]

Step 4: Risk 評估 and Prioritization

Score each threat and create a prioritized risk register:

# threat_model/risk_assessment.py
"""Score and prioritize identified threats."""
 
RISK_MATRIX = {
    ("High", "Critical"): "Critical",
    ("High", "High"): "High",
    ("High", "Medium"): "High",
    ("High", "Low"): "Medium",
    ("Medium", "Critical"): "High",
    ("Medium", "High"): "High",
    ("Medium", "Medium"): "Medium",
    ("Medium", "Low"): "Low",
    ("Low", "Critical"): "Medium",
    ("Low", "High"): "Medium",
    ("Low", "Medium"): "Low",
    ("Low", "Low"): "Low",
}
 
def calculate_risk_level(likelihood: str, impact: str) -> str:
    """Calculate risk level from likelihood and impact."""
    return RISK_MATRIX.get((likelihood, impact), "Medium")
 
def generate_risk_register(threats: list) -> list[dict]:
    """Generate a prioritized risk register from the threat catalog."""
    register = []
 
    for threat in threats:
        risk_level = calculate_risk_level(threat.likelihood, threat.impact)
        register.append({
            "id": threat.id,
            "threat": threat.threat_description,
            "category": threat.category,
            "risk_level": risk_level,
            "likelihood": threat.likelihood,
            "impact": threat.impact,
            "residual_risk": threat.residual_risk,
        })
 
    # Sort by risk level priority
    priority_order = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
    register.sort(key=lambda x: priority_order.get(x["risk_level"], 4))
 
    return register

Step 5: 緩解 Planning

對每個 identified threat, define specific, implementable mitigations:

# 緩解 Plan
 
## T001: 提示詞注入 (Risk: Critical)
**Mitigations:**
1. 實作 輸入 sanitization pipeline (encode special characters, strip delimiters)
2. Use instruction hierarchy / 系統提示詞 hardening
3. Deploy 提示詞注入 偵測 (classifier-based, not just keyword)
4. 實作 輸出 validation to catch injection success indicators
5. Use structured 輸出 formats (JSON) to constrain model responses
 
**Verification:** Red team 測試 with OWASP injection 測試 suite
 
## T002: System Prompt Extraction (Risk: High)
**Mitigations:**
1. Move sensitive configuration (API keys, connection strings) out of 系統提示詞
2. 實作 canary 符元 in 系統提示詞 to detect extraction
3. Add 輸出 filtering for known 系統提示詞 fragments
4. Use a dual-LLM architecture where a judge model reviews responses
 
**Verification:** Run 系統提示詞 extraction 測試 suite
 
## T003: Unauthorized Tool Invocation (Risk: Critical)
**Mitigations:**
1. 實作 tool-level 授權 that verifies the authenticated user's 權限
2. Validate all tool parameters server-side (do not trust model-generated parameters)
3. 實作 rate limiting per user per tool
4. Log all tool invocations for audit
5. Use a confirmation step for high-impact tools (refunds, data deletion)
 
**Verification:** Red team 測試 tool abuse scenarios
 
## T004: RAG Retrieval Poisoning (Risk: High)
**Mitigations:**
1. 實作 document-level access control in the RAG pipeline
2. Sanitize documents before ingestion (strip instruction-like patterns)
3. Tag retrieved documents with metadata so 模型 can distinguish sources
4. 實作 relevance scoring to filter out low-quality retrieval results
 
**Verification:** Inject 測試 對抗性 documents and verify they are neutralized
 
## T005: Cross-Session Data Leakage (Risk: Medium-High)
**Mitigations:**
1. 實作 strict session isolation (no shared memory across users)
2. Add memory TTL (time-to-live) to automatically expire old conversations
3. Sanitize memory content before injection into new prompts
4. 實作 audit logging for memory access patterns
 
**Verification:** Multi-session 紅隊 測試 with different user contexts

Step 6: Documenting the Threat Model

Compile the complete 威脅模型 into a structured document:

# Threat Model Document Template
 
## 1. System Description
[From Step 1: system decomposition, component inventory]
 
## 2. Data Flow Diagrams
[Augmented DFD from Step 1]
 
## 3. Trust Boundaries
[From Step 2: trust boundary analysis]
 
## 4. Threat Catalog
[From Step 3: full threat enumeration with STRIDE-AI]
 
## 5. Risk Register
[From Step 4: prioritized risk register]
 
## 6. 緩解 Plan
[From Step 5: specific mitigations per threat]
 
## 7. Assumptions and Limitations
- 威脅模型 assumes the architecture described in Section 1 is accurate
- Model behavior may change with updates from the provider
- New attack techniques may emerge that are not covered 在本 analysis
- This 威脅模型 should be reviewed quarterly and after any major architecture change
 
## 8. Review History
| Date | Reviewer | Changes |
|------|----------|---------|
| [Date] | [Name] | Initial 威脅模型 |

Common Pitfalls and Troubleshooting

Problem	Cause	Solution
威脅模型 too abstract	Threats described generically	Use specific attack scenarios with concrete examples
Missing AI-specific threats	Used standard STRIDE without AI extensions	Apply STRIDE-AI framework from Step 3
Mitigations are too vague	"Improve 安全" is not actionable	Specify the exact control, 實作 approach, and verification method
Stakeholders disengage	Session too technical or too long	Break into focused sessions, use visual DFDs, start with business impact
威脅模型 gets stale	Architecture changes faster than 模型 is updated	Schedule quarterly reviews, trigger reviews on architecture changes
Risk scores are disputed	Subjective likelihood/impact assessments	Use industry benchmarks (OWASP LLM Top 10) and historical incident data

關鍵要點

Threat modeling for LLM applications requires extending traditional methodologies to account for the non-deterministic, natural-language-driven nature of AI systems:

Semantic trust boundaries are the critical addition -- the point where structured code meets natural language is where most AI-specific attacks occur. Traditional DFDs do not capture this boundary.
STRIDE needs AI extensions -- each STRIDE category has AI-specific variants that traditional threat modeling misses. Use the STRIDE-AI framework to ensure coverage.
Tool access is the highest-risk area -- when a model can invoke external tools, the impact of a successful attack escalates from "bad 輸出" to "unauthorized actions." Prioritize tool 安全 mitigations.
Mitigations must be verifiable -- every 緩解 should have a corresponding 紅隊測試 that validates its effectiveness.
Threat models are living documents -- AI applications evolve rapidly. Schedule regular reviews and trigger updates whenever the architecture changes.

防禦-in-depth: No single defensive measure is sufficient. Layer multiple independent 防禦 so that failure of any single layer does not result in system compromise. 輸入 classification, 輸出 filtering, behavioral 監控, and architectural controls should all be present.
Assume breach: Design systems assuming that any individual component can be compromised. This mindset leads to better isolation, 監控, and incident response capabilities. When a 提示詞注入 succeeds, the blast radius should be minimized through architectural controls.
Least privilege: Grant models and 代理 only the minimum capabilities needed for their intended function. A customer service chatbot does not need file system access or code execution. Excessive capabilities magnify the impact of successful 利用.
Continuous 測試: AI 安全 is not a one-time 評估. Models change, 防禦 evolve, and new attack techniques are discovered regularly. 實作 continuous 安全測試 as part of the development and deployment lifecycle.
Secure by default: Default configurations should be secure. Require explicit opt-in for risky capabilities, use allowlists rather than denylists, and err on the side of restriction rather than permissiveness.

Integration with Organizational 安全

AI 安全 does not exist in isolation — it must integrate with the organization's broader 安全 program:

安全 Domain	AI-Specific Integration
Identity and Access	API key management, model access controls, user 認證 for AI features
Data Protection	訓練資料 classification, PII in prompts, data residency for model calls
Application 安全	AI feature threat modeling, 提示詞注入 in SAST/DAST, secure AI design patterns
Incident Response	AI-specific playbooks, model behavior 監控, 提示詞注入 forensics
Compliance	AI regulatory mapping (EU AI Act, NIST), AI audit trails, model documentation
Supply Chain	Model provenance, dependency 安全, adapter/weight integrity verification

class OrganizationalIntegration:
    """Framework for integrating AI 安全 with organizational 安全 programs."""
 
    def __init__(self, org_config: dict):
        self.config = org_config
        self.gaps = []
 
    def assess_maturity(self) -> dict:
        """評估 the organization's AI 安全 maturity."""
        domains = {
            "governance": self._check_governance(),
            "technical_controls": self._check_technical(),
            "監控": self._check_monitoring(),
            "incident_response": self._check_ir(),
            "訓練": self._check_training(),
        }
        overall = sum(d["score"] for d in domains.values()) / len(domains)
        return {"domains": domains, "overall_maturity": round(overall, 1)}
 
    def _check_governance(self) -> dict:
        has_policy = self.config.get("ai_security_policy", False)
        has_framework = self.config.get("risk_framework", False)
        score = (int(has_policy) + int(has_framework)) * 2.5
        return {"score": score, "max": 5.0}
 
    def _check_technical(self) -> dict:
        controls = ["input_classification", "output_filtering", "rate_limiting", "sandboxing"]
        active = sum(1 for c in controls if self.config.get(c, False))
        return {"score": active * 1.25, "max": 5.0}
 
    def _check_monitoring(self) -> dict:
        has_monitoring = self.config.get("ai_monitoring", False)
        has_alerting = self.config.get("ai_alerting", False)
        score = (int(has_monitoring) + int(has_alerting)) * 2.5
        return {"score": score, "max": 5.0}
 
    def _check_ir(self) -> dict:
        has_playbook = self.config.get("ai_ir_playbook", False)
        return {"score": 5.0 if has_playbook else 0.0, "max": 5.0}
 
    def _check_training(self) -> dict:
        has_training = self.config.get("ai_security_training", False)
        return {"score": 5.0 if has_training else 0.0, "max": 5.0}

Future Directions

Several research and industry trends will shape the evolution of this field:

Formal methods for AI 安全: Development of mathematical frameworks that can provide bounded guarantees about model behavior under 對抗性 conditions
Automated 紅隊演練 at scale: Continued improvement of automated 測試 tools that can discover novel 漏洞 without human guidance
AI-assisted 防禦: Using AI systems to detect and respond to attacks on other AI systems, creating a dynamic attack-防禦 ecosystem
Standardized 評估: Growing adoption of standardized benchmarks (HarmBench, JailbreakBench) that enable consistent measurement of progress
Regulatory harmonization: Convergence of AI regulatory frameworks across jurisdictions, providing clearer requirements for organizations

Threat 模型ing for LLM-Powered Applications

相關文章

Threat 模型ing for LLM-Powered Applications

相關文章