Threat Modeling for LLM-Powered Applications
Step-by-step walkthrough for conducting threat modeling sessions specifically tailored to LLM-powered applications, covering data flow analysis, trust boundary identification, AI-specific threat enumeration, risk assessment, and mitigation planning.
Traditional threat modeling methodologies were designed for systems with deterministic behavior -- systems where the same input always produces the same output and control flow is fully specified in code. LLM-powered applications break this assumption fundamentally. The model is a non-deterministic component that makes decisions based on natural language, creating threat vectors that traditional models like STRIDE do not capture. This walkthrough adapts established threat modeling practices for the unique characteristics of LLM applications.
Step 1: System Decomposition
Begin by decomposing the LLM application into its constituent components. Unlike traditional applications where the architecture diagram shows the complete picture, LLM applications have implicit data flows that are not visible in standard diagrams.
Create an augmented data flow diagram (DFD) that includes AI-specific elements:
┌──────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: External │
│ │
│ ┌──────────┐ │
│ │ User │ │
│ └────┬─────┘ │
│ │ User Input (natural language) │
├───────┼──────────────────────────────────────────────────────┤
│ │ TRUST BOUNDARY: Application │
│ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Input │────▶│ Prompt │────▶│ LLM API │ │
│ │ Filter │ │ Constructor │ │ (Model Call) │ │
│ └──────────┘ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │
│ │ System Prompt │ Model │
│ │ + Context │ Response │
│ ▼ ▼ │
│ ┌──────────────┐ ┌───────────────┐ │
│ │ RAG │ │ Output │ │
│ │ Retriever │ │ Filter │ │
│ └──────┬───────┘ └───────┬───────┘ │
│ │ │ │
├──────────────────────────┼─────────────────────┼─────────────┤
│ TRUST BOUNDARY: Data / Tools │ │
│ ▼ │ │
│ ┌──────────────┐ │ │
│ │ Vector DB / │ │ │
│ │ Knowledge │ │ │
│ │ Base │ │ │
│ └──────────────┘ │ │
│ │ │
│ ┌──────────────┐ │ │
│ │ External │◀────────────┘ │
│ │ Tools/APIs │ (Function Calls) │
│ └──────────────┘ │
└──────────────────────────────────────────────────────────────┘
Document each component with its security-relevant properties:
# Component Inventory
| Component | Input | Output | Trust Level | Data Sensitivity |
|-----------|-------|--------|-------------|-----------------|
| User | N/A | Natural language text | Untrusted | N/A |
| Input Filter | User text | Filtered text | Application | Low |
| Prompt Constructor | Filtered text + system prompt + context | Full prompt | Application | High (contains system prompt) |
| RAG Retriever | Query embedding | Document chunks | Application | Medium-High (contains internal documents) |
| LLM API | Full prompt | Model response | External service | High (processes all data) |
| Output Filter | Model response | Filtered response | Application | Medium |
| External Tools | Function call parameters | Tool results | Variable (per tool) | High (performs actions) |
| Vector DB | Embeddings | Similar documents | Data store | Medium-High |Step 2: Trust Boundary Analysis
Traditional trust boundaries separate components with different privilege levels. LLM applications introduce a new type of boundary: the semantic trust boundary, where the transition from structured data to natural language creates an implicit trust gap.
# threat_model/trust_boundaries.py
"""Identify and document trust boundaries in LLM applications."""
from dataclasses import dataclass
from enum import Enum
class BoundaryType(Enum):
TRADITIONAL = "Traditional" # Network, process, privilege boundaries
SEMANTIC = "Semantic" # Structured data to natural language transitions
CONTEXT = "Context" # Different information contexts merged together
TEMPORAL = "Temporal" # Time-based trust changes (session boundaries)
@dataclass
class TrustBoundary:
name: str
boundary_type: BoundaryType
crosses_from: str
crosses_to: str
risk_description: str
data_at_risk: list[str]
LLM_TRUST_BOUNDARIES = [
TrustBoundary(
name="User Input to Application",
boundary_type=BoundaryType.TRADITIONAL,
crosses_from="Untrusted (user)",
crosses_to="Application logic",
risk_description="Standard input injection risk, amplified by natural language flexibility",
data_at_risk=["Application logic", "System prompt", "Other users' data"],
),
TrustBoundary(
name="Prompt Construction (Semantic Boundary)",
boundary_type=BoundaryType.SEMANTIC,
crosses_from="Structured application code",
crosses_to="Natural language prompt",
risk_description=(
"The transition from code to natural language is where prompt injection "
"occurs. User-controlled text becomes indistinguishable from system instructions "
"once combined into a prompt string."
),
data_at_risk=["System prompt", "Instruction integrity", "Tool access control"],
),
TrustBoundary(
name="RAG Context Injection",
boundary_type=BoundaryType.CONTEXT,
crosses_from="Retrieved documents",
crosses_to="Model context window",
risk_description=(
"Documents retrieved from the knowledge base are injected into the model's "
"context alongside the system prompt and user input. If documents contain "
"adversarial instructions, the model may follow them."
),
data_at_risk=["Instruction integrity", "Response accuracy", "Data access control"],
),
TrustBoundary(
name="Model Output to Tool Calls",
boundary_type=BoundaryType.SEMANTIC,
crosses_from="Model-generated text",
crosses_to="Structured tool parameters",
risk_description=(
"The model generates tool call parameters from natural language reasoning. "
"If the model has been manipulated, the tool parameters may include "
"unauthorized values."
),
data_at_risk=["Database integrity", "External system access", "Financial transactions"],
),
TrustBoundary(
name="Session Boundary",
boundary_type=BoundaryType.TEMPORAL,
crosses_from="Previous conversation context",
crosses_to="Current user request",
risk_description=(
"Conversation memory carries information across turns and potentially "
"across sessions. A compromised turn can poison future interactions."
),
data_at_risk=["Previous conversation data", "Other users' data"],
),
]Step 3: AI-Specific Threat Enumeration
Apply an adapted STRIDE analysis to each component and boundary. AI systems require additional threat categories beyond the standard six:
# Extended STRIDE for AI Systems (STRIDE-AI)
## Standard STRIDE categories (applied to AI context)
### S - Spoofing
- **Traditional**: Impersonating another user or service
- **AI-specific**: Impersonating a system message to the model ("You are now in admin mode")
- **AI-specific**: Spoofing the source of retrieved documents in RAG pipelines
### T - Tampering
- **Traditional**: Modifying data in transit or at rest
- **AI-specific**: Prompt injection (tampering with the instruction set through user input)
- **AI-specific**: Knowledge base poisoning (tampering with RAG source documents)
- **AI-specific**: Conversation memory poisoning (tampering with stored context)
### R - Repudiation
- **Traditional**: Denying an action occurred
- **AI-specific**: Non-deterministic model outputs make it hard to reproduce issues
- **AI-specific**: Difficult to prove what the model "intended" vs what it produced
### I - Information Disclosure
- **Traditional**: Unauthorized access to data
- **AI-specific**: System prompt extraction through conversational manipulation
- **AI-specific**: Training data extraction (memorization leakage)
- **AI-specific**: Cross-session data leakage through conversation memory
- **AI-specific**: Side-channel leakage through token counts, response times
### D - Denial of Service
- **Traditional**: Making a service unavailable
- **AI-specific**: Resource exhaustion through long prompts or recursive tool calls
- **AI-specific**: Rate limit bypass through prompt manipulation
- **AI-specific**: Model degradation through adversarial inputs that cause failures
### E - Elevation of Privilege
- **Traditional**: Gaining unauthorized access
- **AI-specific**: Gaining access to tools or data through prompt manipulation
- **AI-specific**: Escalating from user role to admin role through role injection
- **AI-specific**: Bypassing output filters to produce restricted contentApply this extended STRIDE to each data flow crossing a trust boundary:
# threat_model/threat_enumeration.py
"""Enumerate threats for each data flow in the LLM application."""
from dataclasses import dataclass
@dataclass
class Threat:
id: str
category: str # S, T, R, I, D, E
data_flow: str
threat_description: str
attack_scenario: str
likelihood: str # High, Medium, Low
impact: str # Critical, High, Medium, Low
existing_mitigations: list[str]
residual_risk: str
THREAT_CATALOG = [
Threat(
id="T001",
category="Tampering",
data_flow="User Input -> Prompt Constructor",
threat_description="Prompt injection through user input",
attack_scenario=(
"An attacker crafts input that, when combined with the system prompt, "
"causes the model to follow the attacker's instructions instead of "
"the application's. Example: 'Ignore previous instructions and...'"
),
likelihood="High",
impact="Critical",
existing_mitigations=["Input length limit"],
residual_risk="High -- input limits do not prevent injection",
),
Threat(
id="T002",
category="Information Disclosure",
data_flow="System Prompt -> Model Response",
threat_description="System prompt extraction",
attack_scenario=(
"An attacker uses social engineering, encoding tricks, or "
"instruction manipulation to get the model to output its "
"system prompt, revealing business logic, API keys, or "
"security controls."
),
likelihood="High",
impact="High",
existing_mitigations=["System prompt includes 'do not reveal' instruction"],
residual_risk="High -- instruction-based defense is easily bypassed",
),
Threat(
id="T003",
category="Elevation of Privilege",
data_flow="Model Response -> Tool Calls",
threat_description="Unauthorized tool invocation through prompt manipulation",
attack_scenario=(
"An attacker manipulates the model into calling tools with "
"unauthorized parameters -- e.g., looking up another user's "
"account, processing an unauthorized refund, or accessing "
"restricted database tables."
),
likelihood="High",
impact="Critical",
existing_mitigations=["Role-based access in application layer"],
residual_risk="Medium -- depends on tool-level authorization enforcement",
),
Threat(
id="T004",
category="Tampering",
data_flow="Knowledge Base -> RAG Retriever -> Prompt",
threat_description="RAG retrieval poisoning",
attack_scenario=(
"An attacker injects adversarial documents into the knowledge "
"base that contain instructions the model will follow when "
"those documents are retrieved as context."
),
likelihood="Medium",
impact="High",
existing_mitigations=["Document access controls"],
residual_risk="Medium -- depends on who can add documents to the knowledge base",
),
Threat(
id="T005",
category="Information Disclosure",
data_flow="Conversation Memory -> Prompt Constructor",
threat_description="Cross-session data leakage",
attack_scenario=(
"Attacker A interacts with the system, revealing sensitive data. "
"If memory is not isolated, Attacker B (or a different session) "
"can extract that data through memory recall prompts."
),
likelihood="Medium",
impact="High",
existing_mitigations=["Session-based memory isolation"],
residual_risk="Low if isolation is implemented, High if not",
),
]Step 4: Risk Assessment and Prioritization
Score each threat and create a prioritized risk register:
# threat_model/risk_assessment.py
"""Score and prioritize identified threats."""
RISK_MATRIX = {
("High", "Critical"): "Critical",
("High", "High"): "High",
("High", "Medium"): "High",
("High", "Low"): "Medium",
("Medium", "Critical"): "High",
("Medium", "High"): "High",
("Medium", "Medium"): "Medium",
("Medium", "Low"): "Low",
("Low", "Critical"): "Medium",
("Low", "High"): "Medium",
("Low", "Medium"): "Low",
("Low", "Low"): "Low",
}
def calculate_risk_level(likelihood: str, impact: str) -> str:
"""Calculate risk level from likelihood and impact."""
return RISK_MATRIX.get((likelihood, impact), "Medium")
def generate_risk_register(threats: list) -> list[dict]:
"""Generate a prioritized risk register from the threat catalog."""
register = []
for threat in threats:
risk_level = calculate_risk_level(threat.likelihood, threat.impact)
register.append({
"id": threat.id,
"threat": threat.threat_description,
"category": threat.category,
"risk_level": risk_level,
"likelihood": threat.likelihood,
"impact": threat.impact,
"residual_risk": threat.residual_risk,
})
# Sort by risk level priority
priority_order = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
register.sort(key=lambda x: priority_order.get(x["risk_level"], 4))
return registerStep 5: Mitigation Planning
For each identified threat, define specific, implementable mitigations:
# Mitigation Plan
## T001: Prompt Injection (Risk: Critical)
**Mitigations:**
1. Implement input sanitization pipeline (encode special characters, strip delimiters)
2. Use instruction hierarchy / system prompt hardening
3. Deploy prompt injection detection (classifier-based, not just keyword)
4. Implement output validation to catch injection success indicators
5. Use structured output formats (JSON) to constrain model responses
**Verification:** Red team test with OWASP injection test suite
## T002: System Prompt Extraction (Risk: High)
**Mitigations:**
1. Move sensitive configuration (API keys, connection strings) out of system prompt
2. Implement canary tokens in system prompt to detect extraction
3. Add output filtering for known system prompt fragments
4. Use a dual-LLM architecture where a judge model reviews responses
**Verification:** Run system prompt extraction test suite
## T003: Unauthorized Tool Invocation (Risk: Critical)
**Mitigations:**
1. Implement tool-level authorization that verifies the authenticated user's permissions
2. Validate all tool parameters server-side (do not trust model-generated parameters)
3. Implement rate limiting per user per tool
4. Log all tool invocations for audit
5. Use a confirmation step for high-impact tools (refunds, data deletion)
**Verification:** Red team test tool abuse scenarios
## T004: RAG Retrieval Poisoning (Risk: High)
**Mitigations:**
1. Implement document-level access control in the RAG pipeline
2. Sanitize documents before ingestion (strip instruction-like patterns)
3. Tag retrieved documents with metadata so the model can distinguish sources
4. Implement relevance scoring to filter out low-quality retrieval results
**Verification:** Inject test adversarial documents and verify they are neutralized
## T005: Cross-Session Data Leakage (Risk: Medium-High)
**Mitigations:**
1. Implement strict session isolation (no shared memory across users)
2. Add memory TTL (time-to-live) to automatically expire old conversations
3. Sanitize memory content before injection into new prompts
4. Implement audit logging for memory access patterns
**Verification:** Multi-session red team test with different user contextsStep 6: Documenting the Threat Model
Compile the complete threat model into a structured document:
# Threat Model Document Template
## 1. System Description
[From Step 1: system decomposition, component inventory]
## 2. Data Flow Diagrams
[Augmented DFD from Step 1]
## 3. Trust Boundaries
[From Step 2: trust boundary analysis]
## 4. Threat Catalog
[From Step 3: full threat enumeration with STRIDE-AI]
## 5. Risk Register
[From Step 4: prioritized risk register]
## 6. Mitigation Plan
[From Step 5: specific mitigations per threat]
## 7. Assumptions and Limitations
- Threat model assumes the architecture described in Section 1 is accurate
- Model behavior may change with updates from the provider
- New attack techniques may emerge that are not covered in this analysis
- This threat model should be reviewed quarterly and after any major architecture change
## 8. Review History
| Date | Reviewer | Changes |
|------|----------|---------|
| [Date] | [Name] | Initial threat model |Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Threat model too abstract | Threats described generically | Use specific attack scenarios with concrete examples |
| Missing AI-specific threats | Used standard STRIDE without AI extensions | Apply STRIDE-AI framework from Step 3 |
| Mitigations are too vague | "Improve security" is not actionable | Specify the exact control, implementation approach, and verification method |
| Stakeholders disengage | Session too technical or too long | Break into focused sessions, use visual DFDs, start with business impact |
| Threat model gets stale | Architecture changes faster than the model is updated | Schedule quarterly reviews, trigger reviews on architecture changes |
| Risk scores are disputed | Subjective likelihood/impact assessments | Use industry benchmarks (OWASP LLM Top 10) and historical incident data |
Key Takeaways
Threat modeling for LLM applications requires extending traditional methodologies to account for the non-deterministic, natural-language-driven nature of AI systems:
- Semantic trust boundaries are the critical addition -- the point where structured code meets natural language is where most AI-specific attacks occur. Traditional DFDs do not capture this boundary.
- STRIDE needs AI extensions -- each STRIDE category has AI-specific variants that traditional threat modeling misses. Use the STRIDE-AI framework to ensure coverage.
- Tool access is the highest-risk area -- when a model can invoke external tools, the impact of a successful attack escalates from "bad output" to "unauthorized actions." Prioritize tool security mitigations.
- Mitigations must be verifiable -- every mitigation should have a corresponding red team test that validates its effectiveness.
- Threat models are living documents -- AI applications evolve rapidly. Schedule regular reviews and trigger updates whenever the architecture changes.
Advanced Considerations
Evolving Attack Landscape
The AI security landscape evolves rapidly as both offensive techniques and defensive measures advance. Several trends shape the current state of play:
Increasing model capabilities create new attack surfaces. As models gain access to tools, code execution, web browsing, and computer use, each new capability introduces potential exploitation vectors that did not exist in earlier, text-only systems. The principle of least privilege becomes increasingly important as model capabilities expand.
Safety training improvements are necessary but not sufficient. Model providers invest heavily in safety training through RLHF, DPO, constitutional AI, and other alignment techniques. These improvements raise the bar for successful attacks but do not eliminate the fundamental vulnerability: models cannot reliably distinguish legitimate instructions from adversarial ones because this distinction is not represented in the architecture.
Automated red teaming tools democratize testing. Tools like NVIDIA's Garak, Microsoft's PyRIT, and Promptfoo enable organizations to conduct automated security testing without deep AI security expertise. However, automated tools catch known patterns; novel attacks and business logic vulnerabilities still require human creativity and domain knowledge.
Regulatory pressure drives organizational investment. The EU AI Act, NIST AI RMF, and industry-specific regulations increasingly require organizations to assess and mitigate AI-specific risks. This regulatory pressure is driving investment in AI security programs, but many organizations are still in the early stages of building mature AI security practices.
Cross-Cutting Security Principles
Several security principles apply across all topics covered in this curriculum:
-
Defense-in-depth: No single defensive measure is sufficient. Layer multiple independent defenses so that failure of any single layer does not result in system compromise. Input classification, output filtering, behavioral monitoring, and architectural controls should all be present.
-
Assume breach: Design systems assuming that any individual component can be compromised. This mindset leads to better isolation, monitoring, and incident response capabilities. When a prompt injection succeeds, the blast radius should be minimized through architectural controls.
-
Least privilege: Grant models and agents only the minimum capabilities needed for their intended function. A customer service chatbot does not need file system access or code execution. Excessive capabilities magnify the impact of successful exploitation.
-
Continuous testing: AI security is not a one-time assessment. Models change, defenses evolve, and new attack techniques are discovered regularly. Implement continuous security testing as part of the development and deployment lifecycle.
-
Secure by default: Default configurations should be secure. Require explicit opt-in for risky capabilities, use allowlists rather than denylists, and err on the side of restriction rather than permissiveness.
Integration with Organizational Security
AI security does not exist in isolation — it must integrate with the organization's broader security program:
| Security Domain | AI-Specific Integration |
|---|---|
| Identity and Access | API key management, model access controls, user authentication for AI features |
| Data Protection | Training data classification, PII in prompts, data residency for model calls |
| Application Security | AI feature threat modeling, prompt injection in SAST/DAST, secure AI design patterns |
| Incident Response | AI-specific playbooks, model behavior monitoring, prompt injection forensics |
| Compliance | AI regulatory mapping (EU AI Act, NIST), AI audit trails, model documentation |
| Supply Chain | Model provenance, dependency security, adapter/weight integrity verification |
class OrganizationalIntegration:
"""Framework for integrating AI security with organizational security programs."""
def __init__(self, org_config: dict):
self.config = org_config
self.gaps = []
def assess_maturity(self) -> dict:
"""Assess the organization's AI security maturity."""
domains = {
"governance": self._check_governance(),
"technical_controls": self._check_technical(),
"monitoring": self._check_monitoring(),
"incident_response": self._check_ir(),
"training": self._check_training(),
}
overall = sum(d["score"] for d in domains.values()) / len(domains)
return {"domains": domains, "overall_maturity": round(overall, 1)}
def _check_governance(self) -> dict:
has_policy = self.config.get("ai_security_policy", False)
has_framework = self.config.get("risk_framework", False)
score = (int(has_policy) + int(has_framework)) * 2.5
return {"score": score, "max": 5.0}
def _check_technical(self) -> dict:
controls = ["input_classification", "output_filtering", "rate_limiting", "sandboxing"]
active = sum(1 for c in controls if self.config.get(c, False))
return {"score": active * 1.25, "max": 5.0}
def _check_monitoring(self) -> dict:
has_monitoring = self.config.get("ai_monitoring", False)
has_alerting = self.config.get("ai_alerting", False)
score = (int(has_monitoring) + int(has_alerting)) * 2.5
return {"score": score, "max": 5.0}
def _check_ir(self) -> dict:
has_playbook = self.config.get("ai_ir_playbook", False)
return {"score": 5.0 if has_playbook else 0.0, "max": 5.0}
def _check_training(self) -> dict:
has_training = self.config.get("ai_security_training", False)
return {"score": 5.0 if has_training else 0.0, "max": 5.0}Future Directions
Several research and industry trends will shape the evolution of this field:
- Formal methods for AI safety: Development of mathematical frameworks that can provide bounded guarantees about model behavior under adversarial conditions
- Automated red teaming at scale: Continued improvement of automated testing tools that can discover novel vulnerabilities without human guidance
- AI-assisted defense: Using AI systems to detect and respond to attacks on other AI systems, creating a dynamic attack-defense ecosystem
- Standardized evaluation: Growing adoption of standardized benchmarks (HarmBench, JailbreakBench) that enable consistent measurement of progress
- Regulatory harmonization: Convergence of AI regulatory frameworks across jurisdictions, providing clearer requirements for organizations