AI-Specific Threat Modeling (Tradecraft)
Applying ATLAS, STRIDE, and attack tree methodologies to AI systems. Trust boundary analysis for agentic architectures, data flow analysis, and MCP threat modeling.
AI-Specific Threat Modeling
Traditional threat modeling frameworks were built for conventional software. AI systems introduce novel threat categories: adversarial inputs, model manipulation, training data attacks, and emergent risks of agentic tool use. This page covers how to adapt STRIDE and ATLAS for AI, build attack trees for LLM applications, and analyze trust boundaries in agentic and MCP architectures.
MITRE ATLAS Framework
ATLAS provides the standard taxonomy for AI-specific attack techniques.
ATLAS Tactics Mapped to Red Team Phases
| ATLAS Tactic | Red Team Phase | Key Techniques |
|---|---|---|
| Reconnaissance | Scoping & Recon | Model fingerprinting, API probing, training data inference |
| Resource Development | Preparation | Adversarial sample generation, proxy model training |
| Initial Access | Exploitation | Prompt injection, adversarial inputs, supply chain compromise |
| ML Attack Staging | Exploitation | Inference API access, data poisoning setup |
| Execution | Exploitation | Adversarial ML attacks, model evasion, extraction |
| Persistence | Post-Exploitation | Backdoor insertion, training data manipulation |
| Exfiltration | Post-Exploitation | Model stealing, training data extraction |
| Impact | Impact Assessment | Model degradation, denial of service, integrity violations |
Key ATLAS Techniques
| ID | Name | Mitigations |
|---|---|---|
| AML.T0048 | Prompt Injection | Input filtering, prompt hardening, output monitoring |
| AML.T0049 | Indirect Prompt Injection | Content sanitization, instruction hierarchy, sandboxing |
| AML.T0054 | LLM Jailbreak | Constitutional AI, RLHF, output filtering |
| AML.T0024 | Exfiltration via Inference API | Rate limiting, query auditing, differential privacy |
| AML.T0047 | ML Supply Chain Compromise | Artifact signing, provenance tracking, dependency scanning |
| AML.T0043 | Craft Adversarial Data | Input sanitization, anomaly detection, human review |
Attack Trees for AI Systems
Structure
Root Goal: Exfiltrate PII from RAG chatbot
├── OR: Direct prompt injection
│ ├── AND: Extract system prompt (cost: LOW)
│ └── AND: Craft data exfil payload (cost: LOW)
├── OR: Indirect injection via knowledge base
│ ├── AND: Upload poisoned document (cost: MEDIUM)
│ └── AND: Trigger retrieval (cost: LOW)
├── OR: API exploitation
│ ├── AND: Discover hidden endpoints (cost: LOW)
│ └── AND: Bypass authentication (cost: HIGH)
└── OR: Supply chain compromise
└── Poison embedding model (cost: VERY HIGH)Analyzing Attack Trees
| Analysis | AND Nodes | OR Nodes |
|---|---|---|
| Cost | Sum of child costs | Minimum child cost |
| Probability | Product of child probabilities | 1 - product of (1 - child probabilities) |
| Cheapest path | Must include all children | Pick cheapest child |
Trust Boundary Analysis for Agentic Architectures
Agentic AI systems have complex trust boundaries that differ fundamentally from traditional applications because the LLM itself acts as a decision-maker routing data across boundaries.
Agentic Trust Zones
USER ZONE
└─▶ ORCHESTRATION ZONE
├── Agent Router ──▶ LLM (Planning)
├── Tool Router ◄── CRITICAL BOUNDARY
└── Memory / Context
└─▶ TOOL EXECUTION ZONE
├── Code Exec ├── Web API
├── Database ├── File I/O
├── Email └── MCP ServerThreats at Each Boundary
| Boundary | Threat | Impact | Key Controls |
|---|---|---|---|
| User → Orchestrator | Direct prompt injection | Agent performs unintended actions | Input sanitization, intent classification |
| User → Orchestrator | Role confusion escalation | Elevated access via natural language | Role from auth, not prompt content |
| Orchestrator → LLM | Context window manipulation | Safety instructions pushed out | Context budget management, instruction repetition |
| Orchestrator → LLM | Tool definition injection | LLM selects wrong tools | Static tool definitions, schema validation |
| LLM → Tool Router | Unauthorized tool invocation | Privilege escalation | Per-user tool allowlists, authorization layer |
| LLM → Tool Router | Parameter injection | SQLi, command injection, SSRF via tools | Parameter validation, parameterized queries |
| Tool → External | Data exfiltration | Data breach via model-mediated request | Outbound URL allowlisting, DLP, HITL |
| Tool → External | SSRF through web tools | Internal network recon | Internal IP blocking, DNS rebinding protection |
| External → Context | Indirect prompt injection | Full agent compromise via untrusted data | Content sanitization, separate processing contexts |
MCP Threat Modeling
MCP creates new, specific threat surfaces that most organizations have not yet modeled.
MCP-Specific Threats
| Threat | Category | Likelihood | Impact |
|---|---|---|---|
| Tool definition poisoning -- malicious server injects prompt injection into tool descriptions | Server compromise | Medium | Full agent behavior hijack |
| Cross-server escalation -- lower-trust server leverages shared context to access higher-trust server's data | Trust boundary violation | High | Privilege escalation |
| Resource URI injection -- path traversal or SSRF payloads in MCP resource URIs | Input validation | High | Unauthorized data access |
| Sampling manipulation -- malicious server uses MCP sampling to inject prompts against host LLM | Sampling abuse | Medium | Information leakage, quota exhaustion |
| Transport layer attacks -- HTTP without TLS allows interception of tool calls | Communication security | Medium | Man-in-the-middle |
MCP Mitigations Checklist
- Tool definition allowlisting and integrity verification
- Per-server context isolation
- URI allowlist validation with path canonicalization
- User approval required for sampling requests
- Rate-limit sampling calls per server
- Enforce TLS for HTTP transport; mutual TLS for high-security
- Pin MCP server versions and verify checksums
Data Flow Threat Matrix
For AI systems, track how data flows through the architecture and identify where untrusted data can influence model behavior or sensitive outputs can leak.
| Data Flow | Confidentiality Threat | Integrity Threat | Availability Threat |
|---|---|---|---|
| User → Model | N/A | Prompt injection | Context flooding |
| System Prompt → Model | Extraction | Override via injection | Context displacement |
| RAG → Model | Data exfiltration | Indirect injection | Poisoned retrieval |
| Model → Tool | Sensitive data in params | Parameter injection | Infinite tool loops |
| Tool → Model | Response data leakage | Response manipulation | Timeout/hang |
| Model → User | Training data leakage | Hallucination | Refusal DoS |
| Memory → Model | Cross-session leakage | Memory poisoning | Memory exhaustion |
Related Topics
- Advanced Recon — Threat models inform recon priorities and methodology
- Full Engagement — End-to-end engagement methodology applies threat models
In an attack tree for a RAG chatbot, the cheapest attack path is typically:
References
- MITRE ATLAS (Adversarial Threat Landscape for AI Systems) — AI-specific threat framework
- OWASP AI Security and Privacy Guide — Comprehensive AI threat modeling guide
- NIST AI Risk Management Framework (AI RMF 1.0, 2023) — Risk framework for AI threat assessment