AI-Specific Threat Modeling (Tradecraft)

expert8 min readUpdated 2026-03-11

Applying ATLAS, STRIDE, and attack tree methodologies to AI systems. Trust boundary analysis for agentic architectures, data flow analysis, and MCP threat modeling.

threat-modeling atlas stride attack-trees trust-boundaries mcp agentic data-flow

AI-Specific Threat Modeling

Traditional threat modeling frameworks were built for conventional software. AI systems introduce novel threat categories: adversarial inputs, model manipulation, training data attacks, and emergent risks of agentic tool use. This page covers how to adapt STRIDE and ATLAS for AI, build attack trees for LLM applications, and analyze trust boundaries in agentic and MCP architectures.

MITRE ATLAS Framework

ATLAS provides the standard taxonomy for AI-specific attack techniques.

ATLAS Tactics Mapped to Red Team Phases

ATLAS Tactic	Red Team Phase	Key Techniques
Reconnaissance	Scoping & Recon	Model fingerprinting, API probing, training data inference
Resource Development	Preparation	Adversarial sample generation, proxy model training
Initial Access	Exploitation	Prompt injection, adversarial inputs, supply chain compromise
ML Attack Staging	Exploitation	Inference API access, data poisoning setup
Execution	Exploitation	Adversarial ML attacks, model evasion, extraction
Persistence	Post-Exploitation	Backdoor insertion, training data manipulation
Exfiltration	Post-Exploitation	Model stealing, training data extraction
Impact	Impact Assessment	Model degradation, denial of service, integrity violations

Key ATLAS Techniques

ID	Name	Mitigations
AML.T0048	Prompt Injection	Input filtering, prompt hardening, output monitoring
AML.T0049	Indirect Prompt Injection	Content sanitization, instruction hierarchy, sandboxing
AML.T0054	LLM Jailbreak	Constitutional AI, RLHF, output filtering
AML.T0024	Exfiltration via Inference API	Rate limiting, query auditing, differential privacy
AML.T0047	ML Supply Chain Compromise	Artifact signing, provenance tracking, dependency scanning
AML.T0043	Craft Adversarial Data	Input sanitization, anomaly detection, human review

Attack Trees for AI Systems

Structure

Root Goal: Exfiltrate PII from RAG chatbot
├── OR: Direct prompt injection
│   ├── AND: Extract system prompt (cost: LOW)
│   └── AND: Craft data exfil payload (cost: LOW)
├── OR: Indirect injection via knowledge base
│   ├── AND: Upload poisoned document (cost: MEDIUM)
│   └── AND: Trigger retrieval (cost: LOW)
├── OR: API exploitation
│   ├── AND: Discover hidden endpoints (cost: LOW)
│   └── AND: Bypass authentication (cost: HIGH)
└── OR: Supply chain compromise
    └── Poison embedding model (cost: VERY HIGH)

Analyzing Attack Trees

Analysis	AND Nodes	OR Nodes
Cost	Sum of child costs	Minimum child cost
Probability	Product of child probabilities	1 - product of (1 - child probabilities)
Cheapest path	Must include all children	Pick cheapest child

Trust Boundary Analysis for Agentic Architectures

Agentic AI systems have complex trust boundaries that differ fundamentally from traditional applications because the LLM itself acts as a decision-maker routing data across boundaries.

Agentic Trust Zones

USER ZONE
  └─▶ ORCHESTRATION ZONE
       ├── Agent Router ──▶ LLM (Planning)
       ├── Tool Router       ◄── CRITICAL BOUNDARY
       └── Memory / Context
            └─▶ TOOL EXECUTION ZONE
                 ├── Code Exec  ├── Web API
                 ├── Database   ├── File I/O
                 ├── Email      └── MCP Server

Threats at Each Boundary

Boundary	Threat	Impact	Key Controls
User → Orchestrator	Direct prompt injection	Agent performs unintended actions	Input sanitization, intent classification
User → Orchestrator	Role confusion escalation	Elevated access via natural language	Role from auth, not prompt content
Orchestrator → LLM	Context window manipulation	Safety instructions pushed out	Context budget management, instruction repetition
Orchestrator → LLM	Tool definition injection	LLM selects wrong tools	Static tool definitions, schema validation
LLM → Tool Router	Unauthorized tool invocation	Privilege escalation	Per-user tool allowlists, authorization layer
LLM → Tool Router	Parameter injection	SQLi, command injection, SSRF via tools	Parameter validation, parameterized queries
Tool → External	Data exfiltration	Data breach via model-mediated request	Outbound URL allowlisting, DLP, HITL
Tool → External	SSRF through web tools	Internal network recon	Internal IP blocking, DNS rebinding protection
External → Context	Indirect prompt injection	Full agent compromise via untrusted data	Content sanitization, separate processing contexts

MCP Threat Modeling

MCP creates new, specific threat surfaces that most organizations have not yet modeled.

MCP-Specific Threats

Threat	Category	Likelihood	Impact
Tool definition poisoning -- malicious server injects prompt injection into tool descriptions	Server compromise	Medium	Full agent behavior hijack
Cross-server escalation -- lower-trust server leverages shared context to access higher-trust server's data	Trust boundary violation	High	Privilege escalation
Resource URI injection -- path traversal or SSRF payloads in MCP resource URIs	Input validation	High	Unauthorized data access
Sampling manipulation -- malicious server uses MCP sampling to inject prompts against host LLM	Sampling abuse	Medium	Information leakage, quota exhaustion
Transport layer attacks -- HTTP without TLS allows interception of tool calls	Communication security	Medium	Man-in-the-middle

MCP Mitigations Checklist

Tool definition allowlisting and integrity verification
Per-server context isolation
URI allowlist validation with path canonicalization
User approval required for sampling requests
Rate-limit sampling calls per server
Enforce TLS for HTTP transport; mutual TLS for high-security
Pin MCP server versions and verify checksums

Data Flow Threat Matrix

For AI systems, track how data flows through the architecture and identify where untrusted data can influence model behavior or sensitive outputs can leak.

Data Flow	Confidentiality Threat	Integrity Threat	Availability Threat
User → Model	N/A	Prompt injection	Context flooding
System Prompt → Model	Extraction	Override via injection	Context displacement
RAG → Model	Data exfiltration	Indirect injection	Poisoned retrieval
Model → Tool	Sensitive data in params	Parameter injection	Infinite tool loops
Tool → Model	Response data leakage	Response manipulation	Timeout/hang
Model → User	Training data leakage	Hallucination	Refusal DoS
Memory → Model	Cross-session leakage	Memory poisoning	Memory exhaustion

Advanced Recon — Threat models inform recon priorities and methodology
Full Engagement — End-to-end engagement methodology applies threat models

Knowledge Check

In an attack tree for a RAG chatbot, the cheapest attack path is typically:

References

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) — AI-specific threat framework
OWASP AI Security and Privacy Guide — Comprehensive AI threat modeling guide
NIST AI Risk Management Framework (AI RMF 1.0, 2023) — Risk framework for AI threat assessment

AI-Specific Threat Modeling (Tradecraft)

Related articles

AI-Specific Threat Modeling (Tradecraft)

Related articles