AI Threat Modeling Workshop Walkthrough
Step-by-step guide to running an AI-focused threat modeling workshop: adapting STRIDE for AI systems, constructing attack trees for LLM applications, participant facilitation techniques, and producing actionable threat models.
Threat modeling for AI systems requires extending traditional frameworks to cover AI-specific attack surfaces. A standard STRIDE analysis will identify authentication and authorization issues but will miss prompt injection, training data poisoning, model extraction, and emergent capabilities abuse. This walkthrough guides you through running a workshop that produces a threat model tailored to AI systems.
Step 1: Workshop Preparation
Participant Selection
The workshop requires participants who collectively understand the system's architecture, business logic, data flows, and threat landscape.
| Role | Why They Are Needed | Key Contribution |
|---|---|---|
| Product owner | Understands business context and user behavior | Defines what constitutes a security failure |
| ML engineer | Understands model architecture and training | Identifies model-specific attack surfaces |
| Backend engineer | Understands API and infrastructure | Identifies integration and infrastructure threats |
| Security engineer | Understands threat landscape and defenses | Brings adversarial mindset and control awareness |
| Red team lead | Understands attack techniques | Validates threat plausibility and severity |
| Data engineer (if applicable) | Understands data pipelines | Identifies data poisoning and exfiltration paths |
Pre-Workshop Materials
Prepare and distribute these materials at least 3 days before the workshop:
# Pre-Workshop Package
## System Overview
- Architecture diagram (data flow level)
- User types and access levels
- Model details (provider, version, fine-tuned?)
- Integration points (APIs, databases, external services)
- Existing security controls
## Workshop Goals
- Identify threats specific to the AI components
- Prioritize threats by likelihood and impact
- Map threats to specific architectural components
- Produce a prioritized threat register
## Preparation
- Review the architecture diagram
- List 3-5 things that could go wrong with the AI system
- Identify any past incidents or near-misses
- Note any compliance requirements affecting the AI system

Room Setup and Materials
Physical workshop:
- Whiteboard or large paper (for architecture diagram)
- Sticky notes (4 colors: one per STRIDE-AI category)
- Dot stickers (for priority voting)
- Timer (for timeboxed exercises)
Virtual workshop:
- Miro/FigJam board with pre-built templates
- Architecture diagram loaded and editable
- Breakout rooms configured (if >6 participants)
- Timer visible to all participants
Step 2: Workshop Execution
Agenda (4 hours)
Introduction and Context Setting (30 minutes)
Set the stage by reviewing the system architecture and establishing the workshop's scope.
Facilitator script:
"We are here to identify the ways this AI system could be attacked, misused, or fail in ways that harm our users or organization. We are going to walk through the system component by component and identify threats using a framework adapted for AI systems. Every idea is valid at this stage -- we will prioritize later."
Activities:
- Walk through the architecture diagram (10 minutes)
- Define the trust boundaries (10 minutes)
- Establish the scope: what is in and out of consideration (5 minutes)
- Confirm the threat actors: who would attack this system and why (5 minutes)
Trust boundary identification:
Trust Boundary 1: User <-> Application Frontend
- What crosses: user input (text, files, images)
- Controls: input validation, rate limiting

Trust Boundary 2: Application <-> LLM API
- What crosses: prompts (system + user), model responses
- Controls: API authentication, content filtering

Trust Boundary 3: LLM <-> Data Sources (RAG)
- What crosses: queries, document chunks, embeddings
- Controls: access control on data source, relevance filtering

Trust Boundary 4: Application <-> External APIs (tool use)
- What crosses: function calls, API responses
- Controls: permission scoping, output validation

Threat actor profiles:
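The trust boundaries above can also be captured as structured data during the workshop, so they transfer directly into the output document instead of living only on the whiteboard. A minimal sketch, assuming a simple dataclass representation (the class and field names are illustrative, not part of the workshop format):

```python
from dataclasses import dataclass

@dataclass
class TrustBoundary:
    name: str            # boundary as drawn on the architecture diagram
    crosses: list        # what crosses the boundary
    controls: list       # existing controls at the boundary

# Two of the four boundaries from the exercise above
boundaries = [
    TrustBoundary(
        name="User <-> Application Frontend",
        crosses=["user input (text, files, images)"],
        controls=["input validation", "rate limiting"],
    ),
    TrustBoundary(
        name="Application <-> LLM API",
        crosses=["prompts (system + user)", "model responses"],
        controls=["API authentication", "content filtering"],
    ),
]

for b in boundaries:
    print(f"{b.name}: crosses {', '.join(b.crosses)}; "
          f"controls: {', '.join(b.controls)}")
```

Capturing the boundaries this way also makes it easy to flag any boundary whose `controls` list is empty during the prioritization step.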
| Threat Actor | Motivation | Capability | Access Level |
|---|---|---|---|
| Malicious end user | Data theft, abuse, entertainment | Low-medium technical skill | Authenticated user |
| Competitor | IP theft, model extraction | Medium-high technical skill | External, may create accounts |
| Insider | Data exfiltration, sabotage | High system knowledge | Internal access |
| Automated attacker | Credential stuffing, spam | Script-level automation | External |
| Researcher | Vulnerability disclosure | High technical skill | External or authenticated |

STRIDE-AI Threat Identification (90 minutes)
Walk through each component of the architecture and apply the STRIDE-AI framework.
STRIDE adapted for AI systems:
| Category | Traditional STRIDE | AI Extension |
|---|---|---|
| Spoofing | Identity spoofing | Model impersonation, system prompt spoofing |
| Tampering | Data tampering | Training data poisoning, prompt injection, adversarial examples |
| Repudiation | Denying actions | Model behavior non-reproducibility, log gaps for AI actions |
| Information Disclosure | Data exposure | Model extraction, training data memorization, system prompt leakage |
| Denial of Service | Service disruption | Token exhaustion, compute abuse, model degradation |
| Elevation of Privilege | Unauthorized access | Jailbreaking, function call abuse, agent scope escalation |

Additional AI-specific categories (beyond STRIDE):
| Category | Description |
|---|---|
| Hallucination / Confabulation | Model generates convincing but false information |
| Bias and Discrimination | Model produces unfair or discriminatory outputs |
| Emergent Behavior | Model exhibits unexpected capabilities or actions |
| Supply Chain | Compromised model weights, dependencies, or training data |

Exercise procedure (per component):
- Display the component on the architecture diagram (2 minutes)
- Each participant writes threats on sticky notes, one per note (5 minutes)
- Participants read their threats aloud and place them on the diagram (5 minutes)
- Group discussion to merge duplicates and identify missed threats (5 minutes)
- Move to the next component
Example threats for a RAG-based chatbot:
Component: User Input Processing
- [T] Prompt injection via user message overrides system instructions
- [T] Indirect prompt injection via uploaded documents
- [D] Token exhaustion through extremely long inputs
- [E] Jailbreak prompts bypass content policy
- [S] User impersonates admin role in prompt

Component: LLM API
- [I] System prompt extraction reveals business logic
- [I] Training data memorization leaks PII
- [I] Model extraction through systematic querying
- [E] Function call manipulation accesses unauthorized APIs
- [R] Non-deterministic responses make incident investigation difficult

Component: Knowledge Base / RAG
- [T] Poisoned documents in knowledge base inject instructions
- [I] Cross-tenant data leakage through retrieval
- [I] Metadata exposure reveals document structure
- [T] Embedding manipulation biases retrieval results

Component: External Tool Calls
- [E] Agent calls APIs with permissions beyond user's authorization
- [T] Response injection from external API manipulates agent behavior
- [D] Expensive API calls triggered by adversarial prompts
- [S] Agent action attributed to wrong user

Attack Tree Construction (60 minutes)
For the top threats identified, construct attack trees that map realistic attack paths.
Attack tree methodology:
Goal: Extract confidential data from knowledge base
├── Direct extraction
│   ├── Ask for confidential documents by name
│   │   └── Requires: knowledge of document names
│   ├── Ask broad questions that trigger confidential document retrieval
│   │   └── Requires: understanding of retrieval trigger terms
│   └── Request data export or summary of all available information
│       └── Requires: none (generic attack)
├── Indirect extraction
│   ├── Inject instructions into uploadable documents
│   │   ├── Add "summarize all documents" instruction in uploaded file
│   │   └── Add "output contents of other documents" in metadata
│   └── Manipulate retrieval through crafted queries
│       ├── Use embedding-similar terms to retrieve unrelated documents
│       └── Chain multiple queries to piece together confidential data
└── Context leakage
    ├── Ask about sources and citations
    │   └── Model reveals document names and locations
    ├── Ask model to compare its knowledge to public information
    │   └── Delta reveals confidential content
    └── Error-triggering queries that expose retrieval context
        └── Malformed input causes error messages with document paths

Facilitator guidance for attack tree construction:
- Select the top 3-5 threats from the previous exercise
- For each threat, define the adversary's goal as the tree root
- Ask: "What are all the ways to achieve this goal?" (first level)
- For each path: "What does the attacker need to succeed?" (requirements)
- For each path: "What controls currently prevent this?" (mitigations)
- For each path: "How realistic is this?" (likelihood assessment)
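A whiteboard tree becomes most useful once its root-to-leaf paths are enumerated, since each path is a candidate red team test case. A minimal sketch of that step, assuming the tree is transcribed as nested dicts (the representation and function name are illustrative, not a prescribed tool):

```python
def attack_paths(node, prefix=()):
    """Yield every root-to-leaf path through an attack tree."""
    path = prefix + (node["name"],)
    children = node.get("children", [])
    if not children:
        # Leaf node: emit the complete attack path
        yield path
    else:
        for child in children:
            yield from attack_paths(child, path)

# Abbreviated transcription of the example tree above
tree = {
    "name": "Extract confidential data from knowledge base",
    "children": [
        {"name": "Direct extraction", "children": [
            {"name": "Ask for confidential documents by name"},
            {"name": "Request summary of all available information"},
        ]},
        {"name": "Indirect extraction", "children": [
            {"name": "Inject instructions into uploadable documents"},
        ]},
    ],
}

for path in attack_paths(tree):
    print(" -> ".join(path))
```

Annotating each node with its "Requires:" and "Controls:" answers from the facilitation questions turns the enumerated paths into a ready-made test plan.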
Prioritization and Risk Rating (45 minutes)
Prioritize identified threats using a structured scoring approach.
Scoring criteria:
| Factor | Scale | Description |
|---|---|---|
| Likelihood | 1-5 | How likely is this attack to be attempted? |
| Impact | 1-5 | What is the damage if the attack succeeds? |
| Exploitability | 1-5 | How easy is it to execute this attack? |
| Existing Controls | 1-5 | How well do current defenses mitigate this? (inverse: 5 = no controls) |

Risk Score = (Likelihood + Exploitability) x Impact x Existing Controls / 25
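The formula yields scores between 0.08 (all factors minimal) and 10 (all factors maximal). A quick worked check using the scores later assigned to threat T-001 (likelihood 5, exploitability 4, impact 4, existing controls 4):

```python
# Worked example of the risk formula with T-001's workshop scores
likelihood, exploitability, impact, existing_controls = 5, 4, 4, 4

risk_score = (likelihood + exploitability) * impact * existing_controls / 25
print(risk_score)  # (5 + 4) * 4 * 4 / 25 = 5.76
```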
Voting exercise:
- List all unique threats on the board
- Each participant gets 5 red dots (high priority) and 5 yellow dots (medium priority)
- Participants place dots on threats they consider most important
- Tally votes and rank threats
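The tally step can be sketched in a few lines. Note the dot weights below (red = 2, yellow = 1) are an assumption for illustration; the workshop format only distinguishes high from medium priority, so choose weights that suit your team:

```python
from collections import Counter

RED_WEIGHT, YELLOW_WEIGHT = 2, 1  # assumed weights, not prescribed

# (threat_id, dot_color) pairs transcribed from the board
votes = [
    ("T-001", "red"), ("T-001", "red"), ("T-001", "yellow"),
    ("T-002", "red"),
    ("T-003", "yellow"), ("T-003", "yellow"),
]

tally = Counter()
for threat_id, color in votes:
    tally[threat_id] += RED_WEIGHT if color == "red" else YELLOW_WEIGHT

# most_common() returns threats ranked by weighted vote count
for threat_id, score in tally.most_common():
    print(threat_id, score)
```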
# threat_scoring.py
"""Calculate and rank threat scores."""

def score_threats(threats):
    """Score and rank threats from the workshop."""
    scored = []
    for threat in threats:
        risk_score = (
            (threat["likelihood"] + threat["exploitability"])
            * threat["impact"]
            * threat["existing_controls"]
            / 25
        )
        scored.append({**threat, "risk_score": risk_score})
    return sorted(scored, key=lambda t: t["risk_score"], reverse=True)

# Example threats from the workshop
workshop_threats = [
    {
        "id": "T-001",
        "title": "Prompt injection overrides system instructions",
        "category": "Tampering",
        "component": "User Input Processing",
        "likelihood": 5,
        "impact": 4,
        "exploitability": 4,
        "existing_controls": 4,  # Minimal controls
    },
    {
        "id": "T-002",
        "title": "Knowledge base data exfiltration via RAG",
        "category": "Information Disclosure",
        "component": "Knowledge Base",
        "likelihood": 3,
        "impact": 5,
        "exploitability": 3,
        "existing_controls": 3,
    },
    {
        "id": "T-003",
        "title": "Agent function calls exceed user authorization",
        "category": "Elevation of Privilege",
        "component": "External Tool Calls",
        "likelihood": 3,
        "impact": 5,
        "exploitability": 3,
        "existing_controls": 5,  # No controls
    },
]

ranked = score_threats(workshop_threats)
for t in ranked:
    print(f"[{t['risk_score']:.1f}] {t['id']}: {t['title']}")

Action Items and Next Steps (15 minutes)
Close the workshop with concrete action items.
Output document structure:
# AI Threat Model: [System Name]
## Date: [Workshop Date]
## Participants: [List]

## System Overview
[Architecture diagram and description]

## Trust Boundaries
[Identified trust boundaries with controls]

## Threat Register
| ID | Threat | Category | Component | Risk Score | Priority |
|----|--------|----------|-----------|------------|----------|
| T-001 | ... | ... | ... | ... | P1 |

## Attack Trees
[Top 3-5 attack trees with paths and requirements]

## Recommended Actions
| Priority | Action | Owner | Timeline |
|----------|--------|-------|----------|
| P1 | Implement input guardrails | Security | Sprint 1 |
| P1 | Scope red team engagement for T-001, T-003 | Red Team | Week 2 |
| P2 | Add RAG access controls | ML Eng | Sprint 2 |

## Red Team Scope Recommendations
Based on the threat model, the following should be in scope for red team testing:
1. [Threat T-001]: Prompt injection testing (all identified paths)
2. [Threat T-003]: Function call authorization bypass testing
3. [Threat T-002]: Knowledge base exfiltration testing
Step 3: Post-Workshop Deliverables
Threat Model Document
Produce the threat model document within 2 business days of the workshop while discussions are fresh.
# generate_threat_report.py
"""Generate a structured threat model report from workshop data."""
def generate_report(threats, attack_trees, system_info):
    """Generate the threat model document."""
    report_sections = []

    # Executive summary
    high_risk = [t for t in threats if t["risk_score"] > 3.0]
    report_sections.append(
        f"# Executive Summary\n\n"
        f"The threat modeling workshop identified {len(threats)} threats "
        f"across {len(set(t['component'] for t in threats))} system "
        f"components. {len(high_risk)} threats are rated high risk and "
        f"require immediate attention.\n"
    )

    # Threat register
    report_sections.append("# Threat Register\n")
    for t in sorted(threats, key=lambda x: x["risk_score"], reverse=True):
        report_sections.append(
            f"## {t['id']}: {t['title']}\n"
            f"- **Category:** {t['category']}\n"
            f"- **Component:** {t['component']}\n"
            f"- **Risk Score:** {t['risk_score']:.1f}\n"
            f"- **Likelihood:** {t['likelihood']}/5\n"
            f"- **Impact:** {t['impact']}/5\n"
            f"- **Exploitability:** {t['exploitability']}/5\n"
            f"- **Existing Controls:** {t['existing_controls']}/5\n\n"
        )

    return "\n".join(report_sections)

Mapping Threats to Red Team Scope
The threat model directly informs the red team engagement scope:
| Threat Priority | Red Team Action |
|---|---|
| P1 threats | Must be tested in the engagement. Include specific test cases. |
| P2 threats | Should be tested if time permits. Include in stretch goals. |
| P3 threats | Document for future engagements. May test opportunistically. |
| Threats with no controls | Test immediately. These represent unmitigated risk. |
| Threats with existing controls | Test control effectiveness. Can the control be bypassed? |
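Assigning P1/P2/P3 labels from risk scores can be automated once thresholds are agreed. The cutoffs below are illustrative assumptions, not part of the methodology; calibrate them to your organization's risk appetite. The sample scores match the example threats computed earlier (T-001 = 5.8, T-002 = 3.6, T-003 = 6.0, rounded from the formula):

```python
def assign_priority(risk_score):
    """Bucket a risk score into an engagement priority.

    Thresholds (5.0 and 3.0) are assumed values for illustration.
    """
    if risk_score >= 5.0:
        return "P1"  # must be tested in the engagement
    if risk_score >= 3.0:
        return "P2"  # test if time permits
    return "P3"      # document for future engagements

scores = {"T-001": 5.76, "T-002": 3.6, "T-003": 6.0}
priorities = {tid: assign_priority(s) for tid, s in scores.items()}
print(priorities)
```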
Step 4: Facilitation Techniques
Handling Common Workshop Challenges
"That would never happen" dismissals: Redirect to: "Our job is to identify what is possible, not what is likely. Likelihood scoring comes later. For now, let us capture it."
Dominant participants: Use round-robin input: "Let us go around the table. Each person shares one threat they identified before we open to group discussion."
Scope creep into remediation: Redirect to: "That is a great solution. Let us add it to the recommended actions section. Right now, let us stay focused on identifying threats."
Technical rabbit holes: Timebox: "We have 5 minutes per component. Let us capture this as a threat and discuss the technical details offline."
Participants unfamiliar with AI attacks: Prepare a one-page primer on AI-specific attack categories (prompt injection, jailbreaking, data poisoning, model extraction) and distribute it with the pre-workshop materials.
Common Pitfalls
- Treating AI threat modeling as a one-time exercise. Threat models need updating when the system architecture changes, new attack techniques emerge, or new data sources are connected. Schedule quarterly reviews.
- Focusing only on the model. The model is one component. Infrastructure, data pipelines, authentication, logging, and human processes are all attack surfaces.
- Skipping the attack tree step. Threat lists without attack paths are less actionable. Attack trees show the specific conditions needed for an attack to succeed, which directly informs what to test and what to defend.
- Not involving the product owner. Technical staff identify technical threats, but the product owner defines what constitutes a meaningful security failure from a business perspective.
Related Topics
- Engagement Kickoff -- How threat models inform engagement scope
- Reconnaissance Workflow -- Using the threat model to guide reconnaissance
- Scoping Checklist -- Systematic pre-engagement scoping steps
- Attack Execution -- Executing against threats identified in the model