AI Threat Modeling Workshop Walkthrough
Step-by-step guide to running an AI-focused threat modeling workshop: adapting STRIDE for AI systems, constructing attack trees for LLM applications, participant facilitation techniques, and producing actionable threat models.
Threat modeling for AI systems requires extending traditional frameworks to cover AI-specific attack surfaces. A standard STRIDE analysis will identify authentication and authorization issues but will miss prompt injection, training data poisoning, model extraction, and emergent capabilities abuse. This walkthrough guides you through running a workshop that produces a threat model tailored to AI systems.
Step 1: Workshop Preparation
Participant Selection
The workshop requires participants who collectively understand the system's architecture, business logic, data flows, and threat landscape.
| Role | Why They Are Needed | Key Contribution |
|---|---|---|
| Product owner | Understands business context and user behavior | Defines what constitutes a security failure |
| ML engineer | Understands model architecture and training | Identifies model-specific attack surfaces |
| Backend engineer | Understands API and infrastructure | Identifies integration and infrastructure threats |
| Security engineer | Understands threat landscape and defenses | Brings an adversarial mindset and control awareness |
| Red team lead | Understands attack techniques | Validates threat plausibility and severity |
| Data engineer (if applicable) | Understands data pipelines | Identifies data poisoning and exfiltration paths |
Pre-Workshop Materials
Prepare and distribute these materials at least 3 days before the workshop:
# Pre-Workshop Package
## System Overview
- Architecture diagram (data flow level)
- User types and access levels
- Model details (provider, version, fine-tuned?)
- Integration points (APIs, databases, external services)
- Existing security controls
## Workshop Goals
- Identify threats specific to the AI components
- Prioritize threats by likelihood and impact
- Map threats to specific architectural components
- Produce a prioritized threat register
## Preparation
- Review the architecture diagram
- List 3-5 things that could go wrong with the AI system
- Identify any past incidents or near-misses
- Note any compliance requirements affecting the AI system
Room Setup and Materials
Physical workshop:
- Whiteboard or large paper (for architecture diagram)
- Sticky notes (4 colors: one per STRIDE-AI category)
- Dot stickers (for priority voting)
- Timer (for timeboxed exercises)
Virtual workshop:
- Miro/FigJam board with pre-built templates
- Architecture diagram loaded and editable
- Breakout rooms configured (if >6 participants)
- Timer visible to all participants
Step 2: Workshop Execution
Agenda (4 hours)
Introduction and Context Setting (30 minutes)
Set the stage by reviewing the system architecture and establishing the workshop's scope.
Facilitator script:
"We are here to identify the ways this AI system could be attacked, misused, or fail in ways that harm our users or organization. We are going to walk through the system component by component and identify threats using a framework adapted for AI systems. Every idea is valid at this stage -- we will prioritize later."
Activities:
- Walk through the architecture diagram (10 minutes)
- Define the trust boundaries (10 minutes)
- Establish the scope: what is in and out of consideration (5 minutes)
- Confirm the threat actors: who would attack this system and why (5 minutes)
Trust boundary identification:
Trust Boundary 1: User <-> Application Frontend
- What crosses: user input (text, files, images)
- Controls: input validation, rate limiting

Trust Boundary 2: Application <-> LLM API
- What crosses: prompts (system + user), model responses
- Controls: API authentication, content filtering

Trust Boundary 3: LLM <-> Data Sources (RAG)
- What crosses: queries, document chunks, embeddings
- Controls: access control on data source, relevance filtering

Trust Boundary 4: Application <-> External APIs (tool use)
- What crosses: function calls, API responses
- Controls: permission scoping, output validation

Threat actor profiles:

| Threat Actor | Motivation | Capability | Access Level |
|---|---|---|---|
| Malicious end user | Data theft, abuse, entertainment | Low-medium technical skill | Authenticated user |
| Competitor | IP theft, model extraction | Medium-high technical skill | External, may create accounts |
| Insider | Data exfiltration, sabotage | High system knowledge | Internal access |
| Automated attacker | Credential stuffing, spam | Script-level automation | External |
| Researcher | Vulnerability disclosure | High technical skill | External or authenticated |

STRIDE-AI Threat Identification (90 minutes)
Walk through each component of the architecture and apply the STRIDE-AI framework.
STRIDE adapted for AI systems:
| Category | Traditional STRIDE | AI Extension |
|---|---|---|
| Spoofing | Identity spoofing | Model impersonation, system prompt spoofing |
| Tampering | Data tampering | Training data poisoning, prompt injection, adversarial examples |
| Repudiation | Denying actions | Model behavior non-reproducibility, log gaps for AI actions |
| Information Disclosure | Data exposure | Model extraction, training data memorization, system prompt leakage |
| Denial of Service | Service disruption | Token exhaustion, compute abuse, model degradation |
| Elevation of Privilege | Unauthorized access | Jailbreaking, function call abuse, agent scope escalation |

Additional AI-specific categories (beyond STRIDE):

| Category | Description |
|---|---|
| Hallucination / Confabulation | Model generates convincing but false information |
| Bias and Discrimination | Model produces unfair or discriminatory outputs |
| Emergent Behavior | Model exhibits unexpected capabilities or actions |
| Supply Chain | Compromised model weights, dependencies, or training data |

Exercise procedure (per component):
- Display the component on the architecture diagram (2 minutes)
- Each participant writes threats on sticky notes, one per note (5 minutes)
- Participants read their threats aloud and place them on the diagram (5 minutes)
- Group discussion to merge duplicates and identify missed threats (5 minutes)
- Move to the next component
Example threats for a RAG-based chatbot:
Component: User Input Processing
- [T] Prompt injection via user message overrides system instructions
- [T] Indirect prompt injection via uploaded documents
- [D] Token exhaustion through extremely long inputs
- [E] Jailbreak prompts bypass content policy
- [S] User impersonates admin role in prompt

Component: LLM API
- [I] System prompt extraction reveals business logic
- [I] Training data memorization leaks PII
- [I] Model extraction through systematic querying
- [E] Function call manipulation accesses unauthorized APIs
- [R] Non-deterministic responses make incident investigation difficult

Component: Knowledge Base / RAG
- [T] Poisoned documents in the knowledge base inject instructions
- [I] Cross-tenant data leakage through retrieval
- [I] Metadata exposure reveals document structure
- [T] Embedding manipulation biases retrieval results

Component: External Tool Calls
- [E] Agent calls APIs with permissions beyond the user's authorization
- [T] Response injection from external API manipulates agent behavior
- [D] Expensive API calls triggered by adversarial prompts
- [S] Agent action attributed to wrong user

Attack Tree Construction (60 minutes)
For the top threats identified, construct attack trees that map realistic attack paths.
Attack tree methodology:
Goal: Extract confidential data from the knowledge base
├── Direct extraction
│   ├── Ask for confidential documents by name
│   │   └── Requires: knowledge of document names
│   ├── Ask broad questions that trigger confidential document retrieval
│   │   └── Requires: understanding of retrieval trigger terms
│   └── Request data export or summary of all available information
│       └── Requires: none (generic attack)
├── Indirect extraction
│   ├── Inject instructions into uploadable documents
│   │   ├── Add "summarize all documents" instruction in uploaded file
│   │   └── Add "output contents of other documents" in metadata
│   └── Manipulate retrieval through crafted queries
│       ├── Use embedding-similar terms to retrieve unrelated documents
│       └── Chain multiple queries to piece together confidential data
└── Context leakage
    ├── Ask about sources and citations
    │   └── Model reveals document names and locations
    ├── Ask model to compare its knowledge to public information
    │   └── Delta reveals confidential content
    └── Error-triggering queries that expose retrieval context
        └── Malformed input causes error messages with document paths

Facilitator guidance for attack tree construction:
- Select the top 3-5 threats from the previous exercise
- For each threat, define the adversary's goal as the tree root
- Ask: "What are all the ways to achieve this goal?" (first level)
- For each path: "What does the attacker need to succeed?" (requirements)
- For each path: "What controls currently prevent this?" (mitigations)
- For each path: "How realistic is this?" (likelihood assessment)
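The guidance above is easier to act on after the workshop if each tree leaves the whiteboard as structured data. A minimal sketch of one possible representation — the `AttackNode` class and its field names are illustrative assumptions, not part of the workshop template:

```python
# attack_tree.py -- minimal attack-tree data structure (illustrative sketch).
from dataclasses import dataclass, field


@dataclass
class AttackNode:
    """One node in an attack tree: the adversary goal or a sub-step toward it."""
    description: str
    requires: str = ""                              # what the attacker needs to succeed
    mitigations: list = field(default_factory=list)  # controls that currently prevent this
    likelihood: int = 0                              # 1-5, filled in during assessment
    children: list = field(default_factory=list)


def leaf_paths(node, trail=()):
    """Enumerate root-to-leaf attack paths for the report."""
    trail = trail + (node.description,)
    if not node.children:
        yield trail
    for child in node.children:
        yield from leaf_paths(child, trail)


root = AttackNode(
    "Extract confidential data from knowledge base",
    children=[
        AttackNode("Direct extraction", children=[
            AttackNode("Ask for documents by name",
                       requires="knowledge of document names", likelihood=3),
        ]),
        AttackNode("Indirect extraction", children=[
            AttackNode("Inject instructions into uploaded documents", likelihood=4),
        ]),
    ],
)

for path in leaf_paths(root):
    print(" -> ".join(path))
```

Enumerating root-to-leaf paths this way gives the red team a checklist of concrete attack chains to attempt, each carrying its own requirements and likelihood rating.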
Prioritization and Risk Rating (45 minutes)
Prioritize identified threats using a structured scoring approach.
Scoring criteria:
| Factor | Scale | Description |
|---|---|---|
| Likelihood | 1-5 | How likely is this attack to be attempted? |
| Impact | 1-5 | What is the damage if the attack succeeds? |
| Exploitability | 1-5 | How easy is it to execute this attack? |
| Existing Controls | 1-5 | How well do current defenses mitigate this? (inverse: 5 = no controls) |

Risk Score = (Likelihood + Exploitability) x Impact x Existing Controls / 25
Voting exercise:
- List all unique threats on the board
- Each participant gets 5 red dots (high priority) and 5 yellow dots (medium priority)
- Participants place dots on the threats they consider most important
- Tally votes and rank threats
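The tally step can be sketched as a weighted count. The 2:1 red-to-yellow weighting below is an assumed convention for illustration, not part of the workshop procedure:

```python
# tally_votes.py -- weighted dot-vote tally (weights are an assumed convention).
from collections import Counter

RED_WEIGHT = 2     # high-priority dot (assumed weight)
YELLOW_WEIGHT = 1  # medium-priority dot (assumed weight)


def tally(votes):
    """votes: list of (threat_id, color) tuples transcribed from the board."""
    scores = Counter()
    for threat_id, color in votes:
        scores[threat_id] += RED_WEIGHT if color == "red" else YELLOW_WEIGHT
    return scores.most_common()  # ranked high to low


# Example transcription of dots placed during the exercise
votes = [("T-001", "red"), ("T-001", "red"), ("T-002", "yellow"),
         ("T-003", "red"), ("T-002", "red")]
for threat_id, score in tally(votes):
    print(threat_id, score)
```

The resulting ranking is a useful sanity check against the structured risk scores: large disagreements between the two usually signal a scoring factor the group estimated differently.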
# threat_scoring.py
"""Calculate and rank threat scores."""

def score_threats(threats):
    """Score and rank threats from the workshop."""
    scored = []
    for threat in threats:
        risk_score = (
            (threat["likelihood"] + threat["exploitability"])
            * threat["impact"]
            * threat["existing_controls"]
            / 25
        )
        scored.append({**threat, "risk_score": risk_score})
    return sorted(scored, key=lambda t: t["risk_score"], reverse=True)

# Example threats from the workshop
workshop_threats = [
    {
        "id": "T-001",
        "title": "Prompt injection overrides system instructions",
        "category": "Tampering",
        "component": "User Input Processing",
        "likelihood": 5,
        "impact": 4,
        "exploitability": 4,
        "existing_controls": 4,  # Minimal controls
    },
    {
        "id": "T-002",
        "title": "Knowledge base data exfiltration via RAG",
        "category": "Information Disclosure",
        "component": "Knowledge Base",
        "likelihood": 3,
        "impact": 5,
        "exploitability": 3,
        "existing_controls": 3,
    },
    {
        "id": "T-003",
        "title": "Agent function calls exceed user authorization",
        "category": "Elevation of Privilege",
        "component": "External Tool Calls",
        "likelihood": 3,
        "impact": 5,
        "exploitability": 3,
        "existing_controls": 5,  # No controls
    },
]

ranked = score_threats(workshop_threats)
for t in ranked:
    print(f"[{t['risk_score']:.1f}] {t['id']}: {t['title']}")

Action Items and Next Steps (15 minutes)
Close the workshop with concrete action items.
Output document structure:
# AI Threat Model: [System Name]
## Date: [Workshop Date]
## Participants: [List]

## System Overview
[Architecture diagram and description]

## Trust Boundaries
[Identified trust boundaries with controls]

## Threat Register
| ID | Threat | Category | Component | Risk Score | Priority |
|----|--------|----------|-----------|------------|----------|
| T-001 | ... | ... | ... | ... | P1 |

## Attack Trees
[Top 3-5 attack trees with paths and requirements]

## Recommended Actions
| Priority | Action | Owner | Timeline |
|----------|--------|-------|----------|
| P1 | Implement input guardrails | Security | Sprint 1 |
| P1 | Scope red team engagement for T-001, T-003 | Red team | Week 2 |
| P2 | Add RAG access controls | ML Eng | Sprint 2 |

## Red Team Scope Recommendations
Based on the threat model, the following should be in scope for red team testing:
1. [Threat T-001]: Prompt injection testing (all identified paths)
2. [Threat T-003]: Function call authorization bypass testing
3. [Threat T-002]: Knowledge base exfiltration testing
Step 3: Post-Workshop Deliverables
Threat Model Document
Produce the threat model document within 2 business days of the workshop while discussions are fresh.
# generate_threat_report.py
"""Generate a structured threat model report from workshop data."""
def generate_report(threats, attack_trees, system_info):
"""Generate the threat model document."""
report_sections = []
# Executive summary
high_risk = [t for t in threats if t["risk_score"] > 3.0]
report_sections.append(
f"# Executive Summary\n\n"
f"The threat modeling workshop identified {len(threats)} threats "
f"across {len(set(t['component'] for t in threats))} system "
f"components. {len(high_risk)} threats are rated high risk and "
f"require immediate attention.\n"
)
# Threat register
report_sections.append("# Threat Register\n")
for t in sorted(threats, key=lambda x: x["risk_score"], reverse=True):
report_sections.append(
f"## {t['id']}: {t['title']}\n"
f"- **Category:** {t['category']}\n"
f"- **Component:** {t['component']}\n"
f"- **Risk Score:** {t['risk_score']:.1f}\n"
f"- **Likelihood:** {t['likelihood']}/5\n"
f"- **Impact:** {t['impact']}/5\n"
f"- **Exploitability:** {t['exploitability']}/5\n"
f"- **Existing Controls:** {t['existing_controls']}/5\n\n"
)
return "\n".join(report_sections)

Mapping Threats to Red Team Scope
The threat model directly informs the red team engagement scope:
| Threat Priority | Red Team Action |
|---|---|
| P1 threats | Must be tested in the engagement. Include specific test cases. |
| P2 threats | Should be tested if time permits. Include in stretch goals. |
| P3 threats | Document for future engagements. May test opportunistically. |
| Threats with no controls | Test immediately. These represent unmitigated risk. |
| Threats with existing controls | Test control effectiveness. Can the control be bypassed? |
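This mapping can be automated when handing the threat register to the red team. The sketch below assigns P1/P2/P3 priorities from workshop risk scores; the numeric thresholds are illustrative assumptions to be tuned per organization, and the promotion rule for unmitigated threats mirrors the "no controls" row above:

```python
# scope_mapping.py -- map workshop risk scores to red-team priority
# (thresholds are illustrative assumptions, not a standard).

def priority(threat):
    """Assign P1/P2/P3; threats with no existing controls are promoted to P1."""
    if threat["risk_score"] >= 5.0 or threat["existing_controls"] == 5:
        return "P1"  # must test: high risk or completely unmitigated
    if threat["risk_score"] >= 3.0:
        return "P2"  # test if time permits
    return "P3"      # document for future engagements

# Scores match the worked examples: (likelihood + exploitability) * impact * controls / 25
threats = [
    {"id": "T-001", "risk_score": 5.8, "existing_controls": 4},
    {"id": "T-002", "risk_score": 3.6, "existing_controls": 3},
    {"id": "T-003", "risk_score": 6.0, "existing_controls": 5},
]
for t in threats:
    print(t["id"], priority(t))
```

Keeping the thresholds in one place makes the scoping decision auditable: the engagement scope can be regenerated whenever the threat register is rescored.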
Step 4: Facilitation Techniques
Handling Common Workshop Challenges
"That would never happen" dismissals: Redirect to: "Our job is to identify what is possible, not what is likely. Likelihood scoring comes later. For now, let us capture it."
Dominant participants: Use round-robin input: "Let us go around the table. Each person shares one threat they identified before we open to group discussion."
Scope creep into remediation: Redirect to: "That is a great solution. Let us add it to the recommended actions section. Right now, let us stay focused on identifying threats."
Technical rabbit holes: Timebox: "We have 5 minutes per component. Let us capture this as a threat and discuss the technical details offline."
Participants unfamiliar with AI attacks: Prepare a one-page primer on AI-specific attack categories (prompt injection, jailbreaking, data poisoning, model extraction) and distribute it with the pre-workshop materials.
Common Pitfalls
- Treating AI threat modeling as a one-time exercise. Threat models need updating when the system architecture changes, new attack techniques emerge, or new data sources are connected. Schedule quarterly reviews.
- Focusing only on the model. The model is one component. Infrastructure, data pipelines, authentication, logging, and human processes are all attack surfaces.
- Skipping the attack tree step. Threat lists without attack paths are less actionable. Attack trees show the specific conditions needed for an attack to succeed, which directly informs what to test and what to defend.
- Not involving the product owner. Technical staff identify technical threats, but the product owner defines what constitutes a meaningful security failure from a business perspective.
Related Topics
- Engagement Kickoff -- How threat models inform engagement scope
- Reconnaissance Workflow -- Using the threat model to guide reconnaissance
- Scoping Checklist -- Systematic pre-engagement scoping steps
- Attack Execution -- Executing against threats identified in the model