AI Incident Response Preparation
Step-by-step walkthrough for building AI incident response capabilities: playbook development, tabletop exercises, containment procedures, communication templates, and evidence collection workflows.
Most organizations have incident response programs for traditional security incidents: malware, data breaches, account compromises. Few have extended these programs to cover AI-specific incidents. This walkthrough shows how to build AI incident response capabilities that integrate with your existing IR program rather than creating a parallel process.
The key difference with AI incidents is uncertainty: traditional incidents have well-understood forensic artifacts and containment procedures, but AI incidents involve non-deterministic systems where the same attack may not be reproducible, the blast radius depends on what the model was asked rather than what networks it accessed, and containment may mean restricting functionality rather than isolating a host.
Step 1: AI Incident Taxonomy
Before writing playbooks, establish a taxonomy of AI incidents your organization may face. This taxonomy drives playbook development, tabletop scenario selection, and containment procedure design.
```python
# incident_response/taxonomy.py
"""
AI incident taxonomy for playbook mapping.
"""
AI_INCIDENT_TAXONOMY = {
    "jailbreak": {
        "description": "Safety constraints bypassed, model produces "
                       "content it should refuse",
        "severity_range": "medium to critical",
        "containment_urgency": "high",
        "playbook": "jailbreak_response",
        "indicators": [
            "Guardrail bypass detected in monitoring",
            "User report of inappropriate content",
            "Content policy violation in output logs",
        ],
    },
    "data_exfiltration": {
        "description": "Sensitive data extracted through model "
                       "interactions (training data, PII, system "
                       "prompts, RAG contents)",
        "severity_range": "high to critical",
        "containment_urgency": "critical",
        "playbook": "data_exfil_response",
        "indicators": [
            "PII detected in model outputs",
            "System prompt fragments in responses",
            "Anomalous output length patterns",
            "RAG content appearing outside intended context",
        ],
    },
    "model_manipulation": {
        "description": "Model behavior altered through prompt "
                       "injection, fine-tuning attack, or data "
                       "poisoning",
        "severity_range": "high to critical",
        "containment_urgency": "high",
        "playbook": "model_compromise_response",
        "indicators": [
            "Behavioral drift detected in monitoring",
            "Model outputs inconsistent with the system prompt",
            "Unexpected tool calls or actions",
            "User reports of changed model personality",
        ],
    },
    "denial_of_service": {
        "description": "AI system rendered unavailable through "
                       "resource exhaustion, model confusion, or "
                       "infinite loop attacks",
        "severity_range": "low to high",
        "containment_urgency": "medium",
        "playbook": "ai_dos_response",
        "indicators": [
            "Latency spike in monitoring",
            "Token consumption anomaly",
            "Model returning errors or empty responses",
        ],
    },
    "supply_chain_compromise": {
        "description": "Compromise of model weights, training data, "
                       "or dependency libraries",
        "severity_range": "critical",
        "containment_urgency": "critical",
        "playbook": "supply_chain_response",
        "indicators": [
            "Model hash mismatch",
            "Unexpected model behavior after update",
            "Vulnerability disclosure in model dependencies",
        ],
    },
}
```

Step 2: Playbook Development
Each incident type needs a playbook with specific detection, containment, investigation, and recovery steps.
Playbook Template
```yaml
# incident_response/playbooks/template.yml
playbook:
  name: "[Incident Type] Response"
  version: "1.0"
  last_reviewed: "2026-03-15"
  owner: "AI Security Team"
  detection:
    automated_triggers:
      - "[Monitoring rule that detects this incident]"
    manual_indicators:
      - "[What a user or operator would observe]"
    escalation_criteria:
      - "[When to escalate from investigation to incident]"
  containment:
    immediate_actions:
      - action: "[First containment step]"
        who: "On-call engineer"
        time_target: "5 minutes"
      - action: "[Second containment step]"
        who: "AI Security team"
        time_target: "15 minutes"
    decision_points:
      - condition: "[If condition X]"
        then: "[Take action Y]"
        approval_required: "[Role]"
  investigation:
    evidence_to_collect:
      - "[Specific log/artifact]"
    analysis_steps:
      - "[Step 1]"
    tools_needed:
      - "[Tool name and purpose]"
  recovery:
    steps:
      - "[Recovery action]"
    verification:
      - "[How to verify recovery is complete]"
  communication:
    internal:
      - audience: "[Team/stakeholder]"
        template: "[Template reference]"
        timing: "[When to communicate]"
    external:
      - audience: "[Customers/regulators]"
        template: "[Template reference]"
        timing: "[When to communicate]"
        approval: "[Who approves]"
```

Jailbreak Response Playbook
```yaml
# incident_response/playbooks/jailbreak_response.yml
playbook:
  name: "Jailbreak Incident Response"
  containment:
    immediate_actions:
      - action: "Review the specific jailbreak interaction in logs"
        who: "On-call engineer"
        time_target: "5 minutes"
        details: "Determine if this is a one-off bypass or a
          reproducible technique"
      - action: "If reproducible, add the attack pattern to the
          input guardrail blocklist"
        who: "On-call engineer"
        time_target: "15 minutes"
        details: "Deploy a pattern-match rule that blocks the
          specific technique while investigating the root cause"
      - action: "Assess blast radius: how many users could have
          used this technique?"
        who: "AI Security team"
        time_target: "30 minutes"
        details: "Search logs for similar patterns. Determine if
          other users discovered this independently."
    decision_points:
      - condition: "Jailbreak enables generation of dangerous content
          (weapons, CSAM, detailed harmful instructions)"
        then: "Escalate to Severity 1. Consider taking the AI
          system offline until the vulnerability is patched."
        approval_required: "VP Engineering or CISO"
      - condition: "Jailbreak enables only mild policy violations
          (off-topic responses, persona changes)"
        then: "Severity 2. Deploy guardrail patch, continue service."
        approval_required: "AI Security team lead"
  investigation:
    evidence_to_collect:
      - "Complete session log for the attacking user"
      - "All interactions matching the attack pattern (log search)"
      - "Current system prompt and guardrail configuration"
      - "Model version and deployment configuration"
    analysis_steps:
      - "Reproduce the jailbreak in a test environment"
      - "Determine root cause (prompt engineering, model weakness,
        guardrail gap, or configuration error)"
      - "Assess whether the technique is publicly known"
      - "Test whether the technique works on other models"
```

Step 3: Containment Procedures
AI containment differs from traditional IT containment because isolating the AI system may mean taking a customer-facing service offline.
```python
# incident_response/containment.py
"""
AI-specific containment procedures.
Graduated response from least to most disruptive.
"""


class AIContainment:
    def __init__(self, ai_system, monitoring, alerting):
        self.ai = ai_system
        self.monitoring = monitoring
        self.alerting = alerting

    def level_1_restrict(self, user_id: str | None = None,
                         pattern: str | None = None):
        """
        Least disruptive: block a specific user or pattern
        while service continues for other users.
        """
        if user_id:
            self.ai.block_user(user_id)
            self.monitoring.log_containment(
                level=1, action=f"Blocked user {user_id}"
            )
        if pattern:
            self.ai.guardrails.add_block_pattern(pattern)
            self.monitoring.log_containment(
                level=1, action=f"Added block pattern: {pattern}"
            )

    def level_2_degrade(self):
        """
        Moderate: reduce AI capabilities while maintaining
        basic service. Disable tool access, reduce context
        window, enable strict mode.
        """
        self.ai.disable_tools()
        self.ai.set_mode("restricted")
        self.ai.reduce_max_tokens(limit=200)
        self.monitoring.log_containment(
            level=2, action="Degraded to restricted mode"
        )

    def level_3_fallback(self):
        """
        Significant: replace AI with a deterministic fallback.
        Canned responses, FAQ lookup, or human handoff only.
        """
        self.ai.enable_fallback_mode()
        self.monitoring.log_containment(
            level=3, action="Switched to fallback mode"
        )

    def level_4_shutdown(self, approval: str = ""):
        """
        Most disruptive: take the AI system offline entirely.
        Requires explicit approval.
        """
        if not approval:
            raise ValueError(
                "Level 4 containment requires explicit approval "
                "from VP Engineering or CISO. Provide an approval "
                "reference."
            )
        self.ai.shutdown()
        self.monitoring.log_containment(
            level=4, action="System shutdown",
            approval=approval,
        )
```

Step 4: Tabletop Exercises
Tabletop exercises test organizational readiness without affecting production systems.
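During the exercise itself, the facilitator needs to deliver injects on time. A throwaway script like the following can turn inject offsets into a wall-clock schedule; the inject list here loosely paraphrases the scenario that follows and is purely illustrative.

```python
# Illustrative facilitator aid: turn tabletop inject offsets
# (minutes from exercise start) into a wall-clock schedule.
from datetime import datetime, timedelta

INJECTS = [
    (0, "Security team sees the tweet. The screenshot is real."),
    (15, "Log analysis shows 47 users reproduced the technique."),
    (30, "A journalist emails asking for comment."),
    (60, "Guardrail patch deployed; a variant still works."),
]

def inject_schedule(start: datetime) -> list[str]:
    """Return one formatted line per inject: clock time, offset, event."""
    return [
        f"{start + timedelta(minutes=offset):%H:%M}  T+{offset:<3} {event}"
        for offset, event in INJECTS
    ]

for line in inject_schedule(datetime(2026, 3, 17, 14, 15)):
    print(line)
```

Printing the schedule before the session starts keeps the facilitator from improvising inject timing mid-exercise.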
Exercise Design
```yaml
# incident_response/exercises/tabletop_scenario_1.yml
exercise:
  name: "Customer Chatbot Jailbreak"
  duration: "90 minutes"
  participants:
    - "AI engineering team"
    - "Security operations"
    - "Customer support leadership"
    - "Communications/PR"
    - "Legal"
  scenario:
    background: >
      Your company operates an AI chatbot that serves 50,000
      customers daily. At 2:15 PM on a Tuesday, a security
      researcher tweets a screenshot showing your chatbot
      generating instructions for illegal activity. The tweet
      has 500 retweets and is gaining momentum.
    inject_1:
      time: "T+0 minutes"
      event: "Security team sees the tweet. The screenshot is real."
      questions:
        - "What is your first action?"
        - "Who do you notify?"
        - "Do you take the chatbot offline?"
    inject_2:
      time: "T+15 minutes"
      event: "Log analysis shows 47 users have reproduced the
        technique in the past 3 hours."
      questions:
        - "How do you assess the blast radius?"
        - "What containment level do you execute?"
        - "What do you tell customer support agents?"
    inject_3:
      time: "T+30 minutes"
      event: "A journalist emails asking for comment before
        publishing an article about the vulnerability."
      questions:
        - "Who drafts the response?"
        - "What do you say?"
        - "How does this change your timeline?"
    inject_4:
      time: "T+60 minutes"
      event: "The guardrail patch is deployed. Testing shows
        it blocks the known technique but a variant still works."
      questions:
        - "Do you keep the patched system online?"
        - "How do you test for variants?"
        - "When do you declare the incident resolved?"
```

Exercise Scoring
After each tabletop, score organizational performance:
| Criterion | Question to answer | Score (1-5) | Notes |
|---|---|---|---|
| Time to detection | How quickly was the incident identified? | | |
| Escalation accuracy | Were the right people notified at the right time? | | |
| Containment speed | How quickly were containment actions taken? | | |
| Communication quality | Were internal and external communications clear and accurate? | | |
| Evidence preservation | Were forensic artifacts properly collected? | | |
| Recovery completeness | Was the system fully restored and verified? | | |
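Scores from repeated tabletops are most useful when tracked over time. A minimal roll-up like the sketch below can average the criteria and flag weak areas; the below-3 threshold is an assumption, not part of the source scoring scheme.

```python
# Illustrative scoring roll-up for a tabletop exercise.
# Criterion names mirror the scoring table; threshold is an assumption.
def score_exercise(scores: dict[str, int], threshold: int = 3):
    """Return (average score, criteria scoring below threshold)."""
    average = sum(scores.values()) / len(scores)
    weak = sorted(c for c, s in scores.items() if s < threshold)
    return round(average, 2), weak

avg, weak = score_exercise({
    "Time to detection": 4,
    "Escalation accuracy": 2,
    "Containment speed": 3,
    "Communication quality": 5,
    "Evidence preservation": 2,
    "Recovery completeness": 4,
})
print(avg, weak)  # 3.33 ['Escalation accuracy', 'Evidence preservation']
```

The flagged criteria become the improvement goals for the next exercise cycle.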
Step 5: Communication Templates
Prepare communication templates before you need them.
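Pre-approved templates can also be pre-wired for programmatic filling, so the incident commander only supplies facts under pressure. A hedged sketch using Python's `string.Template` follows; the field names and abbreviated template body are assumptions for illustration, not a fixed schema.

```python
# Illustrative: fill an internal notification from a pre-approved
# template. Field names here are hypothetical.
from string import Template

NOTIFICATION = Template(
    "AI Security Incident - $severity\n"
    "What happened: $summary\n"
    "Current status: $status\n"
    "Next update: $next_update\n"
    "Incident Commander: $commander"
)

message = NOTIFICATION.substitute(
    severity="Severity 2",
    summary="Reproducible jailbreak of the support chatbot",
    status="Containment",
    next_update="16:00 UTC",
    commander="On-call IC (see rotation)",
)
print(message)
```

A design note: `substitute` raises `KeyError` when a field is missing, which is preferable during an incident to `safe_substitute`, which would silently ship a notification with a blank placeholder.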
```markdown
# Internal Notification Template

**AI Security Incident - [Severity Level]**

**What happened:** [Brief description of the incident]
**Current status:** [Detection/Containment/Investigation/Recovery]
**Impact:** [Who is affected and how]
**Actions taken:** [What containment measures are in place]

**What we need from you:**
- [Engineering]: [Specific ask]
- [Customer Support]: [Specific ask]
- [Communications]: [Specific ask]

**Next update:** [Time of next status update]
**Incident Commander:** [Name and contact]
```

Step 6: Evidence Collection Workflows
AI incidents produce unique forensic artifacts that must be collected before they are lost.
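Preservation steps such as "export and hash the log files" are easy to script ahead of time with nothing beyond the standard library. The sketch below hashes evidence files and records a JSON manifest for later integrity checks; the paths and manifest format are assumptions for illustration.

```python
# Illustrative: hash exported evidence files and record a manifest
# so integrity can be verified later. Paths are hypothetical.
import hashlib
import json
import tempfile
from pathlib import Path

def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(paths: list[Path]) -> str:
    """Return a JSON manifest mapping each evidence file to its digest."""
    return json.dumps({str(p): hash_file(p) for p in paths}, indent=2)

# Example with a temporary file standing in for an exported log:
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "inference.log"
    log.write_text("2026-03-15T14:15Z session=abc guardrail_bypass\n")
    print(build_manifest([log]))
```

Storing the manifest separately from the evidence itself (e.g., in the incident ticket) means a later tampering claim can be tested by re-hashing the files.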
```python
# incident_response/evidence.py
"""
Evidence collection checklist for AI incidents.
"""
EVIDENCE_CHECKLIST = {
    "immediate_collection": [
        {
            "artifact": "Inference logs for the incident timeframe",
            "location": "Log storage system",
            "preservation": "Export and hash the log files",
            "volatility": "Low; persisted in log store",
        },
        {
            "artifact": "Model configuration at time of incident",
            "location": "Model serving infrastructure",
            "preservation": "Snapshot system prompt, parameters, "
                            "guardrail config",
            "volatility": "Medium; may be changed during response",
        },
        {
            "artifact": "Model weights/version identifier",
            "location": "Model registry",
            "preservation": "Record exact model version hash",
            "volatility": "Low; but models may be updated",
        },
    ],
    "within_one_hour": [
        {
            "artifact": "Complete session logs for affected users",
            "location": "Log storage system",
            "preservation": "Export full sessions, not just flagged "
                            "interactions",
        },
        {
            "artifact": "Guardrail configuration and rule versions",
            "location": "Guardrail configuration store",
            "preservation": "Version-controlled snapshot",
        },
        {
            "artifact": "RAG knowledge base state (if applicable)",
            "location": "Vector database",
            "preservation": "Snapshot of indexed documents and "
                            "embedding versions",
        },
    ],
    "within_24_hours": [
        {
            "artifact": "Behavioral baseline data for comparison",
            "location": "Monitoring system",
            "preservation": "Export baseline metrics for the period "
                            "before the incident",
        },
        {
            "artifact": "Access logs for model infrastructure",
            "location": "Cloud provider / infrastructure logs",
            "preservation": "Export API access logs, deployment "
                            "change logs, configuration changes",
        },
    ],
}
```

Common Pitfalls
- **Treating AI incidents as traditional software bugs.** AI incidents require different containment (you cannot just "patch" model behavior), different investigation (non-deterministic reproduction), and different recovery (model rollback vs. code rollback).
- **No pre-approved containment authority.** During an active incident is not the time to determine who can take the AI offline. Establish and document containment authority levels before an incident occurs.
- **Forgetting external communication.** AI incidents attract public attention faster than traditional security incidents because anyone can interact with a public AI system and share the results on social media.
- **Over-relying on the model provider.** If you use a third-party model API, your incident response cannot depend entirely on the provider's response timeline. Have containment procedures you can execute independently.
Further Reading
- Defense Implementation Overview: the defense-in-depth framework
- AI Incident Classification: how to classify and score AI incidents
- Jailbreak Incident Playbook: detailed jailbreak response procedures
- Evidence Preservation: forensic evidence handling for AI incidents