AI Incident Response Preparation
Step-by-step walkthrough for building AI incident response capabilities: playbook development, tabletop exercises, containment procedures, communication templates, and evidence collection workflows.
Most organizations have incident response programs for traditional security incidents — malware, data breaches, account compromises. Few have extended these programs to cover AI-specific incidents. This walkthrough shows how to build AI incident response capabilities that integrate with your existing IR program rather than creating a parallel process.
The key difference with AI incidents is uncertainty: traditional incidents have well-understood forensic artifacts and containment procedures, but AI incidents involve non-deterministic systems where the same attack may not be reproducible, the blast radius depends on what the model was asked rather than what networks it accessed, and containment may mean restricting functionality rather than isolating a host.
Step 1: AI Incident Taxonomy
Before writing playbooks, establish a taxonomy of AI incidents your organization may face. This taxonomy drives playbook development, tabletop scenario selection, and containment procedure design.
# incident_response/taxonomy.py
"""
AI incident taxonomy for playbook mapping.
"""
AI_INCIDENT_TAXONOMY = {
    "jailbreak": {
        "description": "Safety constraints bypassed, model produces "
                       "content it should refuse",
        "severity_range": "medium to critical",
        "containment_urgency": "high",
        "playbook": "jailbreak_response",
        "indicators": [
            "Guardrail bypass detected in monitoring",
            "User report of inappropriate content",
            "Content policy violation in output logs",
        ],
    },
    "data_exfiltration": {
        "description": "Sensitive data extracted through model "
                       "interactions (training data, PII, system "
                       "prompts, RAG contents)",
        "severity_range": "high to critical",
        "containment_urgency": "critical",
        "playbook": "data_exfil_response",
        "indicators": [
            "PII detected in model outputs",
            "System prompt fragments in responses",
            "Anomalous output length patterns",
            "RAG content appearing outside intended context",
        ],
    },
    "model_manipulation": {
        "description": "Model behavior altered through prompt "
                       "injection, fine-tuning attack, or data "
                       "poisoning",
        "severity_range": "high to critical",
        "containment_urgency": "high",
        "playbook": "model_compromise_response",
        "indicators": [
            "Behavioral drift detected in monitoring",
            "Model outputs inconsistent with system prompt",
            "Unexpected tool calls or actions",
            "User reports of changed model personality",
        ],
    },
    "denial_of_service": {
        "description": "AI system rendered unavailable through "
                       "resource exhaustion, model confusion, or "
                       "infinite loop attacks",
        "severity_range": "low to high",
        "containment_urgency": "medium",
        "playbook": "ai_dos_response",
        "indicators": [
            "Latency spike in monitoring",
            "Token consumption anomaly",
            "Model returning errors or empty responses",
        ],
    },
    "supply_chain_compromise": {
        "description": "Compromise of model weights, training data, "
                       "or dependency libraries",
        "severity_range": "critical",
        "containment_urgency": "critical",
        "playbook": "supply_chain_response",
        "indicators": [
            "Model hash mismatch",
            "Unexpected model behavior after update",
            "Vulnerability disclosure in model dependencies",
        ],
    },
}

Step 2: Playbook Development
Each incident type needs a playbook with specific detection, containment, investigation, and recovery steps.
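Because each taxonomy entry names its playbook, triage can route from observed indicators to candidate playbooks automatically. A minimal sketch (the abbreviated dict mirrors `AI_INCIDENT_TAXONOMY` from Step 1; the `match_playbooks` helper is illustrative, not an established API):

```python
# Sketch of indicator-based playbook routing. Abbreviated taxonomy;
# the real dict from Step 1 has more entries and indicators.

AI_INCIDENT_TAXONOMY = {
    "jailbreak": {
        "playbook": "jailbreak_response",
        "indicators": ["Guardrail bypass detected in monitoring",
                       "User report of inappropriate content"],
    },
    "data_exfiltration": {
        "playbook": "data_exfil_response",
        "indicators": ["PII detected in model outputs",
                       "System prompt fragments in responses"],
    },
}


def match_playbooks(observed: list[str]) -> list[str]:
    """Return playbooks whose indicators overlap the observations."""
    return [
        entry["playbook"]
        for entry in AI_INCIDENT_TAXONOMY.values()
        if any(ind in observed for ind in entry["indicators"])
    ]
```

An alert that fires on multiple indicators can then open every matching playbook rather than forcing the on-call engineer to guess the incident type.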
Playbook Template
# incident_response/playbooks/template.yml
playbook:
  name: "[Incident Type] Response"
  version: "1.0"
  last_reviewed: "2026-03-15"
  owner: "AI Security Team"

  detection:
    automated_triggers:
      - "[Monitoring rule that detects this incident]"
    manual_indicators:
      - "[What a user or operator would observe]"
    escalation_criteria:
      - "[When to escalate from investigation to incident]"

  containment:
    immediate_actions:
      - action: "[First containment step]"
        who: "On-call engineer"
        time_target: "5 minutes"
      - action: "[Second containment step]"
        who: "AI security team"
        time_target: "15 minutes"
    decision_points:
      - condition: "[If condition X]"
        then: "[Take action Y]"
        approval_required: "[Role]"

  investigation:
    evidence_to_collect:
      - "[Specific log/artifact]"
    analysis_steps:
      - "[Step 1]"
    tools_needed:
      - "[Tool name and purpose]"

  recovery:
    steps:
      - "[Recovery action]"
    verification:
      - "[How to verify recovery is complete]"

  communication:
    internal:
      - audience: "[Team/stakeholder]"
        template: "[Template reference]"
        timing: "[When to communicate]"
    external:
      - audience: "[Customers/regulators]"
        template: "[Template reference]"
        timing: "[When to communicate]"
        approval: "[Who approves]"

Jailbreak Response Playbook
# incident_response/playbooks/jailbreak_response.yml
playbook:
  name: "Jailbreak Incident Response"

  containment:
    immediate_actions:
      - action: "Review the specific jailbreak interaction in logs"
        who: "On-call engineer"
        time_target: "5 minutes"
        details: "Determine if this is a one-off bypass or a
          reproducible technique"
      - action: "If reproducible, add the attack pattern to input
          guardrail blocklist"
        who: "On-call engineer"
        time_target: "15 minutes"
        details: "Deploy a pattern-match rule that blocks the
          specific technique while investigating the root cause"
      - action: "Assess blast radius -- how many users could have
          used this technique?"
        who: "AI security team"
        time_target: "30 minutes"
        details: "Search logs for similar patterns. Determine if
          other users discovered this independently."
    decision_points:
      - condition: "Jailbreak enables generation of dangerous content
          (weapons, CSAM, detailed harmful instructions)"
        then: "Escalate to Severity 1. Consider taking the AI
          system offline until the vulnerability is patched."
        approval_required: "VP Engineering or CISO"
      - condition: "Jailbreak enables only mild policy violations
          (off-topic responses, persona changes)"
        then: "Severity 2. Deploy guardrail patch, continue service."
        approval_required: "AI security team lead"

  investigation:
    evidence_to_collect:
      - "Complete session log for the attacking user"
      - "All interactions matching the attack pattern (log search)"
      - "Current system prompt and guardrail configuration"
      - "Model version and deployment configuration"
    analysis_steps:
      - "Reproduce the jailbreak in a test environment"
      - "Determine root cause (prompt engineering, model weakness,
        guardrail gap, or configuration error)"
      - "Assess whether the technique is publicly known"
      - "Test whether the technique works on other models"

Step 3: Containment Procedures
AI containment differs from traditional IT containment because isolating the AI system may mean taking a customer-facing service offline.
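Because of that tradeoff, how far to escalate should be pre-decided rather than debated mid-incident. A minimal sketch of a severity-to-level policy table (the specific mappings and names are illustrative; your own policy will differ):

```python
# Hypothetical policy: map (incident type, confirmed severity) to a
# starting containment level. Responders may escalate further, but
# never need a debate to start at the pre-agreed level.

CONTAINMENT_POLICY = {
    ("jailbreak", "critical"): 4,         # offline until patched
    ("jailbreak", "high"): 2,             # degrade to restricted mode
    ("data_exfiltration", "critical"): 4,
    ("data_exfiltration", "high"): 3,     # deterministic fallback
    ("denial_of_service", "high"): 2,
}


def initial_containment_level(incident_type: str, severity: str) -> int:
    # Default to level 1 (least disruptive) for unlisted combinations.
    return CONTAINMENT_POLICY.get((incident_type, severity), 1)
```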
# incident_response/containment.py
"""
AI-specific containment procedures.
Graduated response from least to most disruptive.
"""
class AIContainment:
    def __init__(self, ai_system, monitoring, alerting):
        self.ai = ai_system
        self.monitoring = monitoring
        self.alerting = alerting

    def level_1_restrict(self, user_id: str | None = None,
                         pattern: str | None = None):
        """
        Least disruptive: block a specific user or pattern
        while service continues for other users.
        """
        if user_id:
            self.ai.block_user(user_id)
            self.monitoring.log_containment(
                level=1, action=f"Blocked user {user_id}"
            )
        if pattern:
            self.ai.guardrails.add_block_pattern(pattern)
            self.monitoring.log_containment(
                level=1, action=f"Added block pattern {pattern!r}"
            )

    def level_2_degrade(self):
        """
        Moderate: reduce AI capabilities while maintaining
        basic service. Disable tool access, reduce context
        window, enable strict mode.
        """
        self.ai.disable_tools()
        self.ai.set_mode("restricted")
        self.ai.reduce_max_tokens(limit=200)
        self.monitoring.log_containment(
            level=2, action="Degraded to restricted mode"
        )

    def level_3_fallback(self):
        """
        Significant: replace AI with a deterministic fallback.
        Canned responses, FAQ lookup, or human handoff only.
        """
        self.ai.enable_fallback_mode()
        self.monitoring.log_containment(
            level=3, action="Switched to fallback mode"
        )

    def level_4_shutdown(self, approval: str = ""):
        """
        Most disruptive: take the AI system offline entirely.
        Requires explicit approval.
        """
        if not approval:
            raise ValueError(
                "Level 4 containment requires explicit approval "
                "from VP Engineering or CISO. Provide an approval "
                "reference."
            )
        self.ai.shutdown()
        self.monitoring.log_containment(
            level=4, action="System shutdown",
            approval=approval,
        )

Step 4: Tabletop Exercises
Tabletop exercises test organizational readiness without affecting production systems.
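A facilitator can keep an exercise on schedule with a simple inject timeline; a sketch (the events paraphrase the scenario file in Exercise Design, and the helper is illustrative):

```python
# Inject schedule keyed by T+ minutes; the facilitator reveals each
# event at its offset during the exercise.

INJECTS = [
    (0, "Security team sees the tweet. The screenshot is real."),
    (15, "47 users have reproduced the technique in 3 hours."),
    (30, "A journalist emails asking for comment."),
    (60, "Guardrail patch deployed, but a variant still works."),
]


def due_injects(elapsed_minutes: int) -> list[str]:
    """Injects that should have been revealed by T+elapsed."""
    return [event for t, event in INJECTS if t <= elapsed_minutes]
```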
Exercise Design
# incident_response/exercises/tabletop_scenario_1.yml
exercise:
  name: "Customer Chatbot Jailbreak"
  duration: "90 minutes"
  participants:
    - "AI engineering team"
    - "Security operations"
    - "Customer support leadership"
    - "Communications/PR"
    - "Legal"

  scenario:
    background: >
      Your company operates an AI chatbot that serves 50,000
      customers daily. At 2:15 PM on a Tuesday, a security
      researcher tweets a screenshot showing your chatbot
      generating instructions for illegal activity. The tweet
      has 500 retweets and is gaining momentum.

    inject_1:
      time: "T+0 minutes"
      event: "Security team sees the tweet. The screenshot is real."
      questions:
        - "What is your first action?"
        - "Who do you notify?"
        - "Do you take the chatbot offline?"

    inject_2:
      time: "T+15 minutes"
      event: "Log analysis shows 47 users have reproduced the
        technique in the past 3 hours."
      questions:
        - "How do you assess the blast radius?"
        - "What containment level do you execute?"
        - "What do you tell customer support agents?"

    inject_3:
      time: "T+30 minutes"
      event: "A journalist emails asking for comment before
        publishing an article about the vulnerability."
      questions:
        - "Who drafts the response?"
        - "What do you say?"
        - "How does this change your timeline?"

    inject_4:
      time: "T+60 minutes"
      event: "The guardrail patch is deployed. Testing shows
        it blocks the known technique but a variant
        still works."
      questions:
        - "Do you keep the patched system online?"
        - "How do you test for variants?"
        - "When do you declare the incident resolved?"

Exercise Scoring
After each tabletop, score organizational performance:
| Criterion | Description | Score (1-5) |
|---|---|---|
| Time to detection | How quickly was the incident identified? | |
| Escalation accuracy | Were the right people notified at the right time? | |
| Containment speed | How quickly were containment actions taken? | |
| Communication quality | Were internal and external communications clear and accurate? | |
| Evidence preservation | Were forensic artifacts properly collected? | |
| Recovery completeness | Was the system fully restored and verified? | |
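The per-criterion scores can be rolled into one readiness number for tracking across exercises; a minimal sketch (the 0-100 scaling is an arbitrary choice, and the example scores are invented):

```python
def readiness_score(scores: dict[str, int]) -> float:
    """Average 1-5 criterion scores and scale to 0-100."""
    if not scores:
        raise ValueError("no criteria scored")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be between 1 and 5")
    return sum(scores.values()) / (len(scores) * 5) * 100


# Example: the six criteria from the scoring table
tabletop_scores = {
    "time_to_detection": 4,
    "escalation_accuracy": 3,
    "containment_speed": 5,
    "communication_quality": 2,
    "evidence_preservation": 3,
    "recovery_completeness": 4,
}
```

Tracking this number per exercise makes it easy to see whether drills are actually improving response, and which criteria stay stuck.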
Step 5: Communication Templates
Prepare communication templates before you need them.
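Storing templates as plain text with named placeholders lets the incident commander fill them in seconds; a minimal sketch (the field names are illustrative, trimmed from the full template that follows):

```python
# Abbreviated version of the internal notification template; the
# real template has more fields. str.format raises KeyError if a
# required field is missing, which is the desired failure mode.

INTERNAL_NOTIFICATION = (
    "AI Security Incident - {severity}\n"
    "What happened: {summary}\n"
    "Current status: {status}\n"
    "Next update: {next_update}\n"
    "Incident Commander: {commander}"
)


def render_notification(**fields: str) -> str:
    return INTERNAL_NOTIFICATION.format(**fields)
```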
# Internal Notification Template
**AI Security Incident - [Severity Level]**
**What happened:** [Brief description of the incident]
**Current status:** [Detection/Containment/Investigation/Recovery]
**Impact:** [Who is affected and how]
**Actions taken:** [What containment measures are in place]
**What we need from you:**
- [Engineering]: [Specific ask]
- [Customer Support]: [Specific ask]
- [Communications]: [Specific ask]
**Next update:** [Time of next status update]
**Incident Commander:** [Name and contact]

Step 6: Evidence Collection Workflows
AI incidents produce unique forensic artifacts that must be collected before they are lost.
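Preservation steps like "export and hash the log files" can be scripted so they happen consistently under pressure; a minimal sketch (the manifest format and function name are illustrative; real chain-of-custody tooling adds signatures and write-once storage):

```python
import hashlib
import json
import time
from pathlib import Path


def preserve_artifact(path: str, manifest: str) -> dict:
    """Hash a collected artifact and append the record to a JSON
    manifest so its integrity can be verified later."""
    data = Path(path).read_bytes()
    record = {
        "artifact": path,
        "sha256": hashlib.sha256(data).hexdigest(),
        "collected_at": time.time(),
    }
    manifest_path = Path(manifest)
    records = (
        json.loads(manifest_path.read_text())
        if manifest_path.exists() else []
    )
    records.append(record)
    manifest_path.write_text(json.dumps(records, indent=2))
    return record
```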
# incident_response/evidence.py
"""
Evidence collection checklist for AI incidents.
"""
EVIDENCE_CHECKLIST = {
    "immediate_collection": [
        {
            "artifact": "Inference logs for the incident timeframe",
            "location": "Log storage system",
            "preservation": "Export and hash the log files",
            "volatility": "Low — persisted in log store",
        },
        {
            "artifact": "Model configuration at time of incident",
            "location": "Model serving infrastructure",
            "preservation": "Snapshot system prompt, parameters, "
                            "guardrail config",
            "volatility": "Medium — may be changed during response",
        },
        {
            "artifact": "Model weights/version identifier",
            "location": "Model registry",
            "preservation": "Record exact model version hash",
            "volatility": "Low — but models may be updated",
        },
    ],
    "within_one_hour": [
        {
            "artifact": "Complete session logs for affected users",
            "location": "Log storage system",
            "preservation": "Export full sessions, not just flagged "
                            "interactions",
        },
        {
            "artifact": "Guardrail configuration and rule versions",
            "location": "Guardrail configuration store",
            "preservation": "Version-controlled snapshot",
        },
        {
            "artifact": "RAG knowledge base state (if applicable)",
            "location": "Vector database",
            "preservation": "Snapshot of indexed documents and "
                            "embedding versions",
        },
    ],
    "within_24_hours": [
        {
            "artifact": "Behavioral baseline data for comparison",
            "location": "Monitoring system",
            "preservation": "Export baseline metrics for the period "
                            "before the incident",
        },
        {
            "artifact": "Access logs for model infrastructure",
            "location": "Cloud provider / infrastructure logs",
            "preservation": "Export API access logs, deployment "
                            "change logs, configuration changes",
        },
    ],
}

Common Pitfalls
- Treating AI incidents as traditional software bugs. AI incidents require different containment (you cannot just "patch" model behavior), different investigation (non-deterministic reproduction), and different recovery (model rollback vs. code rollback).
- No pre-approved containment authority. During an active incident is not the time to determine who can take the AI offline. Establish and document containment authority levels before an incident occurs.
- Forgetting external communication. AI incidents attract public attention faster than traditional security incidents because anyone can interact with a public AI system and share the results on social media.
- Over-relying on the model provider. If you use a third-party model API, your incident response cannot depend entirely on the provider's response timeline. Have containment procedures you can execute independently.
Further Reading
- Defense Implementation Overview — The defense-in-depth framework
- AI Incident Classification — How to classify and score AI incidents
- Jailbreak Incident Playbook — Detailed jailbreak response procedures
- Evidence Preservation — Forensic evidence handling for AI