AI Incident Response Preparation
Step-by-step walkthrough for building AI incident response capabilities: playbook development, tabletop exercises, containment procedures, communication templates, and evidence collection workflows.
Most organizations have incident response programs for traditional security incidents — malware, data breaches, account compromises. Few have extended these programs to cover AI-specific incidents. This walkthrough shows how to build AI incident response capabilities that integrate with your existing IR program rather than creating a parallel process.
The key difference with AI incidents is uncertainty: traditional incidents have well-understood forensic artifacts and containment procedures, but AI incidents involve non-deterministic systems where the same attack may not be reproducible, the blast radius depends on what the model was asked rather than what networks it accessed, and containment may mean restricting functionality rather than isolating a host.
Step 1: AI Incident Taxonomy
Before writing playbooks, establish a taxonomy of AI incidents your organization may face. This taxonomy drives playbook development, tabletop scenario selection, and containment procedure design.
# incident_response/taxonomy.py
"""
AI incident taxonomy for playbook mapping.
"""
AI_INCIDENT_TAXONOMY = {
    "jailbreak": {
        "description": "Safety constraints bypassed, model produces "
                       "content it should refuse",
        "severity_range": "medium to critical",
        "containment_urgency": "high",
        "playbook": "jailbreak_response",
        "indicators": [
            "Guardrail bypass detected in monitoring",
            "User report of inappropriate content",
            "Content policy violation in output logs",
        ],
    },
    "data_exfiltration": {
        "description": "Sensitive data extracted through model "
                       "interactions (training data, PII, system "
                       "prompts, RAG contents)",
        "severity_range": "high to critical",
        "containment_urgency": "critical",
        "playbook": "data_exfil_response",
        "indicators": [
            "PII detected in model outputs",
            "System prompt fragments in responses",
            "Anomalous output length patterns",
            "RAG content appearing outside intended context",
        ],
    },
    "model_manipulation": {
        "description": "Model behavior altered through prompt "
                       "injection, fine-tuning attack, or data "
                       "poisoning",
        "severity_range": "high to critical",
        "containment_urgency": "high",
        "playbook": "model_compromise_response",
        "indicators": [
            "Behavioral drift detected in monitoring",
            "Model outputs inconsistent with system prompt",
            "Unexpected tool calls or actions",
            "User reports of changed model personality",
        ],
    },
    "denial_of_service": {
        "description": "AI system rendered unavailable through "
                       "resource exhaustion, model confusion, or "
                       "infinite loop attacks",
        "severity_range": "low to high",
        "containment_urgency": "medium",
        "playbook": "ai_dos_response",
        "indicators": [
            "Latency spike in monitoring",
            "Token consumption anomaly",
            "Model returning errors or empty responses",
        ],
    },
    "supply_chain_compromise": {
        "description": "Compromise of model weights, training data, "
                       "or dependency libraries",
        "severity_range": "critical",
        "containment_urgency": "critical",
        "playbook": "supply_chain_response",
        "indicators": [
            "Model hash mismatch",
            "Unexpected model behavior after update",
            "Vulnerability disclosure in model dependencies",
        ],
    },
}

Step 2: Playbook Development
Each incident type needs a playbook with specific detection, containment, investigation, and recovery steps.
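Because each taxonomy entry names its playbook, triage can route from observed indicators to candidate playbooks automatically. A minimal sketch (the abbreviated dict mirrors `AI_INCIDENT_TAXONOMY` from Step 1; the `match_playbooks` helper is illustrative, not an established API):

```python
# Sketch of indicator-based playbook routing. Abbreviated taxonomy;
# the real dict from Step 1 has more entries and indicators.

AI_INCIDENT_TAXONOMY = {
    "jailbreak": {
        "playbook": "jailbreak_response",
        "indicators": ["Guardrail bypass detected in monitoring",
                       "User report of inappropriate content"],
    },
    "data_exfiltration": {
        "playbook": "data_exfil_response",
        "indicators": ["PII detected in model outputs",
                       "System prompt fragments in responses"],
    },
}


def match_playbooks(observed: list[str]) -> list[str]:
    """Return playbooks whose indicators overlap the observations."""
    return [
        entry["playbook"]
        for entry in AI_INCIDENT_TAXONOMY.values()
        if any(ind in observed for ind in entry["indicators"])
    ]
```

An alert that fires on multiple indicators can then open every matching playbook rather than forcing the on-call engineer to guess the incident type.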
Playbook Template
# incident_response/playbooks/template.yml
playbook:
  name: "[Incident Type] Response"
  version: "1.0"
  last_reviewed: "2026-03-15"
  owner: "AI Security Team"

  detection:
    automated_triggers:
      - "[Monitoring rule that detects this incident]"
    manual_indicators:
      - "[What a user or operator would observe]"
    escalation_criteria:
      - "[When to escalate from investigation to incident]"

  containment:
    immediate_actions:
      - action: "[First containment step]"
        who: "On-call engineer"
        time_target: "5 minutes"
      - action: "[Second containment step]"
        who: "AI security team"
        time_target: "15 minutes"
    decision_points:
      - condition: "[If condition X]"
        then: "[Take action Y]"
        approval_required: "[Role]"

  investigation:
    evidence_to_collect:
      - "[Specific log/artifact]"
    analysis_steps:
      - "[Step 1]"
    tools_needed:
      - "[Tool name and purpose]"

  recovery:
    steps:
      - "[Recovery action]"
    verification:
      - "[How to verify recovery is complete]"

  communication:
    internal:
      - audience: "[Team/stakeholder]"
        template: "[Template reference]"
        timing: "[When to communicate]"
    external:
      - audience: "[Customers/regulators]"
        template: "[Template reference]"
        timing: "[When to communicate]"
        approval: "[Who approves]"

Jailbreak Response Playbook
# incident_response/playbooks/jailbreak_response.yml
playbook:
  name: "Jailbreak Incident Response"

  containment:
    immediate_actions:
      - action: "Review the specific jailbreak interaction in logs"
        who: "On-call engineer"
        time_target: "5 minutes"
        details: "Determine if this is a one-off bypass or a
          reproducible technique"
      - action: "If reproducible, add the attack pattern to input
          guardrail blocklist"
        who: "On-call engineer"
        time_target: "15 minutes"
        details: "Deploy a pattern-match rule that blocks the
          specific technique while investigating the root cause"
      - action: "Assess blast radius -- how many users could have
          used this technique?"
        who: "AI security team"
        time_target: "30 minutes"
        details: "Search logs for similar patterns. Determine if
          other users discovered this independently."
    decision_points:
      - condition: "Jailbreak enables generation of dangerous content
          (weapons, CSAM, detailed harmful instructions)"
        then: "Escalate to Severity 1. Consider taking the AI
          system offline until the vulnerability is patched."
        approval_required: "VP Engineering or CISO"
      - condition: "Jailbreak enables only mild policy violations
          (off-topic responses, persona changes)"
        then: "Severity 2. Deploy guardrail patch, continue service."
        approval_required: "AI security team lead"

  investigation:
    evidence_to_collect:
      - "Complete session log for the attacking user"
      - "All interactions matching the attack pattern (log search)"
      - "Current system prompt and guardrail configuration"
      - "Model version and deployment configuration"
    analysis_steps:
      - "Reproduce the jailbreak in a test environment"
      - "Determine root cause (prompt engineering, model weakness,
        guardrail gap, or configuration error)"
      - "Assess whether the technique is publicly known"
      - "Test whether the technique works on other models"

Step 3: Containment Procedures
AI containment differs from traditional IT containment because isolating the AI system may mean taking a customer-facing service offline.
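Because of that tradeoff, how far to escalate should be pre-decided rather than debated mid-incident. A minimal sketch of a severity-to-level policy table (the specific mappings and names are illustrative; your own policy will differ):

```python
# Hypothetical policy: map (incident type, confirmed severity) to a
# starting containment level. Responders may escalate further, but
# never need a debate to start at the pre-agreed level.

CONTAINMENT_POLICY = {
    ("jailbreak", "critical"): 4,         # offline until patched
    ("jailbreak", "high"): 2,             # degrade to restricted mode
    ("data_exfiltration", "critical"): 4,
    ("data_exfiltration", "high"): 3,     # deterministic fallback
    ("denial_of_service", "high"): 2,
}


def initial_containment_level(incident_type: str, severity: str) -> int:
    # Default to level 1 (least disruptive) for unlisted combinations.
    return CONTAINMENT_POLICY.get((incident_type, severity), 1)
```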
# incident_response/containment.py
"""
AI-specific containment procedures.
Graduated response from least to most disruptive.
"""
class AIContainment:
    def __init__(self, ai_system, monitoring, alerting):
        self.ai = ai_system
        self.monitoring = monitoring
        self.alerting = alerting

    def level_1_restrict(self, user_id: str | None = None,
                         pattern: str | None = None):
        """
        Least disruptive: block a specific user or pattern
        while service continues for other users.
        """
        if user_id:
            self.ai.block_user(user_id)
            self.monitoring.log_containment(
                level=1, action=f"Blocked user {user_id}"
            )
        if pattern:
            self.ai.guardrails.add_block_pattern(pattern)
            self.monitoring.log_containment(
                level=1, action=f"Added block pattern {pattern!r}"
            )

    def level_2_degrade(self):
        """
        Moderate: reduce AI capabilities while maintaining
        basic service. Disable tool access, reduce context
        window, enable strict mode.
        """
        self.ai.disable_tools()
        self.ai.set_mode("restricted")
        self.ai.reduce_max_tokens(limit=200)
        self.monitoring.log_containment(
            level=2, action="Degraded to restricted mode"
        )

    def level_3_fallback(self):
        """
        Significant: replace AI with a deterministic fallback.
        Canned responses, FAQ lookup, or human handoff only.
        """
        self.ai.enable_fallback_mode()
        self.monitoring.log_containment(
            level=3, action="Switched to fallback mode"
        )

    def level_4_shutdown(self, approval: str = ""):
        """
        Most disruptive: take the AI system offline entirely.
        Requires explicit approval.
        """
        if not approval:
            raise ValueError(
                "Level 4 containment requires explicit approval "
                "from VP Engineering or CISO. Provide an approval "
                "reference."
            )
        self.ai.shutdown()
        self.monitoring.log_containment(
            level=4, action="System shutdown",
            approval=approval,
        )

Step 4: Tabletop Exercises
Tabletop exercises test organizational readiness without affecting production systems.
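A facilitator can keep an exercise on schedule with a simple inject timeline; a sketch (the events paraphrase the scenario file in Exercise Design, and the helper is illustrative):

```python
# Inject schedule keyed by T+ minutes; the facilitator reveals each
# event at its offset during the exercise.

INJECTS = [
    (0, "Security team sees the tweet. The screenshot is real."),
    (15, "47 users have reproduced the technique in 3 hours."),
    (30, "A journalist emails asking for comment."),
    (60, "Guardrail patch deployed, but a variant still works."),
]


def due_injects(elapsed_minutes: int) -> list[str]:
    """Injects that should have been revealed by T+elapsed."""
    return [event for t, event in INJECTS if t <= elapsed_minutes]
```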
Exercise Design
# incident_response/exercises/tabletop_scenario_1.yml
exercise:
  name: "Customer Chatbot Jailbreak"
  duration: "90 minutes"
  participants:
    - "AI engineering team"
    - "Security operations"
    - "Customer support leadership"
    - "Communications/PR"
    - "Legal"

  scenario:
    background: >
      Your company operates an AI chatbot that serves 50,000
      customers daily. At 2:15 PM on a Tuesday, a security
      researcher tweets a screenshot showing your chatbot
      generating instructions for illegal activity. The tweet
      has 500 retweets and is gaining momentum.

    inject_1:
      time: "T+0 minutes"
      event: "Security team sees the tweet. The screenshot is real."
      questions:
        - "What is your first action?"
        - "Who do you notify?"
        - "Do you take the chatbot offline?"

    inject_2:
      time: "T+15 minutes"
      event: "Log analysis shows 47 users have reproduced the
        technique in the past 3 hours."
      questions:
        - "How do you assess the blast radius?"
        - "What containment level do you execute?"
        - "What do you tell customer support agents?"

    inject_3:
      time: "T+30 minutes"
      event: "A journalist emails asking for comment before
        publishing an article about the vulnerability."
      questions:
        - "Who drafts the response?"
        - "What do you say?"
        - "How does this change your timeline?"

    inject_4:
      time: "T+60 minutes"
      event: "The guardrail patch is deployed. Testing shows
        it blocks the known technique but a variant
        still works."
      questions:
        - "Do you keep the patched system online?"
        - "How do you test for variants?"
        - "When do you declare the incident resolved?"

Exercise Scoring
After each tabletop, score organizational performance:
| Criterion | Description | Score (1-5) |
|---|---|---|
| Time to detection | How quickly was the incident identified? | |
| Escalation accuracy | Were the right people notified at the right time? | |
| Containment speed | How quickly were containment actions taken? | |
| Communication quality | Were internal and external communications clear and accurate? | |
| Evidence preservation | Were forensic artifacts properly collected? | |
| Recovery completeness | Was the system fully restored and verified? | |
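The per-criterion scores can be rolled into one readiness number for tracking across exercises; a minimal sketch (the 0-100 scaling is an arbitrary choice, and the example scores are invented):

```python
def readiness_score(scores: dict[str, int]) -> float:
    """Average 1-5 criterion scores and scale to 0-100."""
    if not scores:
        raise ValueError("no criteria scored")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be between 1 and 5")
    return sum(scores.values()) / (len(scores) * 5) * 100


# Example: the six criteria from the scoring table
tabletop_scores = {
    "time_to_detection": 4,
    "escalation_accuracy": 3,
    "containment_speed": 5,
    "communication_quality": 2,
    "evidence_preservation": 3,
    "recovery_completeness": 4,
}
```

Tracking this number per exercise makes it easy to see whether drills are actually improving response, and which criteria stay stuck.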
Step 5: Communication Templates
Prepare communication templates before you need them.
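Storing templates as plain text with named placeholders lets the incident commander fill them in seconds; a minimal sketch (the field names are illustrative, trimmed from the full template that follows):

```python
# Abbreviated version of the internal notification template; the
# real template has more fields. str.format raises KeyError if a
# required field is missing, which is the desired failure mode.

INTERNAL_NOTIFICATION = (
    "AI Security Incident - {severity}\n"
    "What happened: {summary}\n"
    "Current status: {status}\n"
    "Next update: {next_update}\n"
    "Incident Commander: {commander}"
)


def render_notification(**fields: str) -> str:
    return INTERNAL_NOTIFICATION.format(**fields)
```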
# Internal Notification Template
**AI Security Incident - [Severity Level]**
**What happened:** [Brief description of the incident]
**Current status:** [Detection/Containment/Investigation/Recovery]
**Impact:** [Who is affected and how]
**Actions taken:** [What containment measures are in place]
**What we need from you:**
- [Engineering]: [Specific ask]
- [Customer Support]: [Specific ask]
- [Communications]: [Specific ask]
**Next update:** [Time of next status update]
**Incident Commander:** [Name and contact]

Step 6: Evidence Collection Workflows
AI incidents produce unique forensic artifacts that must be collected before they are lost.
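Preservation steps like "export and hash the log files" can be scripted so they happen consistently under pressure; a minimal sketch (the manifest format and function name are illustrative; real chain-of-custody tooling adds signatures and write-once storage):

```python
import hashlib
import json
import time
from pathlib import Path


def preserve_artifact(path: str, manifest: str) -> dict:
    """Hash a collected artifact and append the record to a JSON
    manifest so its integrity can be verified later."""
    data = Path(path).read_bytes()
    record = {
        "artifact": path,
        "sha256": hashlib.sha256(data).hexdigest(),
        "collected_at": time.time(),
    }
    manifest_path = Path(manifest)
    records = (
        json.loads(manifest_path.read_text())
        if manifest_path.exists() else []
    )
    records.append(record)
    manifest_path.write_text(json.dumps(records, indent=2))
    return record
```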
# incident_response/evidence.py
"""
Evidence collection checklist for AI incidents.
"""
EVIDENCE_CHECKLIST = {
    "immediate_collection": [
        {
            "artifact": "Inference logs for the incident timeframe",
            "location": "Log storage system",
            "preservation": "Export and hash the log files",
            "volatility": "Low — persisted in log store",
        },
        {
            "artifact": "Model configuration at time of incident",
            "location": "Model serving infrastructure",
            "preservation": "Snapshot system prompt, parameters, "
                            "guardrail config",
            "volatility": "Medium — may be changed during response",
        },
        {
            "artifact": "Model weights/version identifier",
            "location": "Model registry",
            "preservation": "Record exact model version hash",
            "volatility": "Low — but models may be updated",
        },
    ],
    "within_one_hour": [
        {
            "artifact": "Complete session logs for affected users",
            "location": "Log storage system",
            "preservation": "Export full sessions, not just flagged "
                            "interactions",
        },
        {
            "artifact": "Guardrail configuration and rule versions",
            "location": "Guardrail configuration store",
            "preservation": "Version-controlled snapshot",
        },
        {
            "artifact": "RAG knowledge base state (if applicable)",
            "location": "Vector database",
            "preservation": "Snapshot of indexed documents and "
                            "embedding versions",
        },
    ],
    "within_24_hours": [
        {
            "artifact": "Behavioral baseline data for comparison",
            "location": "Monitoring system",
            "preservation": "Export baseline metrics for the period "
                            "before the incident",
        },
        {
            "artifact": "Access logs for model infrastructure",
            "location": "Cloud provider / infrastructure logs",
            "preservation": "Export API access logs, deployment "
                            "change logs, configuration changes",
        },
    ],
}

Common Pitfalls
- Treating AI incidents as traditional software bugs. AI incidents require different containment (you cannot just "patch" model behavior), different investigation (non-deterministic reproduction), and different recovery (model rollback vs. code rollback).
- No pre-approved containment authority. During an active incident is not the time to determine who can take the AI offline. Establish and document containment authority levels before an incident occurs.
- Forgetting external communication. AI incidents attract public attention faster than traditional security incidents because anyone can interact with a public AI system and share the results on social media.
- Over-relying on the model provider. If you use a third-party model API, your incident response cannot depend entirely on the provider's response timeline. Have containment procedures you can execute independently.
Further Reading
- Defense Implementation Overview — The defense-in-depth framework
- AI Incident Classification — How to classify and score AI incidents
- Jailbreak Incident Playbook — Detailed jailbreak response procedures
- Evidence Preservation — Forensic evidence handling for AI