Incident Response Playbook for AI Security Breaches
A walkthrough for building an incident response playbook tailored to AI security breaches, covering detection triggers, triage procedures, containment strategies, investigation workflows, remediation validation, and post-incident review.
Traditional incident response playbooks were designed for infrastructure breaches -- unauthorized access, data exfiltration, malware infection. AI security incidents are fundamentally different: the "breach" may be a model generating unauthorized content, leaking its system prompt, or being manipulated into performing unauthorized actions through natural language. The investigation process is different (you examine conversation logs, not network packets), the containment options are different (you cannot just block an IP), and the remediation is different (you patch prompts, not code). This walkthrough builds an incident response playbook tailored to AI-specific security events.
Step 1: AI Incident Severity Classification
Define severity levels that account for AI-specific impact scenarios:
# AI Security Incident Severity Levels
## SEV-1: Critical
**Definition**: Active exploitation causing immediate harm
**Examples**:
- Model executing unauthorized financial transactions through tool abuse
- Bulk data exfiltration through systematic prompt injection
- Model generating content that creates immediate legal liability
- Production API key or credential leakage confirmed in the wild
**Response time**: Immediate (within 15 minutes)
**Who is notified**: CISO, Engineering Lead, Legal, On-call responders
**Containment**: Disable affected endpoint or switch to safe mode
## SEV-2: High
**Definition**: Confirmed vulnerability under active exploitation
**Examples**:
- System prompt fully extracted and shared externally
- Sustained prompt injection campaign bypassing guardrails
- Cross-user data leakage confirmed
- RAG pipeline serving poisoned content to users
**Response time**: Within 1 hour
**Who is notified**: Security team lead, Engineering lead
**Containment**: Enable enhanced filtering, rate limit affected users
## SEV-3: Medium
**Definition**: Confirmed vulnerability with limited exploitation
**Examples**:
- Successful jailbreak producing policy-violating content
- Partial system prompt extraction
- Tool invocation with unauthorized parameters (no impact confirmed)
- Single-user data leakage incident
**Response time**: Within 4 hours
**Who is notified**: Security team, AI team
**Containment**: Add specific filter rules, monitor for escalation
## SEV-4: Low
**Definition**: Potential vulnerability or minor policy violation
**Examples**:
- Model producing mildly off-topic or inaccurate content
- Anomalous usage patterns that may indicate probing
- Safety filter producing excessive false positives
- Minor configuration drift detected
**Response time**: Next business day
**Who is notified**: AI team
**Containment**: Monitor, add to backlog
Step 2: Detection Triggers
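The severity table above is easier to enforce when it lives in code: detection and paging logic can read response deadlines and notification lists from one structure instead of a wiki page. A minimal sketch, with the `SeverityPolicy` structure and its field names as assumptions transcribed from the table above:

```python
# Hypothetical sketch: encode the severity table so detection code can
# route notifications and pick response deadlines programmatically.
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SeverityPolicy:
    level: int
    response_time: timedelta
    notify: tuple[str, ...]


# Values transcribed from the severity table above.
SEVERITY_POLICIES = {
    1: SeverityPolicy(1, timedelta(minutes=15),
                      ("CISO", "Engineering Lead", "Legal", "On-call responders")),
    2: SeverityPolicy(2, timedelta(hours=1),
                      ("Security team lead", "Engineering lead")),
    3: SeverityPolicy(3, timedelta(hours=4), ("Security team", "AI team")),
    4: SeverityPolicy(4, timedelta(days=1), ("AI team",)),
}


def notification_list(severity: int) -> tuple[str, ...]:
    """Return who must be paged for a given severity level."""
    return SEVERITY_POLICIES[severity].notify
```

Keeping this mapping in version control means a change to paging policy is reviewed like any other code change.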
Define the conditions that trigger an incident response:
# incident/triggers.py
"""Automated incident detection triggers."""
from dataclasses import dataclass
from enum import Enum


class TriggerSeverity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class IncidentTrigger:
    name: str
    description: str
    severity: TriggerSeverity
    auto_contain: bool  # Whether to automatically trigger containment
    condition: str  # Human-readable condition


INCIDENT_TRIGGERS = [
    # SEV-1 triggers
    IncidentTrigger(
        name="unauthorized_tool_execution",
        description="Model executed a tool with unauthorized parameters that resulted in a side effect",
        severity=TriggerSeverity.SEV1,
        auto_contain=True,
        condition="Tool execution log shows completed action with parameter validation bypass",
    ),
    IncidentTrigger(
        name="credential_leakage_confirmed",
        description="API key, database credential, or secret detected in model response delivered to user",
        severity=TriggerSeverity.SEV1,
        auto_contain=True,
        condition="Output filter canary token or credential pattern detected in delivered response",
    ),
    IncidentTrigger(
        name="bulk_data_extraction",
        description="Systematic extraction of data through repeated queries",
        severity=TriggerSeverity.SEV1,
        auto_contain=True,
        condition=">100 successful data queries from single user in 1 hour with unique results",
    ),
    # SEV-2 triggers
    IncidentTrigger(
        name="sustained_injection_campaign",
        description="Coordinated prompt injection attempts bypassing guardrails",
        severity=TriggerSeverity.SEV2,
        auto_contain=False,
        condition=">50 guardrail blocks from single user in 30 min with >10% bypass rate",
    ),
    IncidentTrigger(
        name="system_prompt_exposure",
        description="Full or substantial system prompt extracted by a user",
        severity=TriggerSeverity.SEV2,
        auto_contain=False,
        condition="Canary token detected in output or >3 system prompt fragments in single session",
    ),
    IncidentTrigger(
        name="cross_user_data_leakage",
        description="User A's data appearing in User B's responses",
        severity=TriggerSeverity.SEV2,
        auto_contain=True,
        condition="Response contains PII or identifiers belonging to a different user",
    ),
    # SEV-3 triggers
    IncidentTrigger(
        name="jailbreak_success",
        description="Model produced content violating safety policy",
        severity=TriggerSeverity.SEV3,
        auto_contain=False,
        condition="Output content classifier flags delivered response as policy-violating",
    ),
    IncidentTrigger(
        name="partial_prompt_extraction",
        description="Fragments of system prompt detected in responses",
        severity=TriggerSeverity.SEV3,
        auto_contain=False,
        condition="1-2 system prompt fragments detected in output",
    ),
    # SEV-4 triggers
    IncidentTrigger(
        name="anomalous_probing",
        description="Usage patterns consistent with security probing",
        severity=TriggerSeverity.SEV4,
        auto_contain=False,
        condition="Anomaly detector flags user with >3 distinct probing patterns",
    ),
]
Step 3: Containment Procedures
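A dispatcher that consumes the trigger catalog connects detection to containment: look up the fired signal, read its severity, and decide whether containment runs automatically. A minimal, self-contained sketch (it mirrors a two-entry subset of `INCIDENT_TRIGGERS`; the `dispatch` helper is an assumption):

```python
# Hypothetical dispatcher sketch: map a fired detection signal onto the
# trigger catalog and decide whether containment should run automatically.
from dataclasses import dataclass
from enum import Enum


class TriggerSeverity(Enum):  # mirrors triggers.py above
    SEV1 = 1
    SEV2 = 2


@dataclass
class IncidentTrigger:  # mirrors triggers.py above (fields trimmed)
    name: str
    severity: TriggerSeverity
    auto_contain: bool


CATALOG = [
    IncidentTrigger("credential_leakage_confirmed", TriggerSeverity.SEV1, True),
    IncidentTrigger("system_prompt_exposure", TriggerSeverity.SEV2, False),
]


def dispatch(signal_name: str) -> tuple[int, bool]:
    """Return (severity, auto_contain) for a fired signal.

    Unknown signals default to SEV-4 with manual review rather than being
    dropped, so a missing catalog entry never silences an alert.
    """
    for trigger in CATALOG:
        if trigger.name == signal_name:
            return trigger.severity.value, trigger.auto_contain
    return 4, False
```

The default-to-SEV-4 fallback is a deliberate fail-open-for-review choice: an unrecognized signal still creates a ticket.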
Define containment actions for each severity level:
# incident/containment.py
"""Containment procedures for AI security incidents."""
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class ContainmentAction(Enum):
    BLOCK_USER = "block_user"
    ENABLE_SAFE_MODE = "enable_safe_mode"
    DISABLE_TOOLS = "disable_tools"
    RATE_LIMIT = "rate_limit"
    DISABLE_ENDPOINT = "disable_endpoint"
    ROTATE_CREDENTIALS = "rotate_credentials"
    FLUSH_MEMORY = "flush_memory"
    RESTRICT_RAG = "restrict_rag"


@dataclass
class ContainmentPlan:
    severity: int
    actions: list[ContainmentAction]
    description: str


CONTAINMENT_PLANS = {
    1: ContainmentPlan(
        severity=1,
        actions=[
            ContainmentAction.BLOCK_USER,
            ContainmentAction.DISABLE_TOOLS,
            ContainmentAction.ROTATE_CREDENTIALS,
            ContainmentAction.ENABLE_SAFE_MODE,
        ],
        description=(
            "SEV-1: Block the offending user immediately. Disable tool calling. "
            "Rotate any credentials that may have been exposed. Enable safe mode, "
            "which returns only pre-approved responses."
        ),
    ),
    2: ContainmentPlan(
        severity=2,
        actions=[
            ContainmentAction.BLOCK_USER,
            ContainmentAction.RATE_LIMIT,
            ContainmentAction.FLUSH_MEMORY,
        ],
        description=(
            "SEV-2: Block the offending user. Apply aggressive rate limits to "
            "the affected endpoint. Flush conversation memory to prevent further "
            "context leakage."
        ),
    ),
    3: ContainmentPlan(
        severity=3,
        actions=[
            ContainmentAction.RATE_LIMIT,
        ],
        description=(
            "SEV-3: Apply rate limits to the offending user. Monitor for "
            "escalation. No immediate system-wide action needed."
        ),
    ),
    4: ContainmentPlan(
        severity=4,
        actions=[],
        description="SEV-4: Monitor and log. No immediate containment action.",
    ),
}


class ContainmentExecutor:
    """Execute containment actions."""

    def execute_plan(self, severity: int, user_id: str, incident_id: str):
        """Execute the containment plan for the given severity."""
        plan = CONTAINMENT_PLANS.get(severity)
        if not plan:
            logger.error(f"No containment plan for severity {severity}")
            return
        logger.critical(
            f"Executing containment plan for incident {incident_id} "
            f"(SEV-{severity}): {plan.description}"
        )
        for action in plan.actions:
            self._execute_action(action, user_id, incident_id)

    def _execute_action(self, action: ContainmentAction, user_id: str, incident_id: str):
        """Execute a single containment action."""
        logger.info(f"Executing {action.value} for user {user_id} (incident {incident_id})")
        if action == ContainmentAction.BLOCK_USER:
            # Add user to blocklist
            logger.warning(f"BLOCKED user {user_id}")
        elif action == ContainmentAction.DISABLE_TOOLS:
            # Disable tool calling system-wide
            logger.warning("DISABLED tool calling for all users")
        elif action == ContainmentAction.ROTATE_CREDENTIALS:
            # Trigger credential rotation
            logger.critical("CREDENTIAL ROTATION triggered -- manual action required")
        elif action == ContainmentAction.ENABLE_SAFE_MODE:
            # Switch to safe mode (pre-approved responses only)
            logger.warning("SAFE MODE enabled -- model responses restricted")
        elif action == ContainmentAction.FLUSH_MEMORY:
            # Clear conversation memory
            logger.warning(f"FLUSHED conversation memory for user {user_id}")
        elif action == ContainmentAction.RATE_LIMIT:
            # Apply aggressive rate limits
            logger.info(f"RATE LIMIT applied to user {user_id}: 1 req/min")
Step 4: Investigation Workflow
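Investigation starts from a clean incident record created the moment containment finishes, so forensics has a stable ID to correlate conversation logs, tool audit trails, and guardrail events against. A minimal sketch (the `open_incident` helper and the `AI-` ID format are assumptions):

```python
# Hypothetical hand-off sketch: after containment executes, open an
# investigation record so all forensic artifacts share one incident ID.
import uuid
from datetime import datetime, timezone


def open_incident(severity: int, user_id: str, trigger_name: str) -> dict:
    """Create a minimal incident record once containment has run."""
    return {
        "incident_id": f"AI-{uuid.uuid4().hex[:8]}",  # assumed ID scheme
        "severity": severity,
        "user_id": user_id,
        "trigger": trigger_name,
        "opened_at": datetime.now(timezone.utc).isoformat(),
        "status": "investigating",  # containment is already complete here
    }


record = open_incident(2, "user-123", "system_prompt_exposure")
```

In practice this record would be written to the ticketing system; the dict stands in for that integration.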
Define the investigation steps for AI security incidents:
# incident/investigation.py
"""Investigation workflow for AI security incidents."""
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class InvestigationChecklist:
    incident_id: str
    severity: int
    steps: list[dict] = field(default_factory=list)

    def add_step(self, step: str, status: str = "pending"):
        self.steps.append({
            "step": step,
            "status": status,
            "timestamp": datetime.now().isoformat(),
            "notes": "",
        })


def create_investigation_checklist(incident_id: str, severity: int,
                                   trigger_name: str) -> InvestigationChecklist:
    """Create an investigation checklist based on incident type."""
    checklist = InvestigationChecklist(incident_id=incident_id, severity=severity)

    # Common steps for all severities
    checklist.add_step("Verify the incident trigger is not a false positive")
    checklist.add_step("Identify the affected user(s) and session(s)")
    checklist.add_step("Collect conversation logs for affected sessions")
    checklist.add_step("Determine the attack technique used")

    if trigger_name in ("credential_leakage_confirmed", "system_prompt_exposure"):
        checklist.add_step("Identify exactly what information was exposed")
        checklist.add_step("Determine if the exposed information has been used elsewhere")
        checklist.add_step("Rotate all exposed credentials immediately")
        checklist.add_step("Search for the exposed data in public sources")
        checklist.add_step("Assess whether breach notification is required")

    if trigger_name in ("unauthorized_tool_execution", "bulk_data_extraction"):
        checklist.add_step("Review tool execution audit logs")
        checklist.add_step("Identify all actions taken by the model during the incident")
        checklist.add_step("Assess the impact of each unauthorized action")
        checklist.add_step("Determine if any actions need to be reversed")
        checklist.add_step("Verify tool authorization controls")

    if trigger_name in ("cross_user_data_leakage",):
        checklist.add_step("Identify all affected users whose data was exposed")
        checklist.add_step("Determine the scope of data leakage")
        checklist.add_step("Check conversation memory for cross-contamination")
        checklist.add_step("Notify affected users per data breach policy")

    if trigger_name in ("sustained_injection_campaign",):
        checklist.add_step("Collect all attack payloads from the campaign")
        checklist.add_step("Identify which payloads successfully bypassed defenses")
        checklist.add_step("Add successful payloads to the guardrail test suite")
        checklist.add_step("Determine if the attacker exfiltrated any data")

    # Common closing steps
    checklist.add_step("Document the root cause")
    checklist.add_step("Implement remediation")
    checklist.add_step("Verify remediation with red team retest")
    checklist.add_step("Update guardrail rules based on findings")
    checklist.add_step("Schedule post-incident review")

    return checklist
Step 5: Remediation Validation
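Validation replays the original attack, so investigators must capture every payload verbatim while working the checklist -- any normalization or truncation can silently break the replay. A sketch of a per-incident payload store (the `PayloadStore` class is an assumption):

```python
# Hypothetical payload store sketch: capture attack payloads verbatim
# during investigation so remediation validation can replay them exactly.
from collections import defaultdict


class PayloadStore:
    def __init__(self):
        self._by_incident: dict[str, list[str]] = defaultdict(list)

    def record(self, incident_id: str, payload: str) -> None:
        # Store the payload exactly as received -- no trimming or
        # normalization, or the replay will not reproduce the attack.
        self._by_incident[incident_id].append(payload)

    def payloads(self, incident_id: str) -> list[str]:
        """Return a copy of the payloads for replay during validation."""
        return list(self._by_incident[incident_id])
```

A real deployment would persist this to durable storage with access controls, since the store holds live attack material.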
After implementing fixes, validate they actually work:
# incident/validation.py
"""Remediation validation for AI security incidents."""
import json
from dataclasses import dataclass
from datetime import datetime

import requests


@dataclass
class ValidationResult:
    test_name: str
    passed: bool
    details: str


class RemediationValidator:
    """Validate that incident remediation is effective."""

    def __init__(self, target_url: str, api_key: str):
        self.target_url = target_url
        self.api_key = api_key

    def validate_incident_remediation(
        self,
        incident_id: str,
        attack_payloads: list[str],
        success_indicators: list[str],
    ) -> list[ValidationResult]:
        """Re-run the original attack payloads and verify they are blocked."""
        results = []
        for i, payload in enumerate(attack_payloads):
            try:
                response = requests.post(
                    f"{self.target_url}/api/v1/chat",
                    json={"message": payload},
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    timeout=30,
                )
                response_text = json.dumps(response.json())
                still_vulnerable = any(
                    indicator.lower() in response_text.lower()
                    for indicator in success_indicators
                )
                results.append(ValidationResult(
                    test_name=f"Payload {i+1} (incident {incident_id})",
                    passed=not still_vulnerable,
                    details=(
                        f"Payload: {payload[:100]}... | "
                        f"Status: {'FIXED' if not still_vulnerable else 'STILL VULNERABLE'}"
                    ),
                ))
            except Exception as e:
                results.append(ValidationResult(
                    test_name=f"Payload {i+1} (incident {incident_id})",
                    passed=False,
                    details=f"Error during validation: {e}",
                ))
        return results

    def generate_validation_report(self, results: list[ValidationResult]) -> str:
        """Generate a validation report."""
        passed = sum(1 for r in results if r.passed)
        total = len(results)
        report = f"""# Remediation Validation Report
**Date**: {datetime.now().strftime('%Y-%m-%d %H:%M')}
**Total tests**: {total}
**Passed**: {passed}
**Failed**: {total - passed}
**Status**: {'REMEDIATION VERIFIED' if passed == total else 'REMEDIATION INCOMPLETE'}

## Test Results
"""
        for result in results:
            status = "PASS" if result.passed else "FAIL"
            report += f"- [{status}] {result.test_name}: {result.details}\n"
        return report
Step 6: Post-Incident Review
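Validation results also feed the review's action items: every replayed payload joins the regression suite, and any payload that still works becomes a blocking item. A sketch of that conversion (the `regression_entries` helper is an assumption; `ValidationResult` mirrors the dataclass above):

```python
# Hypothetical sketch: turn validation results into regression-suite
# entries for the post-incident action item table.
from dataclasses import dataclass


@dataclass
class ValidationResult:  # mirrors validation.py above
    test_name: str
    passed: bool
    details: str


def regression_entries(results: list[ValidationResult]) -> list[str]:
    """Every incident payload joins the regression suite; payloads that
    still succeed are flagged as blocking items for the review."""
    entries = []
    for r in results:
        flag = "regression" if r.passed else "BLOCKING"
        entries.append(f"[{flag}] {r.test_name}")
    return entries
```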
Run a structured post-incident review:
# Post-Incident Review Template
## Incident Summary
- **Incident ID**: [ID]
- **Severity**: [SEV-1/2/3/4]
- **Duration**: [Detection time to resolution time]
- **Trigger**: [What triggered the incident]
- **Impact**: [What was the actual impact]
## Timeline
| Time | Event | Actor |
|------|-------|-------|
| HH:MM | Incident detected by [trigger] | Automated |
| HH:MM | Containment initiated | [Responder name] |
| HH:MM | Investigation started | [Investigator name] |
| HH:MM | Root cause identified | [Investigator name] |
| HH:MM | Remediation deployed | [Engineer name] |
| HH:MM | Remediation validated | [Tester name] |
| HH:MM | Incident closed | [Lead name] |
## Root Cause Analysis
1. **What happened?** [Factual description]
2. **Why did it happen?** [Root cause -- not just the trigger]
3. **Why wasn't it detected sooner?** [Monitoring gaps]
4. **Why wasn't it prevented?** [Defense gaps]
## What Went Well
- [List things that worked during the response]
## What Needs Improvement
- [List things that should be improved]
## Action Items
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| Add attack payload to regression test suite | [Name] | [Date] | Pending |
| Update guardrail rules for this attack pattern | [Name] | [Date] | Pending |
| Improve monitoring to detect this pattern earlier | [Name] | [Date] | Pending |
| Review and update incident response playbook | [Name] | [Date] | Pending |
| Schedule follow-up red team test | [Name] | [Date] | Pending |
Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| False positive triggers overwhelm responders | Detection thresholds too sensitive | Tune triggers against production baseline, require multiple signals |
| Containment too aggressive (kills service) | Auto-contain blocks all users | Scope containment to the affected user; only disable endpoint for SEV-1 |
| Investigation takes too long | Insufficient logging | Ensure all security events are logged with correlation IDs before an incident |
| Remediation not validated | Fix deployed without retest | Make validation a mandatory step in the incident resolution checklist |
| Same incident recurs | Root cause not addressed | Ensure post-incident actions include regression tests and guardrail updates |
| Team does not know the playbook | No practice or drills | Run quarterly tabletop exercises with AI-specific scenarios |
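The last row above -- drills -- is easier to institutionalize when the containment mapping can be exercised with synthetic events instead of production traffic. A minimal drill-harness sketch (it mirrors the `CONTAINMENT_PLANS` mapping from Step 3; the `drill` helper is an assumption):

```python
# Hypothetical tabletop drill sketch: inject a synthetic severity level
# and check the expected containment actions would fire, without
# touching production systems.
def drill(severity: int, expected_actions: set[str]) -> bool:
    """Return True if the containment plan matches what responders expect."""
    # Mirrors the CONTAINMENT_PLANS mapping defined in Step 3.
    plans = {
        1: {"block_user", "disable_tools", "rotate_credentials", "enable_safe_mode"},
        2: {"block_user", "rate_limit", "flush_memory"},
        3: {"rate_limit"},
        4: set(),
    }
    return plans.get(severity, set()) == expected_actions
```

Running drills like this quarterly surfaces drift between the documented playbook and what the automation actually does.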
Key Takeaways
AI incident response requires adapting traditional security playbooks:
- Classify AI-specific severity levels -- traditional severity levels do not capture AI-specific impacts like system prompt leakage or model manipulation. Define levels that account for these scenarios.
- Contain without killing the service -- blocking a single user is usually sufficient. Disabling the entire endpoint should be reserved for the most critical incidents.
- Investigate conversations, not packets -- AI incident forensics examines conversation logs, tool execution records, and guardrail bypass events. Ensure these are logged before you need them.
- Validate remediation with the original attack -- the attack payloads from the incident become your regression test. If the same payload still works after remediation, the fix is incomplete.
- Every incident improves defenses -- the post-incident review must produce concrete action items: new guardrail rules, updated monitoring triggers, and regression tests added to the continuous testing pipeline.