Incident Response Playbook for AI Security Breaches
A guide to building an incident response playbook tailored to AI security breaches, covering detection triggers, triage procedures, containment strategies, investigation workflows, remediation validation, and post-incident review processes.
Traditional incident response playbooks were designed for infrastructure breaches -- unauthorized access, data exfiltration, malware infection. AI security incidents are fundamentally different: the "breach" may be a model generating unauthorized content, leaking its system prompt, or being manipulated into performing unauthorized actions through natural language. The investigation process is different (you examine conversation logs, not network packets), the containment options are different (you cannot just block an IP), and the remediation is different (you patch prompts, not code). This walkthrough builds an incident response playbook tailored to AI-specific security events.
Step 1: AI Incident Severity Classification
Define severity levels that account for AI-specific impact scenarios:
# AI Security Incident Severity Levels
## SEV-1: Critical
**Definition**: Active exploitation causing immediate harm
**Examples**:
- Model executing unauthorized financial transactions through tool abuse
- Bulk data exfiltration through systematic prompt injection
- Model generating content that creates immediate legal liability
- Production API key or credential leakage confirmed in the wild
**Response time**: Immediate (within 15 minutes)
**Who is notified**: CISO, Engineering Lead, Legal, On-call responders
**Containment**: Disable affected endpoint or switch to safe mode
## SEV-2: High
**Definition**: Confirmed vulnerability under active exploitation
**Examples**:
- System prompt fully extracted and shared externally
- Sustained prompt injection campaign bypassing guardrails
- Cross-user data leakage confirmed
- RAG pipeline serving poisoned content to users
**Response time**: Within 1 hour
**Who is notified**: Security team lead, Engineering lead
**Containment**: Enable enhanced filtering, rate limit affected users
## SEV-3: Medium
**Definition**: Confirmed vulnerability with limited exploitation
**Examples**:
- Successful jailbreak producing policy-violating content
- Partial system prompt extraction
- Tool invocation with unauthorized parameters (no impact confirmed)
- Single-user data leakage incident
**Response time**: Within 4 hours
**Who is notified**: Security team, AI team
**Containment**: Add specific filter rules, monitor for escalation
## SEV-4: Low
**Definition**: Potential vulnerability or minor policy violation
**Examples**:
- Model producing mildly off-topic or inaccurate content
- Anomalous usage patterns that may indicate probing
- Security filter producing excessive false positives
- Minor configuration drift detected
**Response time**: Next business day
**Who is notified**: AI team
**Containment**: Monitor, add to backlog
Step 2: Detection Triggers
Define the conditions that trigger an incident response:
# incident/triggers.py
"""Automated incident detection triggers."""
from dataclasses import dataclass
from enum import Enum


class TriggerSeverity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


@dataclass
class IncidentTrigger:
    name: str
    description: str
    severity: TriggerSeverity
    auto_contain: bool  # Whether to automatically trigger containment
    condition: str      # Human-readable condition


INCIDENT_TRIGGERS = [
    # SEV-1 triggers
    IncidentTrigger(
        name="unauthorized_tool_execution",
        description="Model executed a tool with unauthorized parameters that resulted in a side effect",
        severity=TriggerSeverity.SEV1,
        auto_contain=True,
        condition="Tool execution log shows completed action with parameter validation bypass",
    ),
    IncidentTrigger(
        name="credential_leakage_confirmed",
        description="API key, database credential, or secret detected in model response delivered to user",
        severity=TriggerSeverity.SEV1,
        auto_contain=True,
        condition="Output filter canary token or credential pattern detected in delivered response",
    ),
    IncidentTrigger(
        name="bulk_data_extraction",
        description="Systematic extraction of data through repeated queries",
        severity=TriggerSeverity.SEV1,
        auto_contain=True,
        condition=">100 successful data queries from single user in 1 hour with unique results",
    ),
    # SEV-2 triggers
    IncidentTrigger(
        name="sustained_injection_campaign",
        description="Coordinated prompt injection attempts bypassing guardrails",
        severity=TriggerSeverity.SEV2,
        auto_contain=False,
        condition=">50 guardrail blocks from single user in 30 min with >10% bypass rate",
    ),
    IncidentTrigger(
        name="system_prompt_exposure",
        description="Full or substantial system prompt extracted by a user",
        severity=TriggerSeverity.SEV2,
        auto_contain=False,
        condition="Canary token detected in output or >3 system prompt fragments in single session",
    ),
    IncidentTrigger(
        name="cross_user_data_leakage",
        description="User A's data appearing in User B's responses",
        severity=TriggerSeverity.SEV2,
        auto_contain=True,
        condition="Response contains PII or identifiers belonging to a different user",
    ),
    # SEV-3 triggers
    IncidentTrigger(
        name="jailbreak_success",
        description="Model produced content violating safety policy",
        severity=TriggerSeverity.SEV3,
        auto_contain=False,
        condition="Output content classifier flags delivered response as policy-violating",
    ),
    IncidentTrigger(
        name="partial_prompt_extraction",
        description="Fragments of the system prompt detected in responses",
        severity=TriggerSeverity.SEV3,
        auto_contain=False,
        condition="1-2 system prompt fragments detected in output",
    ),
    # SEV-4 triggers
    IncidentTrigger(
        name="anomalous_probing",
        description="Usage patterns consistent with security probing",
        severity=TriggerSeverity.SEV4,
        auto_contain=False,
        condition="Anomaly detector flags user with >3 distinct probing patterns",
    ),
]
Step 3: Containment Procedures
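Triggers on their own only classify; something has to route a fired trigger to the right response. Below is a minimal, hypothetical sketch of that dispatch step -- the `Trigger` stand-in and the handler callbacks are illustrative, not part of the modules above. Auto-contain triggers invoke containment immediately, and every firing pages a responder.

```python
from dataclasses import dataclass


# Illustrative stand-in for the trigger definitions above.
@dataclass
class Trigger:
    name: str
    severity: int
    auto_contain: bool


def dispatch(trigger: Trigger, user_id: str, contain, page) -> str:
    """Route a fired trigger: contain automatically if flagged, always page."""
    if trigger.auto_contain:
        contain(trigger.severity, user_id)
    page(trigger.severity, trigger.name)
    return "contained" if trigger.auto_contain else "paged"


# Example wiring with recording handlers standing in for real integrations:
actions = []
t = Trigger(name="credential_leakage_confirmed", severity=1, auto_contain=True)
result = dispatch(
    t,
    "user-123",
    contain=lambda sev, uid: actions.append(("contain", sev, uid)),
    page=lambda sev, name: actions.append(("page", sev, name)),
)
```

The point of the indirection is that containment and paging stay swappable: the same dispatcher works whether `contain` calls the executor below or a ticketing webhook.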
Define containment actions for each severity level:
# incident/containment.py
"""Containment procedures for AI security incidents."""
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class ContainmentAction(Enum):
    BLOCK_USER = "block_user"
    ENABLE_SAFE_MODE = "enable_safe_mode"
    DISABLE_TOOLS = "disable_tools"
    RATE_LIMIT = "rate_limit"
    DISABLE_ENDPOINT = "disable_endpoint"
    ROTATE_CREDENTIALS = "rotate_credentials"
    FLUSH_MEMORY = "flush_memory"
    RESTRICT_RAG = "restrict_rag"


@dataclass
class ContainmentPlan:
    severity: int
    actions: list[ContainmentAction]
    description: str


CONTAINMENT_PLANS = {
    1: ContainmentPlan(
        severity=1,
        actions=[
            ContainmentAction.BLOCK_USER,
            ContainmentAction.DISABLE_TOOLS,
            ContainmentAction.ROTATE_CREDENTIALS,
            ContainmentAction.ENABLE_SAFE_MODE,
        ],
        description=(
            "SEV-1: Block the offending user immediately. Disable tool calling. "
            "Rotate any credentials that may have been exposed. Enable safe mode, "
            "which returns only pre-approved responses."
        ),
    ),
    2: ContainmentPlan(
        severity=2,
        actions=[
            ContainmentAction.BLOCK_USER,
            ContainmentAction.RATE_LIMIT,
            ContainmentAction.FLUSH_MEMORY,
        ],
        description=(
            "SEV-2: Block the offending user. Apply aggressive rate limits to "
            "the affected endpoint. Flush conversation memory to prevent further "
            "context leakage."
        ),
    ),
    3: ContainmentPlan(
        severity=3,
        actions=[
            ContainmentAction.RATE_LIMIT,
        ],
        description=(
            "SEV-3: Apply rate limits to the offending user. Monitor for "
            "escalation. No immediate system-wide action needed."
        ),
    ),
    4: ContainmentPlan(
        severity=4,
        actions=[],
        description="SEV-4: Monitor and log. No immediate containment action.",
    ),
}


class ContainmentExecutor:
    """Execute containment actions."""

    def execute_plan(self, severity: int, user_id: str, incident_id: str):
        """Execute the containment plan for the given severity."""
        plan = CONTAINMENT_PLANS.get(severity)
        if not plan:
            logger.error(f"No containment plan for severity {severity}")
            return
        logger.critical(
            f"Executing containment plan for incident {incident_id} "
            f"(SEV-{severity}): {plan.description}"
        )
        for action in plan.actions:
            self._execute_action(action, user_id, incident_id)

    def _execute_action(self, action: ContainmentAction, user_id: str, incident_id: str):
        """Execute a single containment action."""
        logger.info(f"Executing {action.value} for user {user_id} (incident {incident_id})")
        if action == ContainmentAction.BLOCK_USER:
            # Add user to blocklist
            logger.warning(f"BLOCKED user {user_id}")
        elif action == ContainmentAction.DISABLE_TOOLS:
            # Disable tool calling system-wide
            logger.warning("DISABLED tool calling for all users")
        elif action == ContainmentAction.ROTATE_CREDENTIALS:
            # Trigger credential rotation
            logger.critical("CREDENTIAL ROTATION triggered -- manual action required")
        elif action == ContainmentAction.ENABLE_SAFE_MODE:
            # Switch to safe mode (pre-approved responses only)
            logger.warning("SAFE MODE enabled -- model responses restricted")
        elif action == ContainmentAction.FLUSH_MEMORY:
            # Clear conversation memory
            logger.warning(f"FLUSHED conversation memory for user {user_id}")
        elif action == ContainmentAction.RATE_LIMIT:
            # Apply aggressive rate limits
            logger.info(f"RATE LIMIT applied to user {user_id}: 1 req/min")
Step 4: Investigation Workflow
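Evidence collection is the first concrete investigative task: pull every turn of the affected sessions, in order. A minimal sketch follows, assuming a simple list-of-dicts log store; a real deployment would query a log backend by correlation ID, and the field names here are assumptions.

```python
# Hypothetical conversation log records; fields are illustrative.
LOGS = [
    {"session": "s1", "ts": "2024-05-01T10:00:00", "role": "user",
     "text": "ignore previous instructions"},
    {"session": "s2", "ts": "2024-05-01T10:01:00", "role": "user",
     "text": "hello"},
    {"session": "s1", "ts": "2024-05-01T10:00:30", "role": "assistant",
     "text": "I can't help with that."},
]


def collect_evidence(session_ids: set[str], logs: list[dict]) -> list[dict]:
    """Pull all turns for the affected sessions, ordered by timestamp."""
    return sorted(
        (rec for rec in logs if rec["session"] in session_ids),
        key=lambda rec: rec["ts"],
    )


evidence = collect_evidence({"s1"}, LOGS)
```

ISO-8601 timestamps sort correctly as strings, which keeps the sketch dependency-free; with heterogeneous formats you would parse to `datetime` first.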
Define the investigation steps for AI security incidents:
# incident/investigation.py
"""Investigation workflow for AI security incidents."""
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class InvestigationChecklist:
    incident_id: str
    severity: int
    steps: list[dict] = field(default_factory=list)

    def add_step(self, step: str, status: str = "pending"):
        self.steps.append({
            "step": step,
            "status": status,
            "timestamp": datetime.now().isoformat(),
            "notes": "",
        })


def create_investigation_checklist(incident_id: str, severity: int,
                                   trigger_name: str) -> InvestigationChecklist:
    """Create an investigation checklist based on incident type."""
    checklist = InvestigationChecklist(incident_id=incident_id, severity=severity)
    # Common steps for all severities
    checklist.add_step("Verify the incident trigger is not a false positive")
    checklist.add_step("Identify the affected user(s) and session(s)")
    checklist.add_step("Collect conversation logs for affected sessions")
    checklist.add_step("Determine the attack technique used")
    if trigger_name in ("credential_leakage_confirmed", "system_prompt_exposure"):
        checklist.add_step("Identify exactly what information was exposed")
        checklist.add_step("Determine if the exposed information has been used elsewhere")
        checklist.add_step("Rotate all exposed credentials immediately")
        checklist.add_step("Search for the exposed data in public sources")
        checklist.add_step("Assess whether breach notification is required")
    if trigger_name in ("unauthorized_tool_execution", "bulk_data_extraction"):
        checklist.add_step("Review tool execution audit logs")
        checklist.add_step("Identify all actions taken by the model during the incident")
        checklist.add_step("Assess the impact of each unauthorized action")
        checklist.add_step("Determine if any actions need to be reversed")
        checklist.add_step("Verify tool authorization controls")
    if trigger_name in ("cross_user_data_leakage",):
        checklist.add_step("Identify all affected users whose data was exposed")
        checklist.add_step("Determine the scope of data leakage")
        checklist.add_step("Check conversation memory for cross-contamination")
        checklist.add_step("Notify affected users per data breach policy")
    if trigger_name in ("sustained_injection_campaign",):
        checklist.add_step("Collect all attack payloads from the campaign")
        checklist.add_step("Identify which payloads successfully bypassed defenses")
        checklist.add_step("Add successful payloads to the guardrail test suite")
        checklist.add_step("Determine if the attacker exfiltrated any data")
    # Common closing steps
    checklist.add_step("Document the root cause")
    checklist.add_step("Implement remediation")
    checklist.add_step("Verify remediation with a red-team retest")
    checklist.add_step("Update guardrail rules based on findings")
    checklist.add_step("Schedule post-incident review")
    return checklist
Step 5: Remediation Validation
After implementing fixes, validate that they actually work:
# incident/validation.py
"""Remediation validation for AI security incidents."""
import json
from dataclasses import dataclass
from datetime import datetime

import requests


@dataclass
class ValidationResult:
    test_name: str
    passed: bool
    details: str


class RemediationValidator:
    """Validate that incident remediation is effective."""

    def __init__(self, target_url: str, api_key: str):
        self.target_url = target_url
        self.api_key = api_key

    def validate_incident_remediation(
        self,
        incident_id: str,
        attack_payloads: list[str],
        success_indicators: list[str],
    ) -> list[ValidationResult]:
        """Re-run the original attack payloads and verify they are blocked."""
        results = []
        for i, payload in enumerate(attack_payloads):
            try:
                response = requests.post(
                    f"{self.target_url}/api/v1/chat",
                    json={"message": payload},
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    timeout=30,
                )
                response_text = json.dumps(response.json())
                still_vulnerable = any(
                    indicator.lower() in response_text.lower()
                    for indicator in success_indicators
                )
                results.append(ValidationResult(
                    test_name=f"Payload {i+1} (incident {incident_id})",
                    passed=not still_vulnerable,
                    details=(
                        f"Payload: {payload[:100]}... | "
                        f"Status: {'FIXED' if not still_vulnerable else 'STILL VULNERABLE'}"
                    ),
                ))
            except Exception as e:
                results.append(ValidationResult(
                    test_name=f"Payload {i+1} (incident {incident_id})",
                    passed=False,
                    details=f"Error during validation: {e}",
                ))
        return results

    def generate_validation_report(self, results: list[ValidationResult]) -> str:
        """Generate a validation report."""
        passed = sum(1 for r in results if r.passed)
        total = len(results)
        report = f"""# Remediation Validation Report
**Date**: {datetime.now().strftime('%Y-%m-%d %H:%M')}
**Total tests**: {total}
**Passed**: {passed}
**Failed**: {total - passed}
**Status**: {'REMEDIATION VERIFIED' if passed == total else 'REMEDIATION INCOMPLETE'}

## Test Results
"""
        for result in results:
            status = "PASS" if result.passed else "FAIL"
            report += f"- [{status}] {result.test_name}: {result.details}\n"
        return report
Step 6: Post-Incident Review
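A review is sharper with hard numbers pulled from the incident timeline: how long from detection to containment, and from detection to closure. A small sketch follows; the event names and `(event, HH:MM)` format are assumptions, and it treats all events as same-day.

```python
from datetime import datetime

# Timeline entries as (event, HH:MM) pairs, mirroring a review timeline table.
TIMELINE = [
    ("detected", "10:05"),
    ("containment_initiated", "10:12"),
    ("remediation_validated", "13:40"),
    ("closed", "14:00"),
]


def minutes_between(timeline: list[tuple[str, str]], start: str, end: str) -> int:
    """Minutes elapsed between two named timeline events (same day assumed)."""
    times = {name: datetime.strptime(t, "%H:%M") for name, t in timeline}
    return int((times[end] - times[start]).total_seconds() // 60)


time_to_contain = minutes_between(TIMELINE, "detected", "containment_initiated")
time_to_close = minutes_between(TIMELINE, "detected", "closed")
```

Tracking these two durations across incidents is what turns the timeline table into a trend you can improve against.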
Run a structured post-incident review:
# Post-Incident Review Template
## Incident Summary
- **Incident ID**: [ID]
- **Severity**: [SEV-1/2/3/4]
- **Duration**: [Detection time to resolution time]
- **Trigger**: [What triggered the incident]
- **Impact**: [What was the actual impact]
## Timeline
| Time | Event | Actor |
|------|-------|-------|
| HH:MM | Incident detected by [trigger] | Automated |
| HH:MM | Containment initiated | [Responder name] |
| HH:MM | Investigation started | [Investigator name] |
| HH:MM | Root cause identified | [Investigator name] |
| HH:MM | Remediation deployed | [Engineer name] |
| HH:MM | Remediation validated | [Tester name] |
| HH:MM | Incident closed | [Lead name] |
## Root Cause Analysis
1. **What happened?** [Factual description]
2. **Why did it happen?** [Root cause -- not just the trigger]
3. **Why wasn't it detected sooner?** [Monitoring gaps]
4. **Why wasn't it prevented?** [Defense gaps]
## What Went Well
- [List things that worked during the response]
## What Needs Improvement
- [List things that should be improved]
## Action Items
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| Add attack payload to regression test suite | [Name] | [Date] | Pending |
| Update guardrail rules for this attack pattern | [Name] | [Date] | Pending |
| Improve monitoring to detect this pattern earlier | [Name] | [Date] | Pending |
| Review and update incident response playbook | [Name] | [Date] | Pending |
| Schedule follow-up red-team test | [Name] | [Date] | Pending |
Common Pitfalls and Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| False positive triggers overwhelm responders | Detection thresholds too sensitive | Tune triggers against production baseline; require multiple signals |
| Containment too aggressive (kills service) | Auto-contain blocks all users | Scope containment to the affected user; only disable the endpoint for SEV-1 |
| Investigation takes too long | Insufficient logging | Ensure all security events are logged with correlation IDs before an incident |
| Remediation not validated | Fix deployed without retest | Make validation a mandatory step in the incident resolution checklist |
| Same incident recurs | Root cause not addressed | Ensure post-incident actions include regression tests and guardrail updates |
| Team does not know the playbook | No practice or drills | Run quarterly tabletop exercises with AI-specific scenarios |
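Several of these pitfalls are measurable. The first one, for instance, comes down to per-trigger precision: the fraction of firings that were real incidents. A tiny sketch, with the 0.5 tuning threshold as an arbitrary illustrative value:

```python
def trigger_precision(fired: int, false_positives: int) -> float:
    """Fraction of trigger firings that were real incidents."""
    if fired == 0:
        return 1.0  # A trigger that never fires has generated no noise.
    return (fired - false_positives) / fired


# A trigger that fired 40 times with 30 false positives is drowning responders:
precision = trigger_precision(40, 30)
needs_tuning = precision < 0.5
```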
Key Takeaways
AI incident response requires adapting traditional security playbooks:
- Classify AI-specific severity levels -- traditional severity levels do not capture AI-specific impacts like system prompt leakage or model manipulation. Define levels that account for these scenarios.
- Contain without killing the service -- blocking a single user is usually sufficient. Disabling the entire endpoint should be reserved for the most critical incidents.
- Investigate conversations, not packets -- AI incident forensics examines conversation logs, tool execution records, and guardrail bypass events. Ensure these are logged before you need them.
- Validate remediation with the original attack -- the attack payloads from the incident become your regression tests. If the same payload still works after remediation, the fix is incomplete.
- Every incident improves defenses -- the post-incident review must produce concrete action items: new guardrail rules, updated monitoring triggers, and regression tests added to the continuous testing pipeline.
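One closing check worth automating: did the response meet the time targets from Step 1? A minimal sketch of that comparison, where SEV-4's "next business day" is approximated as 24 hours:

```python
# Response-time targets from the Step 1 severity matrix, in minutes.
# SEV-4's "next business day" is approximated here as 24 hours.
RESPONSE_SLA_MIN = {1: 15, 2: 60, 3: 240, 4: 24 * 60}


def sla_met(severity: int, minutes_to_first_response: int) -> bool:
    """Check a first response against the playbook's per-severity deadline."""
    return minutes_to_first_response <= RESPONSE_SLA_MIN[severity]


# A SEV-1 first response at 12 minutes is inside the 15-minute target;
# a SEV-2 response at 90 minutes has blown its 1-hour window.
ok = sla_met(1, 12)
slow = sla_met(2, 90)
```

Feeding this check from the timeline durations computed during the review closes the loop between the severity definitions and the retrospective.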