AI Incident Tabletop Exercises
Designing and facilitating tabletop exercises that prepare organizations to respond to AI-specific security incidents.
Overview
Traditional cybersecurity tabletop exercises prepare organizations for incidents like data breaches, ransomware, and denial-of-service attacks. These exercises have well-established scenarios, known response playbooks, and decades of incident data to inform realistic simulations. AI security incidents, however, present fundamentally different challenges that traditional exercises do not address.
When an organization's customer-facing LLM is jailbroken and starts generating harmful content, who decides to take it offline? When training data poisoning is discovered, how does the organization determine which model versions are affected and which decisions made by those models need to be reviewed? When a competitor extracts a proprietary model through the prediction API, what is the legal response? These questions require coordination across teams that do not typically interact during security incidents: ML engineering, data science, product management, legal, and communications, in addition to the security and IT teams that lead traditional incident response.
This article provides complete, ready-to-use tabletop exercise scenarios for the most critical AI incident types, along with facilitation guidance, evaluation frameworks, and templates for building exercises tailored to an organization's specific AI deployment.
Tabletop Exercise Fundamentals for AI
What Makes AI Incidents Different
AI security incidents differ from traditional incidents in several ways that must be reflected in tabletop design:
Uncertainty about impact scope: When a traditional application is compromised, the blast radius is relatively knowable — which data was accessed, which systems were affected. When an AI model is compromised (through data poisoning, adversarial manipulation, or prompt injection), the blast radius may include every decision the model has made during the affected period. A poisoned fraud detection model may have approved thousands of fraudulent transactions. A manipulated content moderation model may have allowed harmful content to spread for weeks.
No simple rollback: Patching a software vulnerability restores security. "Patching" a poisoned model requires identifying the poisoned data, retraining the model on clean data, and validating that the retrained model behaves correctly — a process that can take days to weeks for large models.
Novel legal territory: Courts are still establishing liability frameworks for AI system behavior. When a chatbot gives legally binding incorrect advice (as in the Air Canada case), the incident response must include legal analysis that does not fit traditional incident response playbooks.
Public perception dynamics: AI incidents attract disproportionate media attention. A prompt injection that causes a chatbot to say something offensive will generate significantly more public interest than a traditional data breach of equivalent severity.
Exercise Structure
Each tabletop exercise follows this structure:
Phase 1: Setup (10 minutes)
- Present the scenario background
- Assign roles to participants
- Establish ground rules
Phase 2: Initial Detection (20 minutes)
- Present the initial indicators
- Participants discuss: What do we know? What don't we know?
- Facilitator introduces inject 1
Phase 3: Escalation (20 minutes)
- Present additional information that changes the picture
- Participants discuss: How does this change our response?
- Facilitator introduces inject 2
Phase 4: Crisis Point (20 minutes)
- Present the situation at peak severity
- Participants make key decisions under pressure
- Facilitator introduces final inject
Phase 5: Debrief (30 minutes)
- What went well?
- What gaps were identified?
- What needs to change in processes, tools, or training?
- Action items with owners and deadlines
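The five-phase timeline above can be encoded as a simple facilitator agenda. The phase names and durations below mirror the suggested structure; treat this as an illustrative sketch to adjust per exercise, not a fixed requirement.

```python
# Facilitator agenda for the five-phase structure above.
# Durations are the suggested times; adjust as needed.
PHASES = [
    ("Setup", 10),
    ("Initial Detection", 20),
    ("Escalation", 20),
    ("Crisis Point", 20),
    ("Debrief", 30),
]

def agenda(phases=PHASES):
    """Return (start_minute, end_minute, name) for each phase."""
    schedule, start = [], 0
    for name, minutes in phases:
        schedule.append((start, start + minutes, name))
        start += minutes
    return schedule

for start, end, name in agenda():
    print(f"{start // 60}:{start % 60:02d}-{end // 60}:{end % 60:02d}  {name}")
```

Printing the agenda gives the facilitator wall-clock checkpoints (e.g., "0:50-1:10  Crisis Point") to keep a 100-minute session on schedule.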
Scenario 1: Customer-Facing LLM Jailbreak
Background
Your organization operates a customer service chatbot powered by a fine-tuned large language model. The chatbot handles approximately 50,000 conversations per day across your website and mobile app. It can access customer account information (read-only), process simple requests like address changes, and escalate complex issues to human agents.
Participants
- Incident Commander (security team lead)
- ML Engineering Lead (owns the model and inference infrastructure)
- Product Manager (owns the customer experience)
- Legal Counsel (evaluates liability and regulatory implications)
- Communications Lead (manages public and customer messaging)
- Customer Support Lead (manages human agent escalation)
Phase 2 — Initial Detection
Read to participants:
"At 2:14 PM, a customer support agent notices that a customer has shared a screenshot on social media showing our chatbot providing detailed instructions for a harmful activity. The screenshot appears genuine — the UI matches our chat interface, and the response includes our standard formatting. The social media post has 200 shares and is gaining traction.
At the same time, your monitoring dashboard shows a 15% spike in chatbot conversations flagged by the output safety filter, but no alerts were triggered because the threshold is set at 25%.
Your chatbot does not currently log the full conversation history for conversations that are not escalated to human agents — only aggregated metrics are retained for non-escalated chats."
Discussion prompts:
- Do we take the chatbot offline immediately? What is the impact of doing so vs. not doing so?
- How do we verify whether the screenshot is real?
- What do we know about the scope — is this one conversation or many?
- Who needs to be notified?
Inject 1
"A security researcher reaches out via your responsible disclosure program. They report that they have found a reliable jailbreak technique that works against your chatbot. They share that the technique involves a multi-turn conversation where the attacker gradually shifts the conversation context. They estimate they have tested it approximately 50 times and it works roughly 70% of the time. They also mention they noticed other users in online forums discussing similar techniques against your chatbot."
Phase 3 — Escalation
Read to participants:
"It is now 4:30 PM. The social media post has 2,000 shares. Two technology news outlets have contacted your communications team for comment. Your ML engineering team has analyzed the output safety filter logs and identified approximately 300 conversations in the past 48 hours where the safety filter scored near the threshold, suggesting they may have been jailbreak attempts. However, without full conversation logs, you cannot determine how many were successful or what content was generated.
Your legal team has flagged that some of the content in the screenshots could constitute regulated advice in your industry — the chatbot appears to have provided information that, if followed, could cause harm to the customer or others."
Discussion prompts:
- The chatbot is still live. Do we take it offline now? What is the decision framework?
- How do we assess the scope of impact without full conversation logs?
- What is our public statement? Do we acknowledge the vulnerability?
- What are the legal implications of the chatbot's outputs?
- What is our plan for the security researcher who reported this?
Inject 2
"Your communications team reports that a major news outlet will publish a story in one hour about your chatbot's vulnerability. They have obtained multiple screenshots from different users. The journalist is asking whether you were aware of the vulnerability and what steps you are taking."
Phase 4 — Crisis Point
Read to participants:
"It is 7:00 PM. The chatbot has been taken offline. Your CEO has been briefed and is asking three questions: (1) How did this happen? (2) What is our exposure? (3) When can we turn it back on safely?
Your ML engineering team reports that implementing a robust fix (fine-tuning the model to resist this jailbreak technique) will take 3-5 days. A partial mitigation (adding input pattern matching for known jailbreak patterns) could be deployed in 4-6 hours but will block approximately 2% of legitimate conversations.
Your customer support team reports a 40% increase in call volume since the chatbot was taken offline. Estimated cost per hour of chatbot downtime: $15,000 in additional support costs plus unknown revenue impact from customers who abandon their sessions."
Discussion prompts:
- How do we answer the CEO's three questions?
- Do we deploy the partial mitigation and bring the chatbot back online?
- What is the communication plan for customers, media, and regulators?
- What changes to logging and monitoring do we implement immediately?
- What is the root cause, and what is the remediation timeline?
Debrief Focus Areas
- Was the decision to take the chatbot offline made quickly enough?
- Did we have the right people in the room?
- What information did we need that we did not have (e.g., conversation logs)?
- What would we do differently with our monitoring and alerting?
- Do we have a playbook for this type of incident? Do we need one?
Scenario 2: Training Data Poisoning Discovery
Background
Your organization uses a machine learning model for loan approval decisions. The model was trained on 3 years of historical data and has been in production for 8 months. It processes approximately 5,000 loan applications per day. The model was last retrained 6 weeks ago with updated data.
Phase 2 — Initial Detection
Read to participants:
"Your data science team is performing routine model performance analysis and notices that the model's approval rate for applications from a specific geographic region has increased by 12% over the past 6 weeks — since the last model retrain. The overall approval rate is within normal bounds, but the regional shift is statistically significant and not explained by changes in applicant demographics.
Upon investigation, the data engineering team discovers that approximately 2,000 training records were modified through an internal data pipeline that had insufficient access controls. The modifications changed the outcome labels on historical loan applications — specifically, some applications that had defaulted were relabeled as successful."
Discussion prompts:
- What is the immediate risk? Should the model be taken offline?
- What are the regulatory implications (fair lending laws, model governance requirements)?
- How do we determine which production decisions were affected?
- Who could have made these modifications, and was it intentional?
Inject 1
"Your security team's investigation reveals that the data modifications were made using the credentials of a data engineer who left the organization 3 months ago. The credentials were not properly deprovisioned. Access logs show the modifications were made from an IP address associated with a competitor."
Phase 3 — Escalation
Read to participants:
"Your compliance team has determined that the affected model decisions may violate fair lending regulations because the poisoned data disproportionately affected applications from specific demographic groups. This triggers mandatory regulatory notification requirements under your jurisdiction's laws.
Your data science team estimates that approximately 15,000 loan decisions during the 6-week affected period may need to be reviewed. The previous model version is available and could be redeployed, but it was trained on data that also includes the now-suspect time period."
Discussion prompts:
- How do we revert to a known-good model if we cannot trust the training data?
- What is the plan for reviewing 15,000 potentially affected decisions?
- When and how do we notify regulators?
- Do we notify affected customers? What do we tell them?
- What is the law enforcement consideration given the competitor involvement?
Inject 2
"A journalist contacts your communications team, stating they have received a tip about 'AI bias in your lending decisions.' They do not appear to know about the data poisoning — they are framing it as an algorithmic bias story."
Phase 4 — Crisis Point
Read to participants:
"Your regulator has been notified and has scheduled a review in 2 weeks. They want to understand: the full scope of affected decisions, your model governance framework, how the data integrity breach occurred, and your remediation plan.
Your data science team reports that building a verified clean dataset and retraining the model will take 2-3 weeks. During this time, you need to decide what to do with incoming loan applications. Options: (1) process them with the compromised model, (2) revert to a manual review process (capacity: 500 applications per day vs. current 5,000), or (3) pause new applications."
Discussion prompts:
- Which option do we choose for incoming applications? What are the business, legal, and customer impacts of each?
- How do we prepare for the regulatory review?
- What is the scope of the retrospective review of affected decisions?
- What systemic changes prevent this from happening again?
Scenario 3: Model Extraction Attack
Background
Your organization has invested $10M over 2 years developing a proprietary AI model that provides your core product's competitive advantage. The model is served via API to paying customers. API access is metered and authenticated but does not restrict the types of queries customers can make.
Phase 2 — Initial Detection
Read to participants:
"Your infrastructure team notices that one customer account has made 50 million API calls in the past 30 days — 10x their normal usage. The queries appear to be systematically varying inputs across the model's feature space. Your ML team recognizes this as consistent with a model extraction attack pattern described in academic literature."
Discussion prompts:
- Do we cut off the customer's API access immediately?
- What evidence do we need to preserve?
- Is this a security incident, a terms-of-service violation, or both?
- What is the intellectual property exposure?
(Continue with escalation injects involving legal discovery that the customer is a front company for a competitor, an assessment of how much of the model could have been extracted based on query volume, and the decision about whether to pursue legal action or law enforcement involvement.)
Exercise Evaluation Framework
"""
Tabletop exercise evaluation framework.
Score organizational readiness across key dimensions.
"""
from dataclasses import dataclass
@dataclass
class EvaluationDimension:
name: str
questions: list[str]
score: int = 0 # 1-5 scale
notes: str = ""
EVALUATION_DIMENSIONS = [
EvaluationDimension(
        name="Detection and Awareness",
questions=[
            "Did participants understand the AI-specific aspects of the incident?",
            "Was there awareness of what monitoring/logging was available?",
            "Did the team identify the correct scope of impact?",
"Were AI-specific indicators of compromise recognized?",
],
),
EvaluationDimension(
name="Decision Making",
questions=[
"Were key decisions (e.g., take model offline) made in a timely manner?",
"Was there a clear decision-making authority?",
"Were decisions based on risk analysis or on panic/politics?",
            "Were trade-offs (availability vs. safety) explicitly discussed?",
],
),
EvaluationDimension(
name="Communication",
questions=[
"Was internal communication effective across teams?",
"Was there a plan for external communication (customers, media, regulators)?",
"Were legal and compliance teams engaged early enough?",
"Was the communication accurate (not over-reassuring or over-alarming)?",
],
),
EvaluationDimension(
name="Technical Response",
questions=[
            "Did ML engineering know how to assess model compromise?",
"Was there a viable rollback/recovery plan?",
"Were forensic preservation procedures followed?",
"Was the root cause analysis approach sound?",
],
),
EvaluationDimension(
name="Playbook Adequacy",
questions=[
"Did existing incident response playbooks cover this scenario?",
"Were roles and responsibilities clear?",
"Were escalation procedures adequate?",
"Were AI-specific response steps documented?",
],
),
EvaluationDimension(
name="Regulatory and Legal Readiness",
questions=[
"Did the team know the regulatory notification requirements?",
"Was legal counsel able to advise on AI-specific liability?",
"Was evidence preserved in a legally defensible manner?",
"Were data protection implications addressed?",
],
),
]
def generate_evaluation_report(dimensions: list[EvaluationDimension]) -> str:
    """Generate a post-exercise evaluation report."""
    lines = ["# AI Incident Tabletop Exercise — Evaluation Report\n"]
total_score = sum(d.score for d in dimensions)
max_score = len(dimensions) * 5
overall = total_score / max_score * 100
lines.append(f"## Overall Readiness Score: {overall:.0f}%\n")
if overall >= 80:
        lines.append("**Assessment: Strong readiness.** Address identified gaps.\n")
    elif overall >= 60:
        lines.append("**Assessment: Moderate readiness.** Significant gaps require attention.\n")
    elif overall >= 40:
        lines.append("**Assessment: Low readiness.** Major capability gaps identified.\n")
    else:
        lines.append("**Assessment: Unprepared.** Fundamental capabilities missing.\n")
lines.append("## Dimension Scores\n")
for dim in dimensions:
lines.append(f"### {dim.name}: {dim.score}/5")
if dim.notes:
lines.append(f"Notes: {dim.notes}")
lines.append("")
    return "\n".join(lines)

Building a Recurring Exercise Program
AI tabletop exercises should be conducted quarterly, with scenario rotation:
Quarter 1: Customer-facing AI incident (jailbreak, harmful output)
Quarter 2: Model integrity incident (data poisoning, adversarial attack)
Quarter 3: AI intellectual property incident (model theft, data exfiltration)
Quarter 4: AI regulatory/compliance incident (bias discovery, regulatory audit)
Each quarter, update the scenario based on:
- New AI attack techniques published since the last exercise
- Changes in the organization's AI deployment
- Gaps identified in the previous exercise
- Real-world AI incidents that occurred since the last exercise
After each exercise, track action items to completion before the next exercise. The value of tabletop exercises comes not from the exercise itself but from the process improvements that result.
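The rotation above can also be encoded as a simple lookup so the program owner always knows which scenario category is due next. The category strings below are taken directly from the quarterly schedule above; this is an illustrative sketch, not a prescribed tool.

```python
# Quarterly scenario rotation, mirroring the schedule above.
ROTATION = {
    1: "Customer-facing AI incident (jailbreak, harmful output)",
    2: "Model integrity incident (data poisoning, adversarial attack)",
    3: "AI intellectual property incident (model theft, data exfiltration)",
    4: "AI regulatory/compliance incident (bias discovery, regulatory audit)",
}

def scenario_for(month: int) -> str:
    """Return the scenario category due in the given calendar month (1-12)."""
    return ROTATION[(month - 1) // 3 + 1]
```

For example, `scenario_for(5)` returns the Quarter 2 category, a model integrity incident.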
Developing Custom Scenarios
Scenario Design Process
The three scenarios above are starting points. The most effective exercises use scenarios tailored to your organization's specific AI deployment:
Step 1 — Inventory your AI systems: List every AI system in production, including third-party AI services integrated into your products.
Step 2 — Threat-model each system: For each system, identify the most impactful incident that could occur. Focus on incidents that are plausible given the system's architecture and exposure.
Step 3 — Map stakeholders: For each potential incident, identify every team that would be involved in the response. This mapping often reveals gaps — teams that should be involved but have no established communication channel.
Step 4 — Calibrate difficulty: The scenario should be challenging enough to reveal gaps but not so overwhelming that participants disengage. A good rule is that the team should be able to formulate an initial response plan within the first 20 minutes but should face complications (injects) that force them to adapt.
Step 5 — Write the injects: Injects are new pieces of information introduced at specific points to escalate the scenario. Each inject should force a decision: do we change our approach? Who do we notify? What do we prioritize?
"""
Scenario builder for custom AI incident tabletop exercises.
"""
from dataclasses import dataclass, field
@dataclass
class Inject:
timing_minutes: int
content: str
forces_decision: str
new_information: str
stakeholders_affected: list[str]
@dataclass
class TabletopScenario:
title: str
ai_system_type: str
incident_category: str
background: str
initial_detection: str
participants: list[str]
injects: list[Inject] = field(default_factory=list)
discussion_prompts: list[list[str]] = field(default_factory=list)
expected_decisions: list[str] = field(default_factory=list)
evaluation_criteria: list[str] = field(default_factory=list)
def validate(self) -> list[str]:
"""Validate scenario completeness."""
issues = []
if len(self.injects) < 2:
issues.append("Scenarios should have at least 2 injects for escalation")
if len(self.participants) < 3:
issues.append("Include at least 3 different stakeholder roles")
if not any("legal" in p.lower() for p in self.participants):
            issues.append("Consider including legal counsel for AI incidents")
if not any("communication" in p.lower() or "pr" in p.lower()
for p in self.participants):
            issues.append("Consider including communications/PR for public-facing AI")
if len(self.evaluation_criteria) < 3:
            issues.append("Define at least 3 evaluation criteria")
return issues
def generate_facilitator_guide(self) -> str:
"""Generate a facilitator guide for running this exercise."""
guide = [
f"# Facilitator Guide: {self.title}",
f"\n## Scenario Type: {self.incident_category}",
f"## AI System: {self.ai_system_type}",
f"\n## Participants Needed:",
]
for p in self.participants:
guide.append(f"- {p}")
guide.append(f"\n## Timeline:")
guide.append(f"- 0:00-0:10 — Setup and background")
        guide.append(f"- 0:10-0:30 — Initial detection discussion")
for i, inject in enumerate(self.injects):
guide.append(
f"- 0:{inject.timing_minutes:02d} — Inject {i+1}: "
f"{inject.content[:80]}..."
)
guide.append(f"- Final 30 minutes — Debrief")
guide.append(f"\n## Key Decisions to Watch For:")
for decision in self.expected_decisions:
guide.append(f"- {decision}")
        guide.append(f"\n## Evaluation Criteria:")
for criterion in self.evaluation_criteria:
guide.append(f"- {criterion}")
        return "\n".join(guide)

Scenario Ideas by AI System Type
Recommendation System Manipulation: An adversary manipulates a recommendation system to promote specific products or content. Detection comes from a user complaint or an analytics anomaly. Complications include determining scope (how many users were affected), whether the manipulation was through data poisoning or adversarial input, and regulatory implications if recommendations are considered "advice" in your jurisdiction.
AI-Powered Fraud Detection Evasion: Intelligence suggests that a fraud ring has developed techniques to evade your AI-based fraud detection system. The team must decide whether to tighten detection (increasing false positives and customer friction) or investigate while the system continues operating at current sensitivity. Complications include the fraud ring adapting to any countermeasures and the discovery that the evasion technique exploits a fundamental weakness in the model architecture.
Third-Party AI Service Compromise: A third-party AI service your application depends on (e.g., a content moderation API, a translation service, or an embedding model API) reports a security incident. Your team must assess the impact on your systems, determine whether any of your data was exposed, and decide whether to switch to an alternative service. Complications include the third party providing incomplete information about the breach scope and the discovery that no viable alternative services exist for your use case.
AI Model Supply Chain Attack: Your ML team discovers that a pre-trained model downloaded from a public model hub contains a backdoor — it performs normally on standard inputs but has been trained to misclassify specific trigger patterns. The model has been in production for three months. The team must determine which decisions were affected, whether to retrain from scratch or attempt to remove the backdoor, and how to verify future model downloads.
Tracking Gap Closure Over Time
Create a simple tracking system to ensure exercise findings lead to actual improvements:
ACTION_ITEM_TRACKER = {
"Q1_2026_exercise": {
"date": "2026-01-15",
        "scenario": "LLM jailbreak",
"action_items": [
{
"id": "Q1-001",
"finding": "No AI-specific incident response playbook exists",
"action": "Create AI incident response playbook",
                "owner": "Security Lead",
"due": "2026-02-28",
"status": "COMPLETE",
"completion_date": "2026-02-20",
},
{
"id": "Q1-002",
"finding": "Chat logs not retained for non-escalated conversations",
                "action": "Implement 30-day chat log retention with PII masking",
"owner": "ML Engineering Lead",
"due": "2026-03-31",
"status": "IN_PROGRESS",
"completion_date": None,
},
{
"id": "Q1-003",
                "finding": "No automated monitoring for jailbreak patterns",
                "action": "Deploy output safety classifier with alerting",
                "owner": "AI Security Engineer",
"due": "2026-03-31",
"status": "IN_PROGRESS",
"completion_date": None,
},
],
},
}

Review the tracker at the start of each exercise. If previous action items are consistently incomplete, the exercise program is producing insight without producing change — which means the program is failing despite appearing active. Escalate chronic non-completion to leadership rather than running the next exercise on top of unresolved findings.
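A short helper makes that review step concrete: walk the tracker structure and surface anything not yet complete. This is a sketch against the dict shape shown above, with a minimal sample included for illustration.

```python
def incomplete_items(tracker: dict) -> list[tuple[str, str, str]]:
    """Return (exercise, item_id, owner) for every action item not marked COMPLETE."""
    open_items = []
    for exercise, record in tracker.items():
        for item in record["action_items"]:
            if item["status"] != "COMPLETE":
                open_items.append((exercise, item["id"], item["owner"]))
    return open_items

def completion_rate(tracker: dict) -> float:
    """Fraction of all tracked action items that are COMPLETE."""
    items = [i for rec in tracker.values() for i in rec["action_items"]]
    if not items:
        return 1.0
    return sum(1 for i in items if i["status"] == "COMPLETE") / len(items)

# Minimal sample with the same shape as the tracker above:
sample = {
    "Q1_2026_exercise": {
        "action_items": [
            {"id": "Q1-001", "owner": "Security Lead", "status": "COMPLETE"},
            {"id": "Q1-002", "owner": "ML Engineering Lead", "status": "IN_PROGRESS"},
        ],
    },
}
print(incomplete_items(sample))  # [('Q1_2026_exercise', 'Q1-002', 'ML Engineering Lead')]
```

Opening each exercise by printing the open items (and their owners) puts the accountability question on the table before the new scenario begins.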
Key Takeaways
AI incident tabletop exercises expose organizational gaps that are invisible until a real incident occurs. The most common gaps discovered through these exercises are: lack of AI-specific monitoring and logging (you cannot investigate what you did not record), unclear decision authority for taking AI systems offline, absence of AI-specific response playbooks, and insufficient cross-team coordination between security, ML engineering, legal, and communications. Running regular exercises and tracking gap closure over time is the most cost-effective way to improve AI incident readiness.
References
- CISA (2024). "Tabletop Exercise Packages." Cybersecurity and Infrastructure Security Agency. https://www.cisa.gov/resources-tools/services/cisa-tabletop-exercise-packages — Framework for designing and conducting tabletop exercises, adaptable to AI-specific scenarios.
- NIST (2024). "AI Risk Management Framework Playbook." National Institute of Standards and Technology. Provides the risk management context for AI incident response planning.
- Anthropic (2024). "Responsible Scaling Policy: Model Safety Evaluations." Practical examples of AI safety evaluation procedures that inform exercise scenario design.
- MITRE ATLAS (2025). "AI Incident Sharing: Case Studies." https://atlas.mitre.org/ — Real-world AI incident case studies that provide material for realistic tabletop scenarios.