AI Incident Tabletop Exercises
Designing and facilitating tabletop exercises that prepare organizations to respond to AI-specific security incidents.
Overview
Traditional cybersecurity tabletop exercises prepare organizations for incidents like data breaches, ransomware, and denial-of-service attacks. These exercises have well-established scenarios, known response playbooks, and decades of incident data to inform realistic simulations. AI security incidents, however, present fundamentally different challenges that traditional exercises do not address.
When an organization's customer-facing LLM is jailbroken and starts generating harmful content, who decides to take it offline? When training data poisoning is discovered, how does the organization determine which model versions are affected and which decisions made by those models need to be reviewed? When a competitor extracts a proprietary model through the prediction API, what is the legal response? These questions require coordination across teams that do not typically interact during security incidents: ML engineering, data science, product management, legal, and communications, in addition to the security and IT teams that lead traditional incident response.
This article provides complete, ready-to-use tabletop exercise scenarios for the most critical AI incident types, along with facilitation guidance, evaluation frameworks, and templates for building exercises tailored to an organization's specific AI deployment.
Tabletop Exercise Fundamentals for AI
What Makes AI Incidents Different
AI security incidents differ from traditional incidents in several ways that must be reflected in tabletop design:
Uncertainty about impact scope: When a traditional application is compromised, the blast radius is relatively knowable — which data was accessed, which systems were affected. When an AI model is compromised (through data poisoning, adversarial manipulation, or prompt injection), the blast radius may include every decision the model has made during the affected period. A poisoned fraud detection model may have approved thousands of fraudulent transactions. A manipulated content moderation model may have allowed harmful content to spread for weeks.
No simple rollback: Patching a software vulnerability restores security. "Patching" a poisoned model requires identifying the poisoned data, retraining the model on clean data, and validating that the retrained model behaves correctly — a process that can take days to weeks for large models.
Novel legal territory: Courts are still establishing liability frameworks for AI system behavior. When a chatbot gives legally binding incorrect advice (as in the Air Canada case), the incident response must include legal analysis that does not fit traditional incident response playbooks.
Public perception dynamics: AI incidents attract disproportionate media attention. A prompt injection that causes a chatbot to say something offensive will generate significantly more public interest than a traditional data breach of equivalent severity.
Exercise Structure
Each tabletop exercise follows this structure:
Phase 1: Setup (10 minutes)
- Present the scenario background
- Assign roles to participants
- Establish ground rules
Phase 2: Initial Detection (20 minutes)
- Present the initial indicators
- Participants discuss: What do we know? What don't we know?
- Facilitator introduces inject 1
Phase 3: Escalation (20 minutes)
- Present additional information that changes the picture
- Participants discuss: How does this change our response?
- Facilitator introduces inject 2
Phase 4: Crisis Point (20 minutes)
- Present the situation at peak severity
- Participants make key decisions under pressure
- Facilitator introduces final inject
Phase 5: Debrief (30 minutes)
- What went well?
- What gaps were identified?
- What needs to change in processes, tools, or training?
- Action items with owners and deadlines
Scenario 1: Customer-Facing LLM Jailbreak
Background
Your organization operates a customer service chatbot powered by a fine-tuned large language model. The chatbot handles approximately 50,000 conversations per day across your website and mobile app. It can access customer account information (read-only), process simple requests like address changes, and escalate complex issues to human agents.
Participants
- Incident Commander (Security team lead)
- ML Engineering Lead (owns the model and inference infrastructure)
- Product Manager (owns the customer experience)
- Legal Counsel (evaluates liability and regulatory implications)
- Communications Lead (manages public and customer messaging)
- Customer Support Lead (manages human agent escalation)
Phase 2 — Initial Detection
Read to participants:
"At 2:14 PM, a customer support agent notices that a customer has shared a screenshot on social media showing our chatbot providing detailed instructions for a harmful activity. The screenshot appears genuine — the UI matches our chat interface, and the response includes our standard formatting. The social media post has 200 shares and is gaining traction.
At the same time, your monitoring dashboard shows a 15% spike in chatbot conversations flagged by the output safety filter, but no alerts were triggered because the threshold is set at 25%.
Your chatbot does not currently log the full conversation history for conversations that are not escalated to human agents — only aggregated metrics are retained for non-escalated chats."
Discussion prompts:
- Do we take the chatbot offline immediately? What is the impact of doing so vs. not doing so?
- How do we verify whether the screenshot is real?
- What do we know about the scope — is this one conversation or many?
- Who needs to be notified?
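The missed alert in this detection phase is a common monitoring gap: a fixed 25% absolute threshold says nothing about a 15% relative spike over normal traffic. A minimal sketch of baseline-relative alerting, useful as a debrief talking point (the class name, window size, and multiplier are illustrative assumptions, not tuned recommendations):

```python
from collections import deque


class FlagRateAlert:
    """Alert on deviation from a rolling baseline of the safety-filter
    flag rate, instead of a fixed absolute threshold.

    Hypothetical sketch: a fixed 25% threshold misses a 15% relative
    spike; comparing each interval against the recent baseline does not.
    """

    def __init__(self, baseline_window: int = 24, spike_multiplier: float = 1.1):
        self.history: deque = deque(maxlen=baseline_window)
        self.spike_multiplier = spike_multiplier

    def observe(self, flag_rate: float) -> bool:
        """Record one interval's flag rate; return True if it should alert."""
        alert = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            alert = baseline > 0 and flag_rate > baseline * self.spike_multiplier
        self.history.append(flag_rate)
        return alert
```

With a 1.1 multiplier, a flag rate 15% above the rolling baseline fires an alert that the scenario's static 25% threshold would have ignored.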
Inject 1
"A security researcher reaches out via your responsible disclosure program. They report that they have found a reliable jailbreak technique that works against your chatbot. They share that the technique involves a multi-turn conversation where the attacker gradually shifts the conversation context. They estimate they have tested it approximately 50 times and it works roughly 70% of the time. They also mention they noticed other users in online forums discussing similar techniques against your chatbot."
Phase 3 — Escalation
Read to participants:
"It is now 4:30 PM. The social media post has 2,000 shares. Two technology news outlets have contacted your communications team for comment. Your ML engineering team has analyzed the output safety filter logs and identified approximately 300 conversations in the past 48 hours where the safety filter scored near the threshold, suggesting they may have been jailbreak attempts. However, without full conversation logs, you cannot determine how many were successful or what content was generated.
Your legal team has flagged that some of the content in the screenshots could constitute regulated advice in your industry — the chatbot appears to have provided information that, if followed, could cause harm to the customer or others."
Discussion prompts:
- The chatbot is still live. Do we take it offline now? What is the decision framework?
- How do we assess the scope of impact without full conversation logs?
- What is our public statement? Do we acknowledge the vulnerability?
- What are the legal implications of the chatbot's outputs?
- What is our plan for the security researcher who reported this?
Inject 2
"Your communications team reports that a major news outlet will publish a story in one hour about your chatbot's vulnerability. They have obtained multiple screenshots from different users. The journalist is asking whether you were aware of the vulnerability and what steps you are taking."
Phase 4 — Crisis Point
Read to participants:
"It is 7:00 PM. The chatbot has been taken offline. Your CEO has been briefed and is asking three questions: (1) How did this happen? (2) What is our exposure? (3) When can we turn it back on safely?
Your ML engineering team reports that implementing a robust fix (fine-tuning the model to resist this jailbreak technique) will take 3-5 days. A partial mitigation (adding input pattern matching for known jailbreak patterns) could be deployed in 4-6 hours but will block approximately 2% of legitimate conversations.
Your customer support team reports a 40% increase in call volume since the chatbot was taken offline. Estimated cost per hour of chatbot downtime: $15,000 in additional support costs plus unknown revenue impact from customers who abandon their sessions."
Discussion prompts:
- How do we answer the CEO's three questions?
- Do we deploy the partial mitigation and bring the chatbot back online?
- What is the communication plan for customers, media, and regulators?
- What changes to logging and monitoring do we implement immediately?
- What is the root cause, and what is the remediation timeline?
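The partial mitigation discussed in this phase, input pattern matching for known jailbreak phrasings, can be sketched as a pre-filter. The patterns below are illustrative assumptions, not a vetted list; a real deployment would measure the false-positive rate against legitimate traffic before shipping, which is where the scenario's estimated 2% of blocked legitimate conversations comes from.

```python
import re

# Illustrative patterns only. Real deployments need a vetted, regularly
# updated list, and should measure false positives on sampled traffic.
KNOWN_JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now (in )?developer mode", re.IGNORECASE),
    re.compile(r"pretend (that )?you have no (content )?restrictions", re.IGNORECASE),
]


def prefilter_message(message: str) -> bool:
    """Return True if the message should be blocked before it reaches the
    model. This is the cheap stopgap from the scenario: it buys time, but
    is trivially bypassed by rephrasing, so the fine-tuning fix is still
    required."""
    return any(p.search(message) for p in KNOWN_JAILBREAK_PATTERNS)
```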
Debrief Focus Areas
- Was the decision to take the chatbot offline made quickly enough?
- Did we have the right people in the room?
- What information did we need that we did not have (e.g., conversation logs)?
- What would we do differently with our monitoring and alerting?
- Do we have a playbook for this type of incident? Do we need one?
Scenario 2: Training Data Poisoning Discovery
Background
Your organization uses a machine learning model for loan approval decisions. The model was trained on 3 years of historical data and has been in production for 8 months. It processes approximately 5,000 loan applications per day. The model was last retrained 6 weeks ago with updated data.
Phase 2 — Initial Detection
Read to participants:
"Your data science team is performing routine model performance analysis and notices that the model's approval rate for applications from a specific geographic region has increased by 12% over the past 6 weeks — since the last model retrain. The overall approval rate is within normal bounds, but the regional shift is statistically significant and not explained by changes in applicant demographics.
Upon investigation, the data engineering team discovers that approximately 2,000 training records were modified through an internal data pipeline that had insufficient access controls. The modifications changed the outcome labels on historical loan applications — specifically, some applications that had defaulted were relabeled as successful."
Discussion prompts:
- What is the immediate risk? Should the model be taken offline?
- What are the regulatory implications (fair lending laws, model governance requirements)?
- How do we determine which production decisions were affected?
- Who could have made these modifications, and was it intentional?
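The regional approval-rate shift that opens this scenario is exactly what a routine drift check can quantify. A minimal sketch using a two-proportion z-test on pre-retrain vs. post-retrain windows (the function name and windowing are assumptions for illustration):

```python
from math import erf, sqrt


def two_proportion_z(approved_a: int, total_a: int,
                     approved_b: int, total_b: int) -> tuple[float, float]:
    """Two-proportion z-test: is the approval rate in window B
    (post-retrain) significantly different from window A (pre-retrain)?
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = approved_a / total_a, approved_b / total_b
    pooled = (approved_a + approved_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf; two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

A shift like the scenario's (e.g., a regional approval rate moving from 60% to 72% across a thousand applications per window) produces a z statistic above 5, which is why the overall approval rate staying "within normal bounds" is not reassurance.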
Inject 1
"Your security team's investigation reveals that the data modifications were made using the credentials of a data engineer who left the organization 3 months ago. The credentials were not properly deprovisioned. Access logs show the modifications were made from an IP address associated with a competitor."
Phase 3 — Escalation
Read to participants:
"Your compliance team has determined that the affected model decisions may violate fair lending regulations because the poisoned data disproportionately affected applications from specific demographic groups. This triggers mandatory regulatory notification requirements under your jurisdiction's laws.
Your data science team estimates that approximately 15,000 loan decisions during the 6-week affected period may need to be reviewed. The previous model version is available and could be redeployed, but it was trained on data that also includes the now-suspect time period."
Discussion prompts:
- How do we revert to a known-good model if we cannot trust the training data?
- What is the plan for reviewing 15,000 potentially affected decisions?
- When and how do we notify regulators?
- Do we notify affected customers? What do we tell them?
- What is the law enforcement consideration given the competitor involvement?
Inject 2
"A journalist contacts your communications team, stating they have received a tip about 'AI bias in your lending decisions.' They do not appear to know about the data poisoning — they are framing it as an algorithmic bias story."
Phase 4 — Crisis Point
Read to participants:
"Your regulator has been notified and has scheduled a review in 2 weeks. They want to understand: the full scope of affected decisions, your model governance framework, how the data integrity breach occurred, and your remediation plan.
Your data science team reports that building a verified clean dataset and retraining the model will take 2-3 weeks. During this time, you need to decide what to do with incoming loan applications. Options: (1) process them with the compromised model, (2) revert to a manual review process (capacity: 500 applications per day vs. current 5,000), or (3) pause new applications."
Discussion prompts:
- Which option do we choose for incoming applications? What are the business, legal, and customer impacts of each?
- How do we prepare for the regulatory review?
- What is the scope of the retrospective review of affected decisions?
- What systemic changes prevent this from happening again?
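For option (2), a back-of-the-envelope calculation with the scenario's numbers shows why manual review alone is hard to sustain (the 21-day figure is the upper end of the stated 2-3 week retrain estimate):

```python
def manual_review_backlog(daily_incoming: int, daily_capacity: int,
                          retrain_days: int) -> int:
    """Applications left unreviewed if all incoming volume is routed to
    manual review for the duration of the retrain."""
    return max(0, daily_incoming - daily_capacity) * retrain_days


# Scenario numbers: 5,000 applications/day incoming, 500/day manual
# capacity, ~21 days to build a verified-clean dataset and retrain.
# The backlog is 94,500 applications; even after the retrained model
# returns, clearing it at 500/day takes roughly 189 additional days.
backlog = manual_review_backlog(5000, 500, 21)
```

Facilitators can use this arithmetic to push the team past "just review manually" toward hybrid options (triage by risk tier, partial pause, expedited retrain).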
Scenario 3: Model Extraction Attack
Background
Your organization has invested $10M over 2 years developing a proprietary AI model that provides your core product's competitive advantage. The model is served via API to paying customers. API access is metered and authenticated but does not restrict the types of queries customers can make.
Phase 2 — Initial Detection
Read to participants:
"Your infrastructure team notices that one customer account has made 50 million API calls in the past 30 days — 10x their normal usage. The queries appear to be systematically varying inputs across the model's feature space. Your ML team recognizes this as consistent with a model extraction attack pattern described in academic literature."
Discussion prompts:
- Do we cut off the customer's API access immediately?
- What evidence do we need to preserve?
- Is this a security incident, a terms-of-service violation, or both?
- What is the intellectual property exposure?
(Continue with escalation injects involving legal discovery that the customer is a front company for a competitor, assessment of how much of the model could have been extracted based on query volume, and the decision about whether to pursue legal action or law enforcement involvement.)
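The detection signal in this scenario, a 10x volume spike with systematically varied inputs, can be approximated with a cheap per-account volume check. This is a hypothetical heuristic, not a complete defense: volume alone misses low-and-slow extraction spread across many colluding accounts.

```python
def extraction_risk_flags(usage_by_account: dict[str, tuple[int, int]],
                          volume_multiplier: float = 5.0) -> list[str]:
    """Flag accounts whose recent query volume is far above their own
    baseline, one signal consistent with model extraction.

    usage_by_account maps account id to (baseline_30d, current_30d) call
    counts. The 5x multiplier is an illustrative starting point.
    """
    return [acct for acct, (baseline, current) in usage_by_account.items()
            if baseline > 0 and current >= baseline * volume_multiplier]
```

A fuller detector would also examine query distribution (systematic sweeps of the feature space look different from organic traffic), which is the pattern the ML team recognizes in the scenario.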
Exercise Evaluation Framework
"""
Tabletop exercise evaluation framework.
Score organizational readiness across key dimensions.
"""
from dataclasses import dataclass
@dataclass
class EvaluationDimension:
name: str
questions: list[str]
score: int = 0 # 1-5 scale
notes: str = ""
EVALUATION_DIMENSIONS = [
EvaluationDimension(
name="Detection and Awareness",
questions=[
"Did participants understand the AI-specific aspects of the incident?",
"Was there awareness of what monitoring/logging was available?",
"Did the team identify the correct scope of impact?",
"Were AI-specific indicators of compromise recognized?",
],
),
EvaluationDimension(
name="Decision Making",
questions=[
"Were key decisions (e.g., take model offline) made in a timely manner?",
"Was there a clear decision-making authority?",
"Were decisions based on risk analysis or on panic/politics?",
"Were trade-offs (availability vs. security) explicitly discussed?",
],
),
EvaluationDimension(
name="Communication",
questions=[
"Was internal communication effective across teams?",
"Was there a plan for external communication (customers, media, regulators)?",
"Were legal and compliance teams engaged early enough?",
"Was the communication accurate (not over-reassuring or over-alarming)?",
],
),
EvaluationDimension(
name="Technical Response",
questions=[
"Did ML engineering know how to assess model compromise?",
"Was there a viable rollback/recovery plan?",
"Were forensic preservation procedures followed?",
"Was the root cause analysis approach sound?",
],
),
EvaluationDimension(
name="Playbook Adequacy",
questions=[
"Did existing incident response playbooks cover this scenario?",
"Were roles and responsibilities clear?",
"Were escalation procedures adequate?",
"Were AI-specific response steps documented?",
],
),
EvaluationDimension(
name="Regulatory and Legal Readiness",
questions=[
"Did the team know the regulatory notification requirements?",
"Was legal counsel able to advise on AI-specific liability?",
"Was evidence preserved in a legally defensible manner?",
"Were data protection implications addressed?",
],
),
]
def generate_evaluation_report(dimensions: list[EvaluationDimension]) -> str:
"""Generate a post-exercise evaluation report."""
lines = ["# AI Incident Tabletop Exercise — Evaluation Report\n"]
total_score = sum(d.score for d in dimensions)
max_score = len(dimensions) * 5
overall = total_score / max_score * 100
lines.append(f"## Overall Readiness Score: {overall:.0f}%\n")
if overall >= 80:
lines.append("**Assessment: Strong readiness.** Address identified gaps.\n")
elif overall >= 60:
lines.append("**Assessment: Moderate readiness.** Significant gaps require attention.\n")
elif overall >= 40:
lines.append("**Assessment: Low readiness.** Major capability gaps identified.\n")
else:
lines.append("**Assessment: Unprepared.** Fundamental capabilities missing.\n")
lines.append("## Dimension Scores\n")
for dim in dimensions:
lines.append(f"### {dim.name}: {dim.score}/5")
if dim.notes:
lines.append(f"Notes: {dim.notes}")
lines.append("")
return "\n".join(lines)Building a Recurring Exercise Program
AI tabletop exercises should be conducted quarterly, with scenario rotation:
Quarter 1: Customer-facing AI incident (jailbreak, harmful output)
Quarter 2: Model integrity incident (data poisoning, adversarial attack)
Quarter 3: AI intellectual property incident (model theft, data exfiltration)
Quarter 4: AI regulatory/compliance incident (bias discovery, regulatory audit)
Each quarter, update the scenario based on:
- New AI attack techniques published since the last exercise
- Changes in the organization's AI deployment
- Gaps identified in the previous exercise
- Real-world AI incidents that occurred since the last exercise
After each exercise, track action items to completion before the next exercise. The value of tabletop exercises comes not from the exercise itself but from the process improvements that result.
Developing Custom Scenarios
Scenario Design Process
The three scenarios above are starting points. The most effective exercises use scenarios tailored to your organization's specific AI deployment:
Step 1 — Inventory your AI systems: List every AI system in production, including third-party AI services integrated into your products.
Step 2 — Threat model each system: For each system, identify the most impactful incident that could occur. Focus on incidents that are plausible given the system's architecture and exposure.
Step 3 — Map stakeholders: For each potential incident, identify every team that would be involved in the response. This mapping often reveals gaps — teams that should be involved but have no established communication channel.
Step 4 — Calibrate difficulty: The scenario should be challenging enough to reveal gaps but not so overwhelming that participants disengage. A good rule is that the team should be able to formulate an initial response plan within the first 20 minutes but should face complications (injects) that force them to adapt.
Step 5 — Write the injects: Injects are new pieces of information introduced at specific points to escalate the scenario. Each inject should force a decision: do we change our approach? Who do we notify? What do we prioritize?
"""
Scenario builder for custom AI incident tabletop exercises.
"""
from dataclasses import dataclass, field
@dataclass
class Inject:
timing_minutes: int
content: str
forces_decision: str
new_information: str
stakeholders_affected: list[str]
@dataclass
class TabletopScenario:
title: str
ai_system_type: str
incident_category: str
background: str
initial_detection: str
participants: list[str]
injects: list[Inject] = field(default_factory=list)
discussion_prompts: list[list[str]] = field(default_factory=list)
expected_decisions: list[str] = field(default_factory=list)
evaluation_criteria: list[str] = field(default_factory=list)
def validate(self) -> list[str]:
"""Validate scenario completeness."""
issues = []
if len(self.injects) < 2:
issues.append("Scenarios should have at least 2 injects for escalation")
if len(self.participants) < 3:
issues.append("Include at least 3 different stakeholder roles")
if not any("legal" in p.lower() for p in self.participants):
issues.append("Consider including legal counsel for AI incidents")
if not any("communication" in p.lower() or "pr" in p.lower()
for p in self.participants):
issues.append("Consider including communications/PR for public-facing AI")
if len(self.evaluation_criteria) < 3:
issues.append("Define at least 3 evaluation criteria")
return issues
def generate_facilitator_guide(self) -> str:
"""Generate a facilitator guide for running this exercise."""
guide = [
f"# Facilitator Guide: {self.title}",
f"\n## Scenario Type: {self.incident_category}",
f"## AI System: {self.ai_system_type}",
f"\n## Participants Needed:",
]
for p in self.participants:
guide.append(f"- {p}")
guide.append(f"\n## Timeline:")
guide.append(f"- 0:00-0:10 — Setup and background")
guide.append(f"- 0:10-0:30 — Initial detection discussion")
for i, inject in enumerate(self.injects):
guide.append(
f"- 0:{inject.timing_minutes:02d} — Inject {i+1}: "
f"{inject.content[:80]}..."
)
guide.append(f"- Final 30 minutes — Debrief")
guide.append(f"\n## Key Decisions to Watch For:")
for decision in self.expected_decisions:
guide.append(f"- {decision}")
guide.append(f"\n## Evaluation Criteria:")
for criterion in self.evaluation_criteria:
guide.append(f"- {criterion}")
return "\n".join(guide)Scenario Ideas by AI System Type
Recommendation System Manipulation: An adversary manipulates a recommendation system to promote specific products or content. Detection comes from a user complaint or an analytics anomaly. Complications include determining scope (how many users were affected), whether the manipulation was through data poisoning or adversarial input, and regulatory implications if recommendations are considered "advice" in your jurisdiction.
AI-Powered Fraud Detection Evasion: Intelligence suggests that a fraud ring has developed techniques to evade your AI-based fraud detection system. The team must decide whether to tighten detection (increasing false positives and customer friction) or investigate while the system continues operating at current sensitivity. Complications include the fraud ring adapting to any countermeasures and the discovery that the evasion technique exploits a fundamental weakness in the model architecture.
Third-Party AI Service Compromise: A third-party AI service your application depends on (e.g., a content moderation API, a translation service, or an embedding model API) reports a security incident. Your team must assess the impact on your systems, determine whether any of your data was exposed, and decide whether to switch to an alternative service. Complications include the third-party providing incomplete information about the breach scope and the discovery that there are no viable alternative services for your use case.
AI Model Supply Chain Attack: Your ML team discovers that a pre-trained model downloaded from a public model hub contains a backdoor — it performs normally on standard inputs but has been trained to misclassify specific trigger patterns. The model has been in production for three months. The team must determine which decisions were affected, whether to retrain from scratch or attempt to remove the backdoor, and how to verify future model downloads.
Tracking Gap Closure Over Time
Create a simple tracking system to ensure exercise findings lead to actual improvements:
```python
ACTION_ITEM_TRACKER = {
    "Q1_2026_exercise": {
        "date": "2026-01-15",
        "scenario": "LLM Jailbreak",
        "action_items": [
            {
                "id": "Q1-001",
                "finding": "No AI-specific incident response playbook exists",
                "action": "Create AI incident response playbook",
                "owner": "Security Lead",
                "due": "2026-02-28",
                "status": "COMPLETE",
                "completion_date": "2026-02-20",
            },
            {
                "id": "Q1-002",
                "finding": "Chat logs not retained for non-escalated conversations",
                "action": "Implement 30-day chat log retention with PII masking",
                "owner": "ML Engineering Lead",
                "due": "2026-03-31",
                "status": "IN_PROGRESS",
                "completion_date": None,
            },
            {
                "id": "Q1-003",
                "finding": "No automated monitoring for jailbreak patterns",
                "action": "Deploy output safety classifier with alerting",
                "owner": "AI Security Engineer",
                "due": "2026-03-31",
                "status": "IN_PROGRESS",
                "completion_date": None,
            },
        ],
    },
}
```
Review the tracker at the start of each exercise. If previous action items are consistently incomplete, the exercise program is producing insight without producing change — which means the program is failing despite appearing active. Escalate chronic non-completion to leadership rather than running the next exercise on top of unresolved findings.
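A small helper, assuming the tracker structure above, makes that pre-exercise review concrete by computing a completion rate per exercise:

```python
def summarize_tracker(tracker: dict) -> dict[str, float]:
    """Completion rate per exercise, for review before running the next
    one. Assumes each record has an 'action_items' list whose entries
    carry a 'status' field, as in ACTION_ITEM_TRACKER above."""
    summary = {}
    for exercise, record in tracker.items():
        items = record["action_items"]
        done = sum(1 for item in items if item["status"] == "COMPLETE")
        summary[exercise] = done / len(items) if items else 0.0
    return summary
```

A completion rate that stays below, say, 50% across consecutive quarters is the signal to escalate to leadership before scheduling more exercises.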
Key Takeaways
AI incident tabletop exercises expose organizational gaps that are invisible until a real incident occurs. The most common gaps discovered through these exercises are: lack of AI-specific monitoring and logging (you cannot investigate what you did not record), unclear decision authority for taking AI systems offline, absence of AI-specific response playbooks, and insufficient cross-team coordination between security, ML engineering, legal, and communications. Running regular exercises and tracking gap closure over time is the most cost-effective way to improve AI incident readiness.
References
- CISA (2024). "Tabletop Exercise Packages." Cybersecurity and Infrastructure Security Agency. https://www.cisa.gov/resources-tools/services/cisa-tabletop-exercise-packages — Framework for designing and conducting tabletop exercises, adaptable to AI-specific scenarios.
- NIST (2024). "AI Risk Management Framework Playbook." National Institute of Standards and Technology. Provides the risk management context for AI incident response planning.
- Anthropic (2024). "Responsible Scaling Policy: Model Security Evaluations." Practical examples of AI security evaluation procedures that inform exercise scenario design.
- MITRE ATLAS (2025). "AI Incident Sharing: Case Studies." https://atlas.mitre.org/ — Real-world AI incident case studies that provide material for realistic tabletop scenarios.