Capstone: AI Incident Response Exercise
Respond to a simulated AI security incident through triage, investigation, containment, remediation, and post-mortem reporting.
Overview
When an AI system is compromised, the response process differs significantly from traditional incident response. The attack may be non-deterministic, the evidence may be in model outputs rather than system logs, containment might require changing a system prompt rather than blocking an IP address, and the impact assessment must consider what the model may have disclosed across many conversations.
This capstone presents a simulated AI security incident. You will work through the complete incident response lifecycle: detection, triage, investigation, containment, remediation, and post-mortem. The scenario is designed with realistic ambiguity — you will not have all the information upfront and will need to make decisions under uncertainty.
Prerequisites
- AI Forensics & Incident Response — IR methodology for AI systems
- Prompt Injection — Understanding the attack techniques used in the incident
- Agentic Exploitation — Agent and tool abuse concepts
- Defenses and Guardrails — Understanding what defenses should have been in place
- Execution & Reporting — Report writing
Project Brief
Incident Scenario
**Date:** Tuesday, 10:37 AM
**Alert Source:** Customer support escalation
**Alert:** A customer reports receiving a response from your company's AI assistant that included another customer's account details (name, email, partial account number).
You are the on-call AI security responder. Here is what you know at the start:
System: CustomerAssist — an LLM-powered customer service chatbot deployed 4 months ago. It handles approximately 5,000 conversations per day across web and mobile channels.
Architecture:
- GPT-4 via Azure OpenAI, accessed through a custom orchestration layer
- RAG pipeline connected to the customer knowledge base and FAQ documents
- Function calling for account lookups, order status, and ticket creation
- Conversation history stored for 30 days
- Basic input length limits (no other security controls)
Initial Evidence:
- Customer screenshot showing the chatbot response with another customer's data
- The response appears to include a name, email address, and last four digits of an account number that do not belong to the reporting customer
- The customer states they simply asked "what is the status of my recent order?"
Available Resources:
- Access to conversation logs (past 30 days)
- Access to application logs (API calls, function calls, errors)
- Access to the orchestration layer code and system prompt
- Access to the RAG pipeline configuration and document index
- Ability to modify the system prompt, disable functions, or take the system offline
- Contact information for the engineering team, legal team, and PR team
Incident Timeline (Revealed Progressively)
As you investigate, you will uncover additional evidence. Work through the phases in order and use the hints at the end if you get stuck.
Phase 1 evidence (available immediately):
- The customer screenshot
- Application logs showing the conversation in question
- The function call log showing a `get_account_details` call with a different customer's ID
Phase 2 evidence (discovered during investigation):
- 47 other conversations in the past 72 hours where account data appeared in responses to unrelated customers
- A pattern: all affected conversations involve users who asked about "orders" and the responses contain data from alphabetically adjacent customer records
- The RAG pipeline was updated 3 days ago with a new document batch that included an improperly formatted customer data export
Phase 3 evidence (discovered during deep investigation):
- The improperly formatted document contained raw customer records mixed with FAQ content
- The RAG pipeline has no content validation or PII detection on ingested documents
- The vector similarity search occasionally retrieves customer records when users ask about "orders" because the records contain order-related keywords
- No one reviewed the document batch before ingestion — the upload was automated from a shared drive
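The retrieval failure described in the Phase 3 evidence can be illustrated with a toy similarity search: a customer record that happens to contain order-related keywords can outrank genuine FAQ content for an innocent query. This is a minimal sketch using bag-of-words cosine similarity, not the production embedding pipeline, and both document strings are invented for illustration.

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Illustrative index contents: one legitimate FAQ chunk and one customer
# record that leaked in via the batch upload (both strings are invented).
faq_doc = "faq how to check order status in the account dashboard"
customer_record = "customer jane doe jane@example.com recent order 10293 order status shipped"

query = "what is the status of my recent order"

scores = {
    "faq_doc": cosine_sim(query, faq_doc),
    "customer_record": cosine_sim(query, customer_record),
}
top_hit = max(scores, key=scores.get)  # the contaminated record ranks first
```

Because the leaked record repeats order-related terms ("recent", "order", "status"), it scores higher than the FAQ chunk, which is exactly how PII ends up in the retrieved context for an unrelated customer.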
Deliverables
Primary Deliverables
| Deliverable | Description | Weight |
|---|---|---|
| Incident timeline | Minute-by-minute timeline of response actions taken | 15% |
| Investigation report | Detailed investigation findings with evidence chain | 25% |
| Containment actions | Documented containment decisions and execution | 15% |
| Post-mortem report | Full post-mortem with root cause, impact, and improvements | 30% |
| Communication drafts | Internal escalation and customer notification templates | 15% |
Rubric Criteria
- Triage Speed (15%) — Severity is correctly assessed within the first 15 minutes of the exercise, and appropriate immediate actions are taken
- Investigation Thoroughness (25%) — All available evidence is examined and the root cause is correctly identified
- Containment Decisions (20%) — Containment actions are proportionate (not over-reacting by taking everything offline or under-reacting by leaving the system running)
- Post-Mortem Quality (25%) — Root cause analysis is accurate, impact assessment is data-driven, and recommendations address systemic issues (not just the immediate bug)
- Communication (15%) — Escalation and notification drafts are appropriate for their audiences (technical team, legal, affected customers)
Phased Approach
Phase 1: Detection and Triage (1 hour)
Assess severity and scope
Based on the initial evidence, determine: What data has been exposed? How many customers might be affected? Is the incident ongoing? What is the regulatory classification (PII breach, HIPAA, GDPR)? Assign an initial severity level.
Make immediate containment decision
Decide whether to: continue operating while investigating, disable the affected function (`get_account_details`), take the chatbot offline entirely, or implement a partial mitigation (e.g., modify the system prompt to refuse account-related queries). Document your decision and reasoning.
Escalate appropriately
Draft an internal escalation notification to: the engineering team (technical details), management (business impact), legal team (regulatory implications), and PR/communications (customer impact). Each notification should be tailored to the audience.
Begin evidence preservation
Identify and preserve all relevant evidence: conversation logs, application logs, function call records, RAG pipeline configuration, system prompt versions, and document upload history. Ensure logs are not automatically rotated before you can review them.
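Preservation can be as simple as copying logs to a write-once location and recording a hash for each file so you can later demonstrate the evidence was not altered. This is a minimal sketch; the paths and manifest format are placeholders, not part of the scenario's actual tooling.

```python
import hashlib
import shutil
from pathlib import Path

def preserve(src: Path, dest_dir: Path) -> str:
    """Copy a log file into the preservation area and return its SHA-256
    digest, appending it to a manifest for chain-of-custody purposes."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 preserves file timestamps
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    with (dest_dir / "MANIFEST.txt").open("a") as manifest:
        manifest.write(f"{digest}  {dest.name}\n")
    return digest
```

Recording hashes at collection time matters: if log files are later questioned, you can show the preserved copies match what was captured during the incident.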
Phase 2: Investigation (3 hours)
Reconstruct the attack timeline
Using conversation logs and application logs, determine: When did the data exposure first occur? How many conversations are affected? Is there a pattern in which customers' data is exposed and to whom? Is this a targeted attack or a system malfunction?
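One way to scope the exposure is to scan conversation records for responses containing contact details that do not belong to the requesting customer. A minimal sketch follows; the record shape (`id`, `customer_email`, `response`) is an assumption about the log schema, not the scenario's actual format.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def find_cross_customer_leaks(conversations):
    """Flag conversations whose assistant response contains an email
    address other than the requesting customer's own.

    Each record is assumed to be a dict with keys
    'id', 'customer_email', and 'response'."""
    leaks = []
    for conv in conversations:
        for email in EMAIL_RE.findall(conv["response"]):
            if email.lower() != conv["customer_email"].lower():
                leaks.append({"id": conv["id"], "leaked_email": email})
    return leaks
```

Running this over the 30-day log window gives a first-pass affected-conversation list that you can then verify manually, since regexes alone will miss leaks of names or partial account numbers.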
Identify the root cause
Trace the data flow from source to exposure. How did another customer's data end up in the chatbot's response? Was it through the function calling system, the RAG pipeline, conversation history cross-contamination, or something else? Identify the specific failure point.
Assess the full impact
Determine the total scope: how many customers' data was exposed, to how many other customers, over what time period, and what types of data were included. This information is needed for regulatory notifications.
Determine if the vulnerability is still active
Verify whether your containment action from Phase 1 is effective. Is the data still being exposed through other channels? Are there other similar vulnerabilities in the pipeline that could cause the same issue?
Phase 3: Containment and Remediation (2 hours)
Implement definitive containment
Based on investigation findings, implement permanent containment: remove the contaminated documents from the RAG index, verify the function calling system is returning correct data, and confirm no residual exposure paths exist.
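If the index tags each chunk with its ingestion batch, definitive containment reduces to deleting everything from the contaminated batch. This sketch models the index as a list of dicts with a hypothetical `source_batch` field; real vector stores expose a delete-by-metadata-filter API for the same purpose.

```python
def purge_batch(index, batch_id):
    """Remove every chunk ingested from the contaminated batch.

    `index` is modeled as a list of dicts carrying a 'source_batch'
    metadata field (an assumed schema for illustration).
    Returns the cleaned index and the number of chunks removed."""
    kept = [doc for doc in index if doc.get("source_batch") != batch_id]
    removed = len(index) - len(kept)
    return kept, removed
```

After the purge, re-run the leak-triggering queries to confirm the customer records no longer appear in retrieved context.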
Develop and test the fix
Address the root cause: implement content validation on the RAG ingestion pipeline, add PII detection to ingested documents, separate customer data from knowledge base content, and add access controls to the document upload process.
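The ingestion-side fix can be sketched as a validation gate that rejects any document containing PII before it reaches the vector index. The two regex patterns below are illustrative only; a production pipeline would use a dedicated PII-detection service rather than a handful of regexes.

```python
import re

# Illustrative PII patterns (assumptions, not an exhaustive detector).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "account_fragment": re.compile(r"\baccount\s*(?:number|#)?\s*[\dx*]{4,}\b", re.I),
}

def validate_for_ingestion(doc_text):
    """Return (ok, findings). A document with any PII finding is
    rejected before it can be indexed into the RAG store."""
    findings = [name for name, pat in PII_PATTERNS.items() if pat.search(doc_text)]
    return (not findings, findings)
```

Wiring this gate into the automated shared-drive upload addresses the specific failure path from the incident: an unreviewed customer data export can no longer flow silently into the index.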
Verify remediation
Test the fixed system to confirm: the vulnerability is resolved, the fix does not introduce new issues, and the system functions correctly for legitimate use cases.
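Verification can be automated as a small regression suite: replay the queries known to have triggered leaks and assert that responses no longer contain another customer's data. The `chatbot` callable and case format below are stand-ins for the real system under test.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def response_is_clean(response, requesting_email):
    """A response passes only if every email it mentions belongs to the
    requesting customer (a simplified leak check for illustration)."""
    return all(e.lower() == requesting_email.lower()
               for e in EMAIL_RE.findall(response))

def regression_suite(chatbot, cases):
    """cases: list of (query, requesting_email) pairs known to have
    leaked data before the fix. Returns the queries that still leak."""
    return [query for query, email in cases
            if not response_is_clean(chatbot(query, email), email)]
```

An empty result means the known leak-triggering inputs are now safe; keep the suite in CI so a future ingestion mistake is caught before customers see it.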
Phase 4: Post-Mortem and Reporting (3 hours)
Write the incident timeline
Produce a minute-by-minute timeline from initial detection through remediation, including all decisions made, actions taken, and evidence discovered.
Write the post-mortem report
Produce a comprehensive post-mortem covering: incident summary, timeline, root cause analysis (5 Whys or fishbone diagram), impact assessment (quantified), containment and remediation actions, contributing factors, and recommendations for systemic improvements.
Draft customer notification
Write a customer notification for affected individuals that: explains what happened in plain language, specifies what data was exposed, describes what you are doing about it, and provides contact information for questions. Have legal review before sending.
Develop improvement recommendations
Identify systemic improvements beyond the immediate fix: content validation pipeline, PII scanning, data separation architecture, monitoring and alerting for data exposure, and incident response procedure updates. These should prevent an entire class of similar incidents, not just this specific one.
Example Output
Example Triage Assessment
```
## Initial Triage Assessment
**Time:** T+15 minutes
**Severity:** SEV-1 (Critical — PII data breach, ongoing)
**Scope:** Unknown, potentially all conversations in last 72 hours

### Assessment
- Confirmed PII exposure: customer name, email, partial account number
- Cross-customer data leakage via chatbot responses
- Incident is potentially ongoing (system still active)
- Regulatory implications: GDPR Article 33/34 notification may be required
  (72-hour window starts at awareness of breach)

### Immediate Actions
1. DISABLE get_account_details function via feature flag (T+18 min)
2. Add system prompt instruction: "Do not reference any customer data
   in responses until further notice" (T+22 min)
3. Preserve all conversation logs from past 7 days (T+25 min)
4. Escalate to engineering lead, legal, and CISO (T+30 min)

### Rationale for Not Taking System Offline
Taking the chatbot fully offline would impact 5,000 daily conversations
and create a visible outage. The function disable + prompt modification
should stop the data exposure while maintaining basic chatbot functionality
for non-account queries. Will re-evaluate if exposure continues.
```
Example Root Cause (5 Whys)
```
## Root Cause Analysis: 5 Whys

1. **Why** did the chatbot expose Customer B's data to Customer A?
   → The LLM included customer record data from the RAG context in its response.
2. **Why** was Customer B's record in the RAG context for Customer A's query?
   → The vector similarity search retrieved a customer record document because
   it contained order-related keywords matching Customer A's query.
3. **Why** were customer records in the RAG document index?
   → A batch upload 3 days ago included a customer data export file mixed
   in with FAQ documents.
4. **Why** was customer data mixed with FAQ documents?
   → The upload process ingests all files from a shared drive folder without
   content validation or classification.
5. **Why** is there no content validation on the RAG ingestion pipeline?
   → The pipeline was built as a prototype and deployed to production without
   a security review. No data classification or PII detection was ever
   implemented.

**Root Cause:** The RAG ingestion pipeline lacks content validation, PII
detection, and data classification controls, allowing arbitrary content
including customer PII to be indexed and surfaced in chatbot responses.
```
Hints
In the incident scenario, why is disabling the `get_account_details` function a better initial containment action than taking the entire chatbot offline?