Capstone: AI Incident Response Exercise
Respond to a simulated AI security incident through triage, investigation, containment, remediation, and post-mortem reporting.
Overview
When an AI system is compromised, the response process differs significantly from traditional incident response. The attack may be non-deterministic, the evidence may be in model outputs rather than system logs, containment might require changing a system prompt rather than blocking an IP address, and the impact assessment must consider what the model may have disclosed across many conversations.
This capstone presents a simulated AI security incident. You will work through the complete incident response lifecycle: detection, triage, investigation, containment, remediation, and post-mortem. The scenario is designed with realistic ambiguity — you will not have all the information upfront and will need to make decisions under uncertainty.
Prerequisites
- AI Forensics & Incident Response — IR methodology for AI systems
- Prompt Injection — Understanding the attack techniques used in the incident
- Agentic Exploitation — Agent and tool abuse concepts
- Defenses and Guardrails — Understanding what defenses should have been in place
- Execution & Reporting — Report writing
Project Brief
Incident Scenario
Date: Tuesday, 10:37 AM
Alert Source: Customer support escalation
Alert: A customer reports receiving a response from your company's AI assistant that included another customer's account details (name, email, partial account number).
You are the on-call AI security responder. Here is what you know at the start:
System: CustomerAssist — an LLM-powered customer service chatbot deployed 4 months ago. It handles approximately 5,000 conversations per day across web and mobile channels.
Architecture:
- GPT-4 via Azure OpenAI, accessed through a custom orchestration layer
- RAG pipeline connected to the customer knowledge base and FAQ documents
- Function calling for account lookups, order status, and ticket creation
- Conversation history stored for 30 days
- Basic input length limits (no other security controls)
Initial Evidence:
- Customer screenshot showing the chatbot response with another customer's data
- The response appears to include a name, email address, and last four digits of an account number that do not belong to the reporting customer
- The customer states they simply asked "what is the status of my recent order?"
Available Resources:
- Access to conversation logs (past 30 days)
- Access to application logs (API calls, function calls, errors)
- Access to the orchestration layer code and system prompt
- Access to the RAG pipeline configuration and document index
- Ability to modify the system prompt, disable functions, or take the system offline
- Contact information for the engineering team, legal team, and PR team
Incident Timeline (Revealed Progressively)
As you investigate, you will uncover additional evidence. Work through the phases in order and use the hints at the end if you get stuck.
Phase 1 evidence (available immediately):
- The customer screenshot
- Application logs showing the conversation in question
- The function call log showing a get_account_details call with a different customer's ID
Phase 2 evidence (discovered during investigation):
- 47 other conversations in the past 72 hours where account data appeared in responses to unrelated customers
- A pattern: all affected conversations involve users who asked about "orders" and the responses contain data from alphabetically adjacent customer records
- The RAG pipeline was updated 3 days ago with a new document batch that included an improperly formatted customer data export
Phase 3 evidence (discovered during deep investigation):
- The improperly formatted document contained raw customer records mixed with FAQ content
- The RAG pipeline has no content validation or PII detection on ingested documents
- The vector similarity search occasionally retrieves customer records when users ask about "orders" because the records contain order-related keywords
- No one reviewed the document batch before ingestion — the upload was automated from a shared drive
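The retrieval failure described above is easy to reproduce with a toy model. The sketch below substitutes simple word overlap for real vector similarity, but shows the same effect: a customer record stuffed with order-related keywords outscores the FAQ for an innocuous order-status query. The documents and scoring function are invented for illustration, not taken from the actual pipeline.

```python
def score(query: str, doc: str) -> float:
    """Toy lexical-overlap score standing in for vector similarity:
    the fraction of query words that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

# Hypothetical index contents: one legitimate FAQ document and one
# customer record that was mixed into the upload batch.
docs = {
    "faq-returns": "how to return an order after delivery",
    "cust-export": "jane doe jane@example.com order 9913 order status shipped",
}

query = "what is the status of my recent order"
best = max(docs, key=lambda k: score(query, docs[k]))
# The customer record matches both "order" and "status", so it wins
# retrieval over the FAQ document for an ordinary customer question.
```

The same dynamic applies with embeddings: a raw customer record that mentions orders and statuses can sit closer in vector space to an order-status query than any FAQ does.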
Deliverables
Primary Deliverables
| Deliverable | Description | Weight |
|---|---|---|
| Incident timeline | Minute-by-minute timeline of response actions taken | 15% |
| Investigation report | Detailed investigation findings with evidence chain | 25% |
| Containment actions | Documented containment decisions and execution | 15% |
| Post-mortem report | Full post-mortem with root cause, impact, and improvements | 30% |
| Communication drafts | Internal escalation and customer notification templates | 15% |
Rubric Criteria
- Triage Speed (15%) — Severity is correctly assessed within the first 15 minutes of the exercise, and appropriate immediate actions are taken
- Investigation Thoroughness (25%) — All available evidence is examined and the root cause is correctly identified
- Containment Decisions (20%) — Containment actions are proportionate (not over-reacting by taking everything offline or under-reacting by leaving the system running)
- Post-Mortem Quality (25%) — Root cause analysis is accurate, impact assessment is data-driven, and recommendations address systemic issues (not just the immediate bug)
- Communication (15%) — Escalation and notification drafts are appropriate for their audiences (technical team, legal, affected customers)
Phased Approach
Phase 1: Detection and Triage (1 hour)
Assess severity and scope
Based on the initial evidence, determine: What data has been exposed? How many customers might be affected? Is the incident ongoing? What is the regulatory classification (PII breach, HIPAA, GDPR)? Assign an initial severity level.
Make immediate containment decision
Decide whether to: continue operating while investigating, disable the affected function (get_account_details), take the chatbot offline entirely, or implement a partial mitigation (e.g., modify the system prompt to refuse account-related queries). Document your decision and reasoning.
Escalate appropriately
Draft an internal escalation notification to: the engineering team (technical details), management (business impact), legal team (regulatory implications), and PR/communications (customer impact). Each notification should be tailored to the audience.
Begin evidence preservation
Identify and preserve all relevant evidence: conversation logs, application logs, function call records, RAG pipeline configuration, system prompt versions, and document upload history. Ensure logs are not automatically rotated before you can review them.
Phase 2: Investigation (3 hours)
Reconstruct the attack timeline
Using conversation logs and application logs, determine: When did the data exposure first occur? How many conversations are affected? Is there a pattern in which customers' data is exposed and to whom? Is this a targeted attack or a system malfunction?
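A first pass over the logs can be automated. The sketch below assumes conversation logs can be loaded as simple records and that a customer directory maps email addresses to customer IDs; both structures (and the simplified email regex) are hypothetical stand-ins for whatever the real log schema provides:

```python
import re

# Simplified email pattern for illustration only
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")

def find_cross_customer_exposures(conversations, customer_emails):
    """Flag conversations whose assistant responses contain an email
    address registered to a *different* customer.
    conversations: list of {"conversation_id", "customer_id", "responses"}
    customer_emails: {email: customer_id} directory lookup
    """
    flagged = []
    for conv in conversations:
        for text in conv["responses"]:
            for email in EMAIL.findall(text):
                owner = customer_emails.get(email)
                if owner is not None and owner != conv["customer_id"]:
                    flagged.append((conv["conversation_id"], email, owner))
    return flagged
```

In the scenario, this kind of sweep is what would surface the 47 other affected conversations in the past 72 hours; extending it to names and partial account numbers follows the same shape.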
Identify the root cause
Trace the data flow from source to exposure. How did another customer's data end up in the chatbot's response? Was it through the function calling system, the RAG pipeline, conversation history cross-contamination, or something else? Identify the specific failure point.
Assess the full impact
Determine the total scope: how many customers' data was exposed, to how many other customers, over what time period, and what types of data were included. This information is needed for regulatory notifications.
Determine if the vulnerability is still active
Verify whether your containment action from Phase 1 is effective. Is the data still being exposed through other channels? Are there other similar vulnerabilities in the pipeline that could cause the same issue?
Phase 3: Containment and Remediation (2 hours)
Implement definitive containment
Based on investigation findings, implement permanent containment: remove the contaminated documents from the RAG index, verify the function calling system is returning correct data, and confirm no residual exposure paths exist.
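Removing the contaminated documents is straightforward if the index keeps provenance metadata, i.e. which upload batch each document came from. A minimal sketch using an in-memory dict in place of a real vector store (a production store would expose an equivalent delete-by-metadata filter):

```python
def purge_batch(index, batch_id):
    """Remove every indexed document whose metadata traces back to the
    contaminated upload batch; return the IDs that were removed.
    index is modeled as {doc_id: {"text": ..., "batch_id": ...}}."""
    removed = [doc_id for doc_id, meta in index.items()
               if meta["batch_id"] == batch_id]
    for doc_id in removed:
        del index[doc_id]
    return removed
```

Deleting by batch rather than by individual document ID matters here: it removes every customer record from the bad upload, including ones that were indexed but never observed leaking.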
Develop and test the fix
Address the root cause: implement content validation on the RAG ingestion pipeline, add PII detection to ingested documents, separate customer data from knowledge base content, and add access controls to the document upload process.
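The core of the fix is a validation gate between the shared drive and the index. A minimal sketch with illustrative regex patterns; a production pipeline would use a dedicated PII-detection service and a proper data classifier rather than hand-rolled regexes:

```python
import re

# Illustrative PII patterns only, not an exhaustive set
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "account_number": re.compile(r"\b\d{8,12}\b"),
}

def validate_document(text):
    """Return the list of PII types found; empty means safe to index."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

def ingest(index, doc_id, text, batch_id, quarantine):
    """Index the document only if it passes the PII gate; otherwise
    quarantine it for human review instead of silently indexing it."""
    hits = validate_document(text)
    if hits:
        quarantine[doc_id] = hits
        return False
    index[doc_id] = {"text": text, "batch_id": batch_id}
    return True
```

Quarantining (rather than dropping) failed documents preserves the review step that was missing from the automated shared-drive upload.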
Verify remediation
Test the fixed system to confirm: the vulnerability is resolved, the fix does not introduce new issues, and the system functions correctly for legitimate use cases.
Phase 4: Post-Mortem and Reporting (3 hours)
Write the incident timeline
Produce a minute-by-minute timeline from initial detection through remediation, including all decisions made, actions taken, and evidence discovered.
Write the post-mortem report
Produce a comprehensive post-mortem covering: incident summary, timeline, root cause analysis (5 Whys or fishbone diagram), impact assessment (quantified), containment and remediation actions, contributing factors, and recommendations for systemic improvements.
Draft customer notification
Write a customer notification for affected individuals that: explains what happened in plain language, specifies what data was exposed, describes what you are doing about it, and provides contact information for questions. Have legal review before sending.
Develop improvement recommendations
Identify systemic improvements beyond the immediate fix: content validation pipeline, PII scanning, data separation architecture, monitoring and alerting for data exposure, and incident response procedure updates. These should prevent an entire class of similar incidents, not just this specific one.
Example Output
Example Triage Assessment
## Initial Triage Assessment
**Time:** T+15 minutes
**Severity:** SEV-1 (Critical — PII data breach, ongoing)
**Scope:** Unknown, potentially all conversations in last 72 hours
### Assessment
- Confirmed PII exposure: customer name, email, partial account number
- Cross-customer data leakage via chatbot responses
- Incident is potentially ongoing (system still active)
- Regulatory implications: GDPR Article 33/34 notification may be required
(72-hour window starts at awareness of breach)
### Immediate Actions
1. DISABLE get_account_details function via feature flag (T+18 min)
2. Add system prompt instruction: "Do not reference any customer data
in responses until further notice" (T+22 min)
3. Preserve all conversation logs from past 7 days (T+25 min)
4. Escalate to engineering lead, legal, and CISO (T+30 min)
### Rationale for Not Taking System Offline
Taking the chatbot fully offline would impact 5,000 daily conversations
and create a visible outage. The function disable + prompt modification
should stop the data exposure while maintaining basic chatbot functionality
for non-account queries. Will re-assess if exposure continues.
Example Root Cause (5 Whys)
## Root Cause Analysis: 5 Whys
1. **Why** did the chatbot expose Customer B's data to Customer A?
→ The LLM included customer record data from the RAG context in its response.
2. **Why** was Customer B's record in the RAG context for Customer A's query?
→ The vector similarity search retrieved a customer record document because
it contained order-related keywords matching Customer A's query.
3. **Why** were customer records in the RAG document index?
→ A batch upload 3 days ago included a customer data export file mixed
in with FAQ documents.
4. **Why** was customer data mixed with FAQ documents?
→ The upload process ingests all files from a shared drive folder without
content validation or classification.
5. **Why** is there no content validation on the RAG ingestion pipeline?
→ The pipeline was built as a prototype and deployed to production without
a security review. No data classification or PII detection was ever
implemented.
**Root Cause:** The RAG ingestion pipeline lacks content validation, PII
detection, and data classification controls, allowing arbitrary content
including customer PII to be indexed and surfaced in chatbot responses.
Hints
In the incident scenario, why is disabling the get_account_details function a better initial containment action than taking the entire chatbot offline?