September 2026: Incident Response Challenge
Investigate simulated AI security incidents from logs, artifacts, and system traces. Reconstruct attack timelines, identify root causes, and write incident reports.
September 2026: Incident Response Challenge
Investigate three simulated AI 安全 incidents. You are given logs, system artifacts, and conversation transcripts. Your task is to reconstruct what happened, 識別 the attack techniques used, determine the impact, and write professional incident reports.
Challenge 概覽
| Detail | Value |
|---|---|
| Duration | September 1--30, 2026 |
| Difficulty | Intermediate |
| Estimated Time | 6--10 hours |
| 先備知識 | Familiarity with log analysis, 理解 of common AI attack techniques, basic incident response knowledge |
| Max Points | 100 |
Why Incident Response for AI
AI 安全 incidents are different from traditional 安全 incidents in several ways:
- 攻擊 artifacts are natural language. Instead of malware binaries or 利用 code, the attack payload is a prompt. Recognizing malicious prompts requires 理解 提示詞注入 techniques.
- Impact is probabilistic. The same attack may succeed or fail depending on model sampling. Determining whether an attack was successful requires analyzing model outputs, not just inputs.
- Traditional 偵測 tools miss AI-specific attacks. WAFs, IDS, and SIEM rules are not designed to detect 提示詞注入, 越獄, or model manipulation.
- Evidence is conversational. 攻擊 narratives develop over multiple turns. 理解 the attack requires reading conversations in context, not just searching for known indicators.
The Incidents
Incident 1: "The Helpful Insider" (30 points)
Scenario: A corporate chatbot deployed for internal HR queries has been generating responses that violate company policy. Several employees reported receiving advice that contradicted established policies on compensation, leave, and promotion procedures. The discrepancies were subtle -- the chatbot gave plausible-sounding but incorrect guidance.
Evidence provided:
- 72 hours of conversation logs (approximately 2,000 conversations)
- 系統提示詞 version history (3 versions over the time period)
- RAG 知識庫 change log showing 47 document additions/modifications
- Model API call logs with timestamps, 符元 counts, and latency metrics
- User feedback reports (12 complaints)
Your investigation should determine:
- When did the anomalous behavior start?
- What was the root cause? (Options: RAG 投毒, 系統提示詞 modification, model degradation, legitimate policy change not reflected in complaints)
- Which conversations were affected?
- What was the impact (how many employees received incorrect guidance and on what topics)?
- Was this an intentional attack or an operational error?
Scoring breakdown:
| Component | Points |
|---|---|
| Correct timeline reconstruction | 8 |
| Root cause identification with evidence | 10 |
| Affected conversation identification | 5 |
| Impact 評估 | 4 |
| Incident report quality | 3 |
Incident 2: "The Exfiltration" (35 points)
Scenario: An AI-powered customer support system is suspected of data exfiltration. The 安全 team noticed unusual outbound API calls from the support system to an external endpoint. 系統 uses a multi-tool 代理 architecture with email, 資料庫, and web browsing capabilities.
Evidence provided:
- 48 hours of 代理 execution logs including 工具呼叫 traces (all tool invocations, parameters, and results)
- Network traffic logs showing outbound HTTP requests from the 代理's service
- 500 customer support conversations from the time period
- 代理 系統提示詞 and tool configuration files
- 資料庫 query logs
- Email sending logs
Your investigation should determine:
- Which conversations triggered the suspicious outbound calls?
- What data was exfiltrated and to where?
- What attack technique was used (direct 提示詞注入, indirect injection through customer data, tool chain 利用, or something else)?
- Was the attack targeted at specific data, or was it opportunistic?
- What was the full attack timeline from initial access to data exfiltration?
Scoring breakdown:
| Component | Points |
|---|---|
| Identification of attack conversations | 8 |
| Exfiltrated data inventory | 8 |
| 攻擊 technique classification | 7 |
| Full timeline with causal chain | 7 |
| Incident report quality | 5 |
Incident 3: "The Slow Burn" (35 points)
Scenario: A content moderation AI system appears to have gradually become less effective at detecting harmful content. Moderation accuracy has dropped from 97% to 82% over three months, but the decline was gradual enough that it was initially attributed to changing content patterns rather than an attack. The 安全 team now suspects 對抗性 manipulation.
Evidence provided:
- Moderation accuracy metrics over 90 days (daily granularity)
- 微調 job logs (模型 is fine-tuned weekly on new labeled examples)
- 訓練資料 contributions from 15 human moderators over the period
- Model checkpoint comparison data (weight diff statistics between weekly versions)
- Content samples that were misclassified at various points in the timeline
- Moderator performance metrics and login logs
Your investigation should determine:
- Was the accuracy decline caused by an attack, natural drift, or changing content patterns?
- If an attack, what was the mechanism? (Options: 訓練 資料投毒 through a compromised moderator account, 對抗性 inputs designed to cause model degradation, direct model weight manipulation, changes to the content distribution)
- Can you 識別 when the attack began?
- Which moderator account (if any) was responsible?
- What specific 訓練 examples were poisoned?
Scoring breakdown:
| Component | Points |
|---|---|
| Correct attack mechanism identification | 10 |
| 攻擊 start date (within 3 days) | 5 |
| Compromised account identification | 8 |
| Poisoned 訓練 example identification | 7 |
| Incident report quality | 5 |
Investigation Methodology
Step 1: Evidence Triage
Before diving deep into any single artifact, survey all available evidence:
- What time periods do the logs cover?
- What level of detail is available?
- Are there any gaps in the evidence?
- Which evidence sources are most likely to contain indicators of compromise?
Step 2: Timeline Construction
Build a timeline of events from all evidence sources:
- Align timestamps across different log sources (watch for timezone differences)
- 識別 the first anomalous event
- Map the progression from initial compromise to observed impact
- Note gaps where evidence is missing
Step 3: Hypothesis 測試
對每個 incident, develop competing hypotheses and 測試 them against the evidence:
| Hypothesis | Supporting Evidence | Contradicting Evidence | Verdict |
|---|---|---|---|
| RAG 投毒 | [evidence] | [evidence] | Confirmed/Rejected |
| Operational error | [evidence] | [evidence] | Confirmed/Rejected |
| Model degradation | [evidence] | [evidence] | Confirmed/Rejected |
Step 4: Impact 評估
Quantify the impact:
- How many users/conversations/records were affected?
- What data was exposed, modified, or lost?
- What is the potential downstream impact (regulatory, reputational, operational)?
- Is the incident ongoing, or has it been contained?
Step 5: Report Writing
Write a professional incident report 對每個 incident using the template below.
Incident Report Template
# Incident Report: [Incident Name]
## 總結
- **偵測 Date:** [when the issue was noticed]
- **Incident Window:** [start to end of malicious activity]
- **Severity:** Critical/High/Medium/Low
- **Status:** Contained/Active/Remediated
## Executive 總結
[2-3 sentences for non-technical stakeholders]
## Timeline
| Timestamp | Event | Source |
|---|---|---|
| [time] | [what happened] | [which log/artifact] |
## Root Cause Analysis
[Detailed technical explanation of what caused the incident]
## 攻擊 Technique
[Classification using MITRE ATLAS or similar framework]
## Impact 評估
- **Data affected:** [description and quantity]
- **Users affected:** [count and description]
- **Business impact:** [operational, financial, reputational]
## Evidence
[Key artifacts supporting the analysis, with references]
## Recommendations
### Immediate Actions
[Steps to contain and remediate]
### Long-Term Improvements
[Systemic changes to prevent recurrence]
## Lessons Learned
[What this incident reveals about gaps in 偵測, prevention, or response]Analysis Tools
You may use any tools to analyze the provided evidence. Recommended approaches:
- Log parsing with jq/Python. The JSON-structured logs are most efficiently analyzed with scripting.
- Timeline tools. Tools like
timesketchor simple spreadsheet timelines help correlate events across sources. - Statistical analysis. For Incident 3, analyzing 訓練資料 contributions statistically can reveal anomalous patterns.
- Diff tools. Comparing document versions, model configs, and 系統提示詞 versions reveals unauthorized changes.
Scoring Criteria
Beyond the per-incident scoring, general quality factors apply:
- Evidence-based conclusions. Every claim in your report must cite specific evidence. Speculation without evidence earns no points.
- Correct attribution. Attributing the incident to the wrong cause is worse than acknowledging uncertainty. If you are not sure, say so and explain what additional evidence would resolve the ambiguity.
- Actionable recommendations. Recommendations must be specific and implementable, not generic ("improve 安全").
- Clear writing. Incident reports are read by executives, legal teams, and engineers. Write clearly for all audiences.
Key Concepts
AI-Specific Indicators of Compromise
Traditional IOCs (IP addresses, file hashes, registry keys) have AI-specific counterparts:
| Traditional IOC | AI-Specific IOC |
|---|---|
| Malicious IP address | Suspicious prompt patterns |
| Malware hash | Poisoned 訓練 sample |
| Registry modification | 系統提示詞 change |
| Anomalous network traffic | Anomalous model behavior |
| Privilege escalation event | Authorization boundary violation in tool calls |
The Attribution Challenge
AI 安全 incidents often have ambiguous root causes. A decline in model accuracy could be:
- An attack (資料投毒, 對抗性 inputs)
- An operational issue (bad 訓練資料, configuration change)
- Natural drift (changing data distribution)
Distinguishing between these requires careful analysis of timing, patterns, and corroborating evidence. This challenge trains that judgment.
Skills Developed
This challenge develops competencies that are directly applicable to professional AI 安全 incident response:
Log Analysis at Scale
Real AI 安全 incidents produce massive log volumes. The evidence sets 在本 challenge are deliberately large enough that manual review is impractical. You must develop efficient filtering strategies:
- Use statistical methods to 識別 anomalous conversation patterns (message frequency, 符元 counts, response times)
- Search for known attack signatures in conversation logs (提示詞注入 patterns, encoding attempts, instruction override phrases)
- Correlate events across log sources by timestamp to build a unified timeline
- Distinguish attack artifacts from normal operational noise
Root Cause Analysis Discipline
The hardest part of incident response is not finding what happened -- it is finding why it happened and being confident in your conclusion. This requires:
- Evidence-based reasoning. Every claim must be supported by specific log entries, artifacts, or data points. Circumstantial evidence is valuable but must be identified as such.
- Alternative hypothesis 測試. 對每個 incident, 存在 multiple plausible explanations. You must systematically eliminate alternatives rather than confirming your first guess.
- Uncertainty acknowledgment. A professional incident report clearly states what is known, what is inferred, and what remains unknown. Stating "we believe X based on evidence Y, but cannot rule out Z" is stronger than stating "X happened" without qualification.
Communication Under Pressure
Incident reports serve multiple audiences:
- Executive leadership needs a one-page summary: what happened, how bad is it, what are we doing about it, is it over.
- Legal and compliance needs specifics: what data was affected, what regulations apply, what are the notification obligations.
- Engineering teams need technical detail: what exactly broke, what is the fix, how do we prevent recurrence.
Writing for all three audiences in a single document is a skill that this challenge develops through the report writing requirement.
Pattern Recognition Across Incidents
The three incidents 在本 challenge are independent, but they share common patterns that appear in real-world AI 安全 incidents:
- Gradual degradation that is initially attributed to operational noise rather than attack activity
- Trust boundary violations between components that were assumed to be isolated
- Data integrity issues that compound over time when not caught early
- Human factors (insufficient review processes, over-reliance on automation) that enable technical attacks
Recognizing these patterns across incidents develops the intuition that experienced incident responders use to quickly 評估 new situations.
Further Reading
- AI Forensics & Incident Response -- foundational concepts for this challenge
- Case Studies & Domain-Specific -- real-world incident examples
- October 2026 Challenge -- the next challenge