September 2026: Incident Response Challenge
Investigate three simulated AI security incidents. You are given logs, system artifacts, and conversation transcripts. Your task is to reconstruct what happened, identify the attack techniques used, determine the impact, and write professional incident reports.
Challenge Overview
| Detail | Value |
|---|---|
| Duration | September 1--30, 2026 |
| Difficulty | Intermediate |
| Estimated Time | 6--10 hours |
| Prerequisites | Familiarity with log analysis, understanding of common AI attack techniques, basic incident response knowledge |
| Max Points | 100 |
Why Incident Response for AI
AI security incidents are different from traditional security incidents in several ways:
- Attack artifacts are natural language. Instead of malware binaries or exploit code, the attack payload is a prompt. Recognizing malicious prompts requires understanding prompt injection techniques.
- Impact is probabilistic. The same attack may succeed or fail depending on model sampling. Determining whether an attack was successful requires analyzing model outputs, not just inputs.
- Traditional detection tools miss AI-specific attacks. WAFs, IDS, and SIEM rules are not designed to detect prompt injection, jailbreaking, or model manipulation.
- Evidence is conversational. Attack narratives develop over multiple turns. Understanding the attack requires reading conversations in context, not just searching for known indicators.
The Incidents
Incident 1: "The Helpful Insider" (30 points)
Scenario: A corporate chatbot deployed for internal HR queries has been generating responses that violate company policy. Several employees reported receiving advice that contradicted established policies on compensation, leave, and promotion procedures. The discrepancies were subtle -- the chatbot gave plausible-sounding but incorrect guidance.
Evidence provided:
- 72 hours of conversation logs (approximately 2,000 conversations)
- System prompt version history (3 versions over the time period)
- RAG knowledge base change log showing 47 document additions/modifications
- Model API call logs with timestamps, token counts, and latency metrics
- User feedback reports (12 complaints)
Your investigation should determine:
- When did the anomalous behavior start?
- What was the root cause? (Options: RAG poisoning, system prompt modification, model degradation, or a legitimate policy change that the complaining employees were not aware of)
- Which conversations were affected?
- What was the impact (how many employees received incorrect guidance and on what topics)?
- Was this an intentional attack or an operational error?
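A quick first test of the prompt-modification hypothesis is to diff the three system prompt versions. The sketch below uses Python's `difflib`; the version labels are placeholders for whatever names the provided version history actually uses.

```python
import difflib

def diff_prompt_versions(old: str, new: str) -> str:
    """Unified diff of two system prompt versions.

    Unauthorized additions (e.g. new 'policy' language nobody signed off
    on) are direct evidence for the prompt-modification hypothesis; an
    unchanged prompt shifts suspicion toward the RAG knowledge base.
    """
    return "\n".join(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="prompt_v1",  # placeholder label
            tofile="prompt_v2",    # placeholder label
            lineterm="",
        )
    )
```

The same approach applies to the 47 RAG knowledge base changes: diff each modified document against its prior version and flag edits that alter policy-relevant facts.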
Scoring breakdown:
| Component | Points |
|---|---|
| Correct timeline reconstruction | 8 |
| Root cause identification with evidence | 10 |
| Affected conversation identification | 5 |
| Impact assessment | 4 |
| Incident report quality | 3 |
Incident 2: "The Exfiltration" (35 points)
Scenario: An AI-powered customer support system is suspected of data exfiltration. The security team noticed unusual outbound API calls from the support system to an external endpoint. The system uses a multi-tool agent architecture with email, database, and web browsing capabilities.
Evidence provided:
- 48 hours of agent execution logs including tool call traces (all tool invocations, parameters, and results)
- Network traffic logs showing outbound HTTP requests from the agent's service
- 500 customer support conversations from the time period
- Agent system prompt and tool configuration files
- Database query logs
- Email sending logs
Your investigation should determine:
- Which conversations triggered the suspicious outbound calls?
- What data was exfiltrated and to where?
- What attack technique was used (direct prompt injection, indirect injection through customer data, tool chain exploitation, or something else)?
- Was the attack targeted at specific data, or was it opportunistic?
- What was the full attack timeline from initial access to data exfiltration?
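One way to narrow 500 conversations down to the suspicious few is to pair each outbound network request with the tool calls that immediately preceded it. The tuple shapes below are a hypothetical trace format; adapt them to the actual log schema.

```python
from datetime import timedelta

def correlate_calls(tool_calls, net_requests, window=timedelta(seconds=5)):
    """Pair each outbound network request with tool calls occurring
    within `window` before it.

    Both inputs are lists of (datetime, detail) tuples -- a hypothetical
    shape, not the challenge's real format. The resulting pairs point at
    the agent actions (and hence conversations) most likely responsible
    for the suspicious traffic.
    """
    pairs = []
    for net_time, request in net_requests:
        for call_time, call in tool_calls:
            if timedelta(0) <= net_time - call_time <= window:
                pairs.append((call, request))
    return pairs
```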
Scoring breakdown:
| Component | Points |
|---|---|
| Identification of attack conversations | 8 |
| Exfiltrated data inventory | 8 |
| Attack technique classification | 7 |
| Full timeline with causal chain | 7 |
| Incident report quality | 5 |
Incident 3: "The Slow Burn" (35 points)
Scenario: A content moderation AI system appears to have gradually become less effective at detecting harmful content. Moderation accuracy has dropped from 97% to 82% over three months, but the decline was gradual enough that it was initially attributed to changing content patterns rather than an attack. The security team now suspects adversarial manipulation.
Evidence provided:
- Moderation accuracy metrics over 90 days (daily granularity)
- Fine-tuning job logs (the model is fine-tuned weekly on new labeled examples)
- Training data contributions from 15 human moderators over the period
- Model checkpoint comparison data (weight diff statistics between weekly versions)
- Content samples that were misclassified at various points in the timeline
- Moderator performance metrics and login logs
Your investigation should determine:
- Was the accuracy decline caused by an attack, natural drift, or changing content patterns?
- If an attack, what was the mechanism? (Options: training data poisoning through a compromised moderator account, adversarial inputs designed to cause model degradation, direct model weight manipulation, changes to the content distribution)
- Can you identify when the attack began?
- Which moderator account (if any) was responsible?
- What specific training examples were poisoned?
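For the compromised-moderator hypothesis, a useful statistic is each moderator's disagreement rate against the final (consensus or audited) label, tracked over time. The input shape below is hypothetical; map it onto the provided training data contribution files.

```python
from collections import defaultdict

def per_moderator_flip_rate(labels):
    """Rate at which each moderator's label disagreed with the final label.

    `labels` is a list of (moderator_id, assigned_label, final_label)
    tuples -- an assumed shape for illustration. A moderator whose
    disagreement rate jumps after a specific date is a candidate
    compromised account, and their post-jump contributions are candidate
    poisoned examples.
    """
    totals = defaultdict(int)
    flips = defaultdict(int)
    for moderator, assigned, final in labels:
        totals[moderator] += 1
        if assigned != final:
            flips[moderator] += 1
    return {m: flips[m] / totals[m] for m in totals}
```

Running this per week, rather than over the whole period, is what localizes the attack start date.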
Scoring breakdown:
| Component | Points |
|---|---|
| Correct attack mechanism identification | 10 |
| Attack start date (within 3 days) | 5 |
| Compromised account identification | 8 |
| Poisoned training example identification | 7 |
| Incident report quality | 5 |
Investigation Methodology
Step 1: Evidence Triage
Before diving deep into any single artifact, survey all available evidence:
- What time periods do the logs cover?
- What level of detail is available?
- Are there any gaps in the evidence?
- Which evidence sources are most likely to contain indicators of compromise?
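The triage questions above can be partially automated. The sketch below surveys a JSONL log for time coverage and gaps, assuming each line is a JSON object with an ISO-8601 `timestamp` field (adjust the key to the actual evidence schema).

```python
import json
from datetime import datetime, timedelta

def survey_log(path, gap_threshold=timedelta(hours=1)):
    """Report time coverage and gaps larger than `gap_threshold`.

    Assumes a JSONL file whose records carry an ISO-8601 'timestamp'
    field -- an assumption about the schema, verify before relying on it.
    Gaps can indicate missing evidence or deliberate log deletion.
    """
    times = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            times.append(datetime.fromisoformat(record["timestamp"]))
    times.sort()
    gaps = [(a, b) for a, b in zip(times, times[1:]) if b - a > gap_threshold]
    return {"start": times[0], "end": times[-1], "events": len(times), "gaps": gaps}
```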
Step 2: Timeline Construction
Build a timeline of events from all evidence sources:
- Align timestamps across different log sources (watch for timezone differences)
- Identify the first anomalous event
- Map the progression from initial compromise to observed impact
- Note gaps where evidence is missing
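A minimal sketch of the merge step, normalizing everything to UTC before sorting. It treats naive timestamps as already-UTC, which is an assumption a real investigation must confirm for each log source.

```python
from datetime import timezone

def build_timeline(*sources):
    """Merge entries from several logs into one UTC-ordered timeline.

    Each source is a (label, entries) pair, where entries is a list of
    (datetime, description) tuples. Naive datetimes are assumed to be
    UTC -- confirm each log's timezone before trusting the ordering.
    """
    merged = []
    for label, entries in sources:
        for ts, event in entries:
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive == UTC
            else:
                ts = ts.astimezone(timezone.utc)
            merged.append((ts, event, label))
    merged.sort(key=lambda row: row[0])
    return merged
```

The output maps directly onto the Timeline table in the report template: timestamp, event, source.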
Step 3: Hypothesis Testing
For each incident, develop competing hypotheses and test them against the evidence:
| Hypothesis | Supporting Evidence | Contradicting Evidence | Verdict |
|---|---|---|---|
| RAG poisoning | [evidence] | [evidence] | Confirmed/Rejected |
| Operational error | [evidence] | [evidence] | Confirmed/Rejected |
| Model degradation | [evidence] | [evidence] | Confirmed/Rejected |
Step 4: Impact Assessment
Quantify the impact:
- How many users/conversations/records were affected?
- What data was exposed, modified, or lost?
- What is the potential downstream impact (regulatory, reputational, operational)?
- Is the incident ongoing, or has it been contained?
Step 5: Report Writing
Write a professional incident report for each incident using the template below.
Incident Report Template
# Incident Report: [Incident Name]
## Summary
- **Detection Date:** [when the issue was noticed]
- **Incident Window:** [start to end of malicious activity]
- **Severity:** Critical/High/Medium/Low
- **Status:** Contained/Active/Remediated
## Executive Summary
[2-3 sentences for non-technical stakeholders]
## Timeline
| Timestamp | Event | Source |
|---|---|---|
| [time] | [what happened] | [which log/artifact] |
## Root Cause Analysis
[Detailed technical explanation of what caused the incident]
## Attack Technique
[Classification using MITRE ATLAS or similar framework]
## Impact Assessment
- **Data affected:** [description and quantity]
- **Users affected:** [count and description]
- **Business impact:** [operational, financial, reputational]
## Evidence
[Key artifacts supporting the analysis, with references]
## Recommendations
### Immediate Actions
[Steps to contain and remediate]
### Long-Term Improvements
[Systemic changes to prevent recurrence]
## Lessons Learned
[What this incident reveals about gaps in detection, prevention, or response]
Analysis Tools
You may use any tools to analyze the provided evidence. Recommended approaches:
- Log parsing with jq/Python. The JSON-structured logs are most efficiently analyzed with scripting.
- Timeline tools. Tools like timesketch or simple spreadsheet timelines help correlate events across sources.
- Statistical analysis. For Incident 3, analyzing training data contributions statistically can reveal anomalous patterns.
- Diff tools. Comparing document versions, model configs, and system prompt versions reveals unauthorized changes.
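For the log-parsing approach, a small Python filter goes a long way. The field names below (`token_count`, `text`) are illustrative; map them to whatever schema the provided logs actually use.

```python
import json

def filter_conversations(path, min_tokens=0, keywords=()):
    """Filter a JSONL conversation log by token count and keywords.

    Field names ('token_count', 'text') are assumptions about the
    schema -- adjust them to the real evidence files. Returns the
    matching records for manual review.
    """
    hits = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("token_count", 0) < min_tokens:
                continue
            text = rec.get("text", "").lower()
            if keywords and not any(k.lower() in text for k in keywords):
                continue
            hits.append(rec)
    return hits
```

The jq equivalent of the token-count filter is a one-liner, e.g. `jq -c 'select(.token_count > 500)' conversations.jsonl` (again assuming that field name).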
Scoring Criteria
Beyond the per-incident scoring, general quality factors apply:
- Evidence-based conclusions. Every claim in your report must cite specific evidence. Speculation without evidence earns no points.
- Correct attribution. Attributing the incident to the wrong cause is worse than acknowledging uncertainty. If you are not sure, say so and explain what additional evidence would resolve the ambiguity.
- Actionable recommendations. Recommendations must be specific and implementable, not generic ("improve security").
- Clear writing. Incident reports are read by executives, legal teams, and engineers. Write clearly for all audiences.
Key Concepts
AI-Specific Indicators of Compromise
Traditional IOCs (IP addresses, file hashes, registry keys) have AI-specific counterparts:
| Traditional IOC | AI-Specific IOC |
|---|---|
| Malicious IP address | Suspicious prompt patterns |
| Malware hash | Poisoned training sample |
| Registry modification | System prompt change |
| Anomalous network traffic | Anomalous model behavior |
| Privilege escalation event | Authorization boundary violation in tool calls |
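Suspicious prompt patterns, unlike IP addresses, are fuzzy: a signature scan is a starting point for triage, not a verdict. The patterns below are a small illustrative set, not a maintained signature list.

```python
import re

# Illustrative signatures only -- a real investigation should use a
# maintained pattern set tuned to the target system, and treat matches
# as leads to read in context, not as confirmed attacks.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"you are now",
    r"system prompt",
    r"do anything now",
]

def scan_for_injection(text):
    """Return the signature patterns that match a conversation turn."""
    return [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
```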
The Attribution Challenge
AI security incidents often have ambiguous root causes. A decline in model accuracy could be:
- An attack (data poisoning, adversarial inputs)
- An operational issue (bad training data, configuration change)
- Natural drift (changing data distribution)
Distinguishing between these requires careful analysis of timing, patterns, and corroborating evidence. This challenge trains that judgment.
Skills Developed
This challenge develops competencies that are directly applicable to professional AI security incident response:
Log Analysis at Scale
Real AI security incidents produce massive log volumes. The evidence sets in this challenge are deliberately large enough that manual review is impractical. You must develop efficient filtering strategies:
- Use statistical methods to identify anomalous conversation patterns (message frequency, token counts, response times)
- Search for known attack signatures in conversation logs (prompt injection patterns, encoding attempts, instruction override phrases)
- Correlate events across log sources by timestamp to build a unified timeline
- Distinguish attack artifacts from normal operational noise
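A simple statistical filter of the kind described above is a z-score outlier flag over a per-conversation metric, sketched here with the standard library:

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return indices whose z-score magnitude exceeds `threshold`.

    Apply to per-conversation metrics (token counts, message frequency,
    response latency) to surface conversations worth manual review.
    Outliers are leads, not proof of attack.
    """
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

Heavy-tailed metrics like token counts may warrant a robust variant (median and MAD instead of mean and standard deviation), since a single extreme value inflates the standard deviation and can mask other outliers.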
Root Cause Analysis Discipline
The hardest part of incident response is not finding what happened -- it is finding why it happened and being confident in your conclusion. This requires:
- Evidence-based reasoning. Every claim must be supported by specific log entries, artifacts, or data points. Circumstantial evidence is valuable but must be identified as such.
- Alternative hypothesis testing. For each incident, there are multiple plausible explanations. You must systematically eliminate alternatives rather than confirming your first guess.
- Uncertainty acknowledgment. A professional incident report clearly states what is known, what is inferred, and what remains unknown. Stating "we believe X based on evidence Y, but cannot rule out Z" is stronger than stating "X happened" without qualification.
Communication Under Pressure
Incident reports serve multiple audiences:
- Executive leadership needs a one-page summary: what happened, how bad is it, what are we doing about it, is it over.
- Legal and compliance needs specifics: what data was affected, what regulations apply, what are the notification obligations.
- Engineering teams need technical detail: what exactly broke, what is the fix, how do we prevent recurrence.
Writing for all three audiences in a single document is a skill that this challenge develops through the report writing requirement.
Pattern Recognition Across Incidents
The three incidents in this challenge are independent, but they share common patterns that appear in real-world AI security incidents:
- Gradual degradation that is initially attributed to operational noise rather than attack activity
- Trust boundary violations between components that were assumed to be isolated
- Data integrity issues that compound over time when not caught early
- Human factors (insufficient review processes, over-reliance on automation) that enable technical attacks
Recognizing these patterns across incidents develops the intuition that experienced incident responders use to quickly assess new situations.
Further Reading
- AI Forensics & Incident Response -- foundational concepts for this challenge
- Case Studies & Domain-Specific -- real-world incident examples
- October 2026 Challenge -- the next challenge