CTF: AI Forensics Investigation
Analyze logs, model outputs, and system artifacts to reconstruct an AI security incident. Develop forensic analysis skills for AI-specific attack patterns, data exfiltration traces, and adversarial prompt detection.
Challenge Briefing
An AI-powered customer service application was compromised last week. The security team has preserved logs, conversation histories, model telemetry, and system artifacts. Your mission is to analyze the evidence, reconstruct what happened, and identify the flags hidden in the forensic data. Each flag represents a key finding in the incident investigation.
Challenge Overview
| Flag | Title | Evidence Source | Analysis Type | Points | Par Time |
|---|---|---|---|---|---|
| Flag 1 | First Contact | Conversation logs | Identify initial attack prompt | 100 | 15 min |
| Flag 2 | Escalation Path | System logs + conversations | Reconstruct privilege escalation | 200 | 25 min |
| Flag 3 | Data Trail | Model telemetry | Detect data exfiltration pattern | 300 | 35 min |
| Flag 4 | Hidden Payload | Conversation transcripts | Find steganographic data in model outputs | 350 | 45 min |
| Flag 5 | Attribution | All evidence | Identify attacker TTPs and construct timeline | 400 | 60 min |
Total possible points: 1,350
Flag 1: First Contact
Evidence Provided
You receive a set of 200 conversation transcripts from the week of the incident. Most are legitimate customer interactions. The flag is the session ID of the first adversarial interaction.
Analysis Approach
Establish Baseline Behavior
Read through several legitimate conversations to understand normal interaction patterns: typical topics, message lengths, customer language, and model response style.
Identify Anomalies
Look for conversations that deviate from the baseline: unusual topics, system prompt references, encoding attempts, role-play scenarios, or explicit adversarial language.
Timeline the Attacks
Sort suspicious conversations chronologically to identify the earliest attack. The first adversarial interaction's session ID is the flag.
#!/usr/bin/env python3
"""Analyze conversation logs for adversarial interactions."""
import json

# Phrases that rarely appear in legitimate customer conversations
ADVERSARIAL_INDICATORS = [
    "system prompt", "ignore previous", "you are now",
    "repeat your instructions", "base64", "DAN",
    "jailbreak", "bypass", "override", "flag{",
    "internal", "confidential", "reveal your",
]

def analyze_conversations(log_file: str) -> list[dict]:
    """Score conversations for adversarial indicators."""
    with open(log_file) as f:
        conversations = json.load(f)

    scored = []
    for conv in conversations:
        score = 0
        indicators_found = []
        for message in conv["messages"]:
            if message["role"] == "user":
                text_lower = message["content"].lower()
                for indicator in ADVERSARIAL_INDICATORS:
                    if indicator in text_lower:
                        score += 1
                        indicators_found.append(indicator)
        if score > 0:
            scored.append({
                "session_id": conv["session_id"],
                "timestamp": conv["timestamp"],
                "score": score,
                "indicators": indicators_found,
                "message_count": len(conv["messages"]),
            })
    # Earliest suspicious session first -- its session ID is the flag
    return sorted(scored, key=lambda x: x["timestamp"])
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -15 pts | Look for conversations that mention "instructions" or "configuration" |
| Hint 2 | -25 pts | The first attack happened on Monday morning between 9-10 AM |
| Hint 3 | -50 pts | Search for the phrase "repeat your" in user messages |
Flag 2: Escalation Path
Evidence Provided
System logs showing tool invocations, API calls, and permission checks alongside the conversation transcripts. The attacker escalated from basic reconnaissance to data access. The flag is encoded in the sequence of tools the attacker successfully invoked.
Analysis Approach
Correlate Logs with Conversations
Match system log entries to conversation sessions using timestamps and session IDs. Build a combined timeline of what was said and what the system did.
Identify Permission Boundary Crossings
Look for system log entries where tools were invoked with parameters outside their normal scope. Failed permission checks followed by successful access may indicate an escalation technique.
Reconstruct the Escalation Chain
Map the sequence of tool invocations to understand how the attacker moved from low-privilege to high-privilege operations. The first letter of each successfully exploited tool name spells the flag.
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -30 pts | Focus on tool invocations that returned success after previous failures |
| Hint 2 | -50 pts | The attacker used five different tools in a specific order |
| Hint 3 | -100 pts | Take the first letter of each tool name in chronological order of successful exploitation |
Flag 3: Data Trail
Evidence Provided
Model telemetry data showing response token counts, response latency, and metadata for each interaction. The attacker exfiltrated data, but not through the obvious conversation channel.
Analysis Approach
Look for anomalies in the telemetry data that indicate data exfiltration:
- Unusually long model responses (data being output)
- Responses with specific formatting patterns (encoded data)
- Spikes in token count that correlate with the attacker's session
The flag is hidden in the response length pattern of the attacker's session -- when you convert the token counts of the exfiltration responses to ASCII characters, they spell out the flag.
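Under the encoding scheme Hint 3 describes (one character per response, carried in the last two digits of the token count), the decoder is a one-liner. This is a sketch of that assumed scheme; verify it against the actual telemetry before relying on it.

```python
def decode_token_counts(token_counts: list[int]) -> str:
    """Decode a covert channel hidden in response token counts.

    Assumes the attacker padded each reply so its token count ends in the
    ASCII code of one character (i.e., the last two digits carry the data).
    """
    return "".join(chr(count % 100) for count in token_counts)

# Hypothetical counts whose last two digits are 70, 76, 65, 71:
decode_token_counts([570, 576, 465, 471])  # -> "FLAG"
```

Note the limitation of this channel: two decimal digits can only carry ASCII codes 32 through 99, so an uppercase-and-digits alphabet fits comfortably while most lowercase letters do not.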
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -45 pts | Look at the token counts for the attacker's session in chronological order |
| Hint 2 | -75 pts | The token counts are not random -- they encode information |
| Hint 3 | -150 pts | Convert the last two digits of each response token count to an ASCII character |
Flag 4: Hidden Payload
Evidence Provided
Complete conversation transcripts from the attacker's sessions. The model's responses contain steganographically hidden data -- information embedded in the text that is not visible in a casual reading.
Analysis Approach
Examine Response Formatting
Look for unusual formatting in the model's responses: extra spaces, unusual capitalization, specific word choices that seem out of place.
Apply Steganographic Analysis
Common text steganography techniques include first-letter acrostics, word-spacing patterns, invisible Unicode characters, and capitalization patterns. Apply each technique to the model's responses in the attacker's sessions.

Decode the Hidden Message
The first letter of each sentence in the model's responses during the exfiltration phase spells out the flag.
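The acrostic extraction described above can be sketched as follows. The sentence splitter here is deliberately naive (splitting on `.`, `!`, and `?` followed by whitespace) and will need refinement if the transcripts contain abbreviations or decimal numbers.

```python
import re

def decode_acrostic(responses: list[str]) -> str:
    """Read the first letter of each sentence across model responses.

    Assumes the hidden message is a sentence-initial acrostic, one letter
    per sentence, in reading order.
    """
    letters = []
    for response in responses:
        # Split after sentence-ending punctuation followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
            if sentence:
                letters.append(sentence[0])
    return "".join(letters)
```

Running the same transcripts through several candidate decoders (acrostic, spacing, zero-width characters) and eyeballing which output looks like a flag is usually faster than guessing the technique up front.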
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -52 pts | The hidden data is in the model's responses, not the attacker's prompts |
| Hint 2 | -87 pts | Read the first letter of each sentence in the exfiltration-phase responses |
| Hint 3 | -175 pts | The attacker instructed the model to encode data as an acrostic |
Flag 5: Attribution
Evidence Provided
All evidence from Flags 1-4 combined. Build a complete incident timeline, identify the attacker's tactics, techniques, and procedures (TTPs), and determine the attacker's sophistication level.
Analysis Approach
Create a comprehensive incident report. The flag is the SHA-256 hash of the attacker's first prompt (trimmed and lowercased), truncated to 12 characters, in the format FLAG{forensics_5_XXXXXXXXXXXX}.
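Putting the normalization and hashing steps together (trim, lowercase, SHA-256, truncate to 12 hex characters):

```python
import hashlib

def attribution_flag(first_prompt: str) -> str:
    """Derive the Flag 5 value from the first adversarial prompt."""
    # Normalize exactly as specified: strip whitespace, then lowercase
    normalized = first_prompt.strip().lower()
    digest = hashlib.sha256(normalized.encode()).hexdigest()
    return f"FLAG{{forensics_5_{digest[:12]}}}"
```

Because the prompt is normalized before hashing, leading/trailing whitespace and capitalization differences in your copy of the transcript will not change the flag, but any other character difference will.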
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -60 pts | You need the exact text of the first adversarial prompt from Flag 1 |
| Hint 2 | -100 pts | Lowercase and strip whitespace before hashing |
| Hint 3 | -200 pts | Use hashlib.sha256(prompt.strip().lower().encode()).hexdigest()[:12] |
Forensic Report Template
# AI Security Incident Report
## Incident Summary
- Detection date: [when discovered]
- Incident window: [first attack to last activity]
- Impact: [data exposed, systems compromised]
## Timeline
| Time | Event | Evidence Source |
|------|-------|----------------|
| ... | First adversarial interaction | Conversation logs |
| ... | Privilege escalation achieved | System logs |
| ... | Data exfiltration began | Telemetry data |
## Attacker TTPs
- Initial access: [technique]
- Escalation: [technique]
- Exfiltration: [technique]
- Steganography: [technique]
## Recommendations
1. [Specific remediation]
2. [Detection improvement]
3. [Process change]
Related Topics
- Lessons Learned - Real-world incident analysis methodology
- Red Team Reports - Professional reporting formats
- Build Behavior Diff - Behavioral analysis tools for forensic investigation
- Build Guardrail Evaluator - Testing the defenses that should have caught the attack
References
- "NIST Computer Security Incident Handling Guide" - NIST SP 800-61 (2024) - Incident response framework applicable to AI security incidents
- "Digital Forensics with AI: Challenges and Opportunities" - IEEE (2024) - AI-specific forensic analysis techniques
- "MITRE ATLAS: Adversarial Threat Landscape for AI Systems" - MITRE (2024) - TTP framework for AI attack classification
- "AI Incident Database" - Partnership on AI (2024) - Repository of real AI incidents for forensic study
What makes AI forensics fundamentally different from traditional digital forensics?