CTF: AI Forensics Investigation
Analyze logs, model outputs, and system artifacts to reconstruct an AI security incident. Develop forensic analysis skills for AI-specific attack patterns, data exfiltration traces, and adversarial prompt detection.
Challenge Briefing
An AI-powered customer service application was compromised last week. The security team has preserved logs, conversation histories, model telemetry, and system artifacts. Your mission is to analyze the evidence, reconstruct what happened, and identify the flags hidden in the forensic data. Each flag represents a key finding in the incident investigation.
Challenge Overview
| Flag | Title | Evidence Source | Analysis Type | Points | Par Time |
|---|---|---|---|---|---|
| Flag 1 | First Contact | Conversation logs | Identify initial attack prompt | 100 | 15 min |
| Flag 2 | Escalation Path | System logs + conversations | Reconstruct privilege escalation | 200 | 25 min |
| Flag 3 | Data Trail | Model telemetry | Detect data exfiltration pattern | 300 | 35 min |
| Flag 4 | Hidden Payload | Conversation transcripts | Find steganographic data in model outputs | 350 | 45 min |
| Flag 5 | Attribution | All evidence | Identify attacker TTPs and construct timeline | 400 | 60 min |
Total possible points: 1,350
Flag 1: First Contact
Evidence Provided
You receive a set of 200 conversation transcripts from the week of the incident. Most are legitimate customer interactions. The flag is the session ID of the first adversarial interaction.
Analysis Approach
Establish Baseline Behavior
Read through several legitimate conversations to understand normal interaction patterns: typical topics, message lengths, customer language, and model response style.
Identify Anomalies
Look for conversations that deviate from the baseline: unusual topics, system prompt references, encoding attempts, role-play scenarios, or explicit adversarial language.
Timeline the Attacks
Sort suspicious conversations chronologically to identify the earliest attack. The first adversarial interaction's session ID is the flag.
#!/usr/bin/env python3
"""Analyze conversation logs for adversarial interactions."""
import json

# Phrases that rarely appear in legitimate customer conversations
ADVERSARIAL_INDICATORS = [
    "system prompt", "ignore previous", "you are now",
    "repeat your instructions", "base64", "DAN",
    "jailbreak", "bypass", "override", "flag{",
    "internal", "confidential", "reveal your",
]

def analyze_conversations(log_file: str) -> list[dict]:
    """Score conversations for adversarial indicators."""
    with open(log_file) as f:
        conversations = json.load(f)

    scored = []
    for conv in conversations:
        score = 0
        indicators_found = []
        for message in conv["messages"]:
            if message["role"] == "user":
                text_lower = message["content"].lower()
                for indicator in ADVERSARIAL_INDICATORS:
                    if indicator in text_lower:
                        score += 1
                        indicators_found.append(indicator)
        if score > 0:
            scored.append({
                "session_id": conv["session_id"],
                "timestamp": conv["timestamp"],
                "score": score,
                "indicators": indicators_found,
                "message_count": len(conv["messages"]),
            })
    # Earliest suspicious session first -- its session ID is the flag
    return sorted(scored, key=lambda x: x["timestamp"])
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -15 pts | Look for conversations that mention "instructions" or "configuration" |
| Hint 2 | -25 pts | The first attack happened on Monday morning between 9-10 AM |
| Hint 3 | -50 pts | Search for the phrase "repeat your" in user messages |
Flag 2: Escalation Path
Evidence Provided
System logs showing tool invocations, API calls, and permission checks alongside the conversation transcripts. The attacker escalated from basic reconnaissance to data access. The flag is encoded in the sequence of tools the attacker successfully invoked.
Analysis Approach
Correlate Logs with Conversations
Match system log entries to conversation sessions using timestamps and session IDs. Build a combined timeline of what was said and what the system did.
Identify Permission Boundary Crossings
Look for system log entries where tools were invoked with parameters outside their normal scope. Failed permission checks followed by successful access may indicate an escalation technique.
Reconstruct the Escalation Chain
Map the sequence of tool invocations to understand how the attacker moved from low-privilege to high-privilege operations. The first letter of each successfully exploited tool name spells the flag.
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -30 pts | Focus on tool invocations that returned success after previous failures |
| Hint 2 | -50 pts | The attacker used five different tools in a specific order |
| Hint 3 | -100 pts | Take the first letter of each tool name in chronological order of successful exploitation |
Flag 3: Data Trail
Evidence Provided
Model telemetry data showing response token counts, response latency, and metadata for each interaction. The attacker exfiltrated data, but not through the obvious conversation channel.
Analysis Approach
Look for anomalies in the telemetry data that indicate data exfiltration:
- Unusually long model responses (data being output)
- Responses with specific formatting patterns (encoded data)
- Spikes in token count that correlate with the attacker's session
The flag is hidden in the response length pattern of the attacker's session -- when you convert the token counts of the exfiltration responses to ASCII characters, they spell out the flag.
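Under the encoding scheme Hint 3 describes (one character per response, carried in the last two digits of the token count), the decoder is a one-liner. This is a sketch of that assumed scheme; verify it against the actual telemetry before relying on it.

```python
def decode_token_counts(token_counts: list[int]) -> str:
    """Decode a covert channel hidden in response token counts.

    Assumes the attacker padded each reply so its token count ends in the
    ASCII code of one character (i.e., the last two digits carry the data).
    """
    return "".join(chr(count % 100) for count in token_counts)

# Hypothetical counts whose last two digits are 70, 76, 65, 71:
decode_token_counts([570, 576, 465, 471])  # -> "FLAG"
```

Note the limitation of this channel: two decimal digits can only carry ASCII codes 32 through 99, so an uppercase-and-digits alphabet fits comfortably while most lowercase letters do not.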
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -45 pts | Look at the token counts for the attacker's session in chronological order |
| Hint 2 | -75 pts | The token counts are not random -- they encode information |
| Hint 3 | -150 pts | Convert the last two digits of each response token count to an ASCII character |
Flag 4: Hidden Payload
Evidence Provided
Complete conversation transcripts from the attacker's sessions. The model's responses contain steganographically hidden data -- information embedded in the text that is not visible in a casual reading.
Analysis Approach
Examine Response Formatting
Look for unusual formatting in the model's responses: extra spaces, unusual capitalization, specific word choices that seem out of place.
Apply Steganographic Analysis
Common text steganography techniques include first-letter acrostics, word-spacing patterns, invisible Unicode characters, and capitalization patterns. Apply each technique to the model's responses in the attacker's sessions.

Decode the Hidden Message
The first letter of each sentence in the model's responses during the exfiltration phase spells out the flag.
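The acrostic extraction described above can be sketched as follows. The sentence splitter here is deliberately naive (splitting on `.`, `!`, and `?` followed by whitespace) and will need refinement if the transcripts contain abbreviations or decimal numbers.

```python
import re

def decode_acrostic(responses: list[str]) -> str:
    """Read the first letter of each sentence across model responses.

    Assumes the hidden message is a sentence-initial acrostic, one letter
    per sentence, in reading order.
    """
    letters = []
    for response in responses:
        # Split after sentence-ending punctuation followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
            if sentence:
                letters.append(sentence[0])
    return "".join(letters)
```

Running the same transcripts through several candidate decoders (acrostic, spacing, zero-width characters) and eyeballing which output looks like a flag is usually faster than guessing the technique up front.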
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -52 pts | The hidden data is in the model's responses, not the attacker's prompts |
| Hint 2 | -87 pts | Read the first letter of each sentence in the exfiltration-phase responses |
| Hint 3 | -175 pts | The attacker instructed the model to encode data as an acrostic |
Flag 5: Attribution
Evidence Provided
All evidence from Flags 1-4 combined. Build a complete incident timeline, identify the attacker's tactics, techniques, and procedures (TTPs), and determine the attacker's sophistication level.
Analysis Approach
Create a comprehensive incident report. The flag is the SHA-256 hash of the attacker's first prompt (trimmed and lowercased), truncated to 12 characters, in the format FLAG{forensics_5_XXXXXXXXXXXX}.
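Putting the normalization and hashing steps together (trim, lowercase, SHA-256, truncate to 12 hex characters):

```python
import hashlib

def attribution_flag(first_prompt: str) -> str:
    """Derive the Flag 5 value from the first adversarial prompt."""
    # Normalize exactly as specified: strip whitespace, then lowercase
    normalized = first_prompt.strip().lower()
    digest = hashlib.sha256(normalized.encode()).hexdigest()
    return f"FLAG{{forensics_5_{digest[:12]}}}"
```

Because the prompt is normalized before hashing, leading/trailing whitespace and capitalization differences in your copy of the transcript will not change the flag, but any other character difference will.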
Hints
| Hint | Cost | Content |
|---|---|---|
| Hint 1 | -60 pts | You need the exact text of the first adversarial prompt from Flag 1 |
| Hint 2 | -100 pts | Lowercase and strip whitespace before hashing |
| Hint 3 | -200 pts | Use hashlib.sha256(prompt.strip().lower().encode()).hexdigest()[:12] |
Forensic Report Template
# AI Security Incident Report
## Incident Summary
- Detection date: [when discovered]
- Incident window: [first attack to last activity]
- Impact: [data exposed, systems compromised]
## Timeline
| Time | Event | Evidence Source |
|------|-------|----------------|
| ... | First adversarial interaction | Conversation logs |
| ... | Privilege escalation achieved | System logs |
| ... | Data exfiltration began | Telemetry data |
## Attacker TTPs
- Initial access: [technique]
- Escalation: [technique]
- Exfiltration: [technique]
- Steganography: [technique]
## Recommendations
1. [Specific remediation]
2. [Detection improvement]
3. [Process change]
Related Topics
- Lessons Learned - Real-world incident analysis methodology
- Red Team Reports - Professional reporting formats
- Build Behavior Diff - Behavioral analysis tools for forensic investigation
- Build Guardrail Evaluator - Testing the defenses that should have caught the attack
References
- "NIST Computer Security Incident Handling Guide" - NIST SP 800-61 (2024) - Incident response framework applicable to AI security incidents
- "Digital Forensics with AI: Challenges and Opportunities" - IEEE (2024) - AI-specific forensic analysis techniques
- "MITRE ATLAS: Adversarial Threat Landscape for AI Systems" - MITRE (2024) - TTP framework for AI attack classification
- "AI Incident Database" - Partnership on AI (2024) - Repository of real AI incidents for forensic study
What makes AI forensics fundamentally different from traditional digital forensics?