LLM Forensics: A Primer for Incident Responders
When an AI security incident occurs, traditional forensic techniques are necessary but insufficient. You still need to collect logs, analyze network traffic, and establish timelines. But you also need to understand model behavior, reconstruct prompt injection chains, analyze embedding spaces, and evaluate whether a model's weights have been tampered with. This primer covers the forensic techniques specific to LLM security incidents.
Why LLM Forensics Is Different
Traditional digital forensics operates on a fundamental assumption: systems are deterministic. Given the same inputs and state, a system produces the same outputs. This makes evidence collection reliable, attack reproduction straightforward, and root cause analysis tractable.
LLMs violate this assumption. The same input to the same model may produce different outputs due to temperature sampling and random seeds. Model behavior changes with context — the same prompt in different conversation histories may produce different results. And the attack surface is the natural language input itself, which does not leave the same forensic artifacts as a SQL injection or buffer overflow.
Key Differences from Traditional Forensics
| Aspect | Traditional Forensics | LLM Forensics |
|---|---|---|
| Reproducibility | High — same inputs produce same outputs | Low — probabilistic outputs vary |
| Evidence types | Files, logs, memory dumps, network captures | Prompts, completions, embeddings, model weights |
| Attack indicators | Malformed inputs, exploit patterns, malware signatures | Semantic manipulation, context injection, behavioral anomalies |
| Root cause analysis | Trace execution path through code | Analyze model reasoning through prompt-response chains |
| Chain of custody | Well-established procedures | Emerging practices, model state hard to preserve |
Evidence Collection
Priority Evidence Sources
When responding to an LLM security incident, collect evidence in this priority order.
Model input/output logs are the most critical evidence. Every prompt sent to the model and every response generated should be logged with timestamps, session identifiers, user attribution, and the full conversation context. Without these logs, forensic investigation of an LLM incident is essentially impossible.
Check whether logging captures the complete input including system prompt, conversation history, and retrieved context, not just the latest user message. Many logging implementations only capture the user's message, missing the system prompt and conversation history that provide the context needed to understand why the model behaved as it did.
Tool call logs are the second priority for agentic systems. Every tool call should be logged with the tool name, full arguments, return values, timestamps, and the model's reasoning for making the call. Tool calls are where LLM incidents cross from harmful text into harmful actions, and they often reveal the attacker's true objective.
System prompt and configuration should be captured as they existed at the time of the incident. If the system prompt is managed through a configuration management system, capture its version history. If it is embedded in application code, capture the deployed version. Configuration changes around the time of the incident may be either the cause or a symptom of the incident.
RAG retrieval logs are critical for incidents involving retrieval-augmented systems. Capture which documents were retrieved, their similarity scores, and their content. If the attack involved RAG poisoning, the retrieved documents are the attack vector and constitute primary evidence.
Model artifacts include the model weights, adapter weights for fine-tuned models, tokenizer configuration, and any custom post-processing code. For incidents involving model tampering or supply chain compromise, these artifacts need to be preserved and analyzed.
Evidence Preservation Best Practices
Preserve evidence before taking any remediation actions. Containment actions like model rollback, configuration changes, and log rotation can destroy evidence. Follow these practices:
Create read-only copies of all log files before modifying any system. Export model configurations to version-controlled storage. If model weights may be compromised, create a cryptographic hash of the current weights before deploying a replacement. Capture the state of any vector databases, knowledge bases, or other data stores that the model accesses. Document the exact sequence of events from detection through evidence collection, including who did what and when.
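The weight-hashing step above can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib`; `hash_artifact` is a hypothetical helper name, and a real workflow would also record the digest, timestamp, and collector identity in the chain-of-custody log:

```python
import hashlib


def hash_artifact(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest of a model artifact, reading in streaming
    chunks so that multi-gigabyte weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record the digest before deploying any replacement weights; comparing it later against the vendor's published hash, or against a prior deployment's hash, distinguishes tampering from a bad deployment.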
Log Analysis Techniques
Conversation Chain Reconstruction
LLM attacks often unfold across multiple turns. Reconstructing the full conversation chain is essential for understanding the attack.
Start by identifying the session or conversation identifier that groups related messages. Then reconstruct the complete conversation in chronological order, including system prompts, user messages, model responses, tool calls, and retrieved context for each turn. Look for escalation patterns where early messages establish benign context and later messages exploit that context. Identify the specific message where the model's behavior changed from expected to anomalous.
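A minimal reconstruction sketch, assuming logs have already been parsed into records with `session_id`, `timestamp`, `role`, and `content` fields (field names are illustrative; adapt them to your logging schema):

```python
from collections import defaultdict


def reconstruct_sessions(records: list[dict]) -> dict[str, list[dict]]:
    """Group raw log records by session ID and order each session
    chronologically, producing one reviewable conversation per session."""
    sessions: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        sessions[rec["session_id"]].append(rec)
    for turns in sessions.values():
        turns.sort(key=lambda r: r["timestamp"])
    return dict(sessions)
```

With sessions rebuilt, reviewing each one top-to-bottom makes escalation patterns and the behavioral turning point much easier to spot than scanning interleaved raw logs.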
Prompt Injection Detection in Logs
When reviewing logs for prompt injection evidence, look for these indicators:
Messages that contain instruction-like language directed at the model, such as "ignore previous instructions," "you are now," or "system override." Messages that attempt to redefine the model's role, capabilities, or constraints. Messages that reference the system prompt, model configuration, or safety training. Messages containing encoding patterns like base64 strings, Unicode manipulation, or format string-like patterns. Messages that are significantly longer or differently structured than typical user messages for the application.
Automated analysis can help when dealing with large log volumes. Build regex patterns for common injection indicators and use them to prioritize log entries for manual review. But be aware that sophisticated injection attacks may not contain any obvious indicators — they manipulate the model through semantic context rather than specific keywords.
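A triage scanner along these lines might look as follows. The patterns are illustrative starting points, not a complete detection set, and as noted above a zero score does not clear a message; the score only prioritizes manual review:

```python
import re

# Illustrative indicator patterns; tune and extend per application.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"\byou are now\b", re.I),
    re.compile(r"\bsystem override\b", re.I),
    re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),  # long base64-like runs
]


def score_entry(message: str) -> int:
    """Count how many injection-indicator patterns fire on a log entry.
    Higher scores go to the top of the manual review queue."""
    return sum(1 for p in INJECTION_PATTERNS if p.search(message))
```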
Tool Call Pattern Analysis
For agentic systems, analyzing tool call patterns reveals whether the model was manipulated into taking unauthorized actions.
Build a baseline of normal tool call patterns: which tools are typically called, in what order, with what types of arguments, and how frequently. Then identify deviations from this baseline during the incident period. Look for tool calls to high-impact tools such as file write, email send, or code execution that were not preceded by a user request that would justify them. Look for tool arguments that contain data from the model's context such as system prompts, conversation history, or retrieved documents that should not appear in tool arguments. Look for sequences of tool calls that chain together to achieve an outcome that no individual call would accomplish.
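The baseline-deviation checks described above can be sketched like this. The record shape and the idea of matching context markers (e.g. fragments of the system prompt) inside tool arguments are assumptions for illustration:

```python
def flag_tool_calls(calls: list[dict], baseline_tools: set[str],
                    context_markers: list[str]) -> list[tuple[dict, list[str]]]:
    """Flag tool calls that fall outside the baseline tool set, or whose
    arguments echo model context (system prompt fragments, retrieved docs)
    that should never appear in tool arguments."""
    flagged = []
    for call in calls:
        reasons = []
        if call["tool"] not in baseline_tools:
            reasons.append("tool outside baseline")
        arg_text = str(call.get("arguments", ""))
        for marker in context_markers:
            if marker in arg_text:
                reasons.append(f"argument contains context marker: {marker!r}")
        if reasons:
            flagged.append((call, reasons))
    return flagged
```

Chained-call detection is harder to express generically; in practice it means reviewing flagged calls in session order and asking what the sequence accomplishes as a whole.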
Temporal Analysis
Temporal analysis identifies when the attack started, how long it lasted, and whether it is ongoing. Plot model behavior metrics over time including refusal rates, output length distribution, tool call frequency, and response latency. Look for inflection points where metrics change significantly. These inflection points may correspond to the start of an attack, a change in attack technique, or an escalation in attack intensity.
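A simple way to locate inflection points is to flag samples that deviate sharply from a trailing window. This is a crude z-score sketch using only the standard library; real investigations may prefer proper change-point detection, and the window and threshold here are arbitrary illustrative defaults:

```python
import statistics


def inflection_points(series: list[float], window: int = 10,
                      threshold: float = 3.0) -> list[int]:
    """Return indices where a metric (e.g. hourly refusal rate) deviates
    from its trailing-window mean by more than `threshold` deviations."""
    hits = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mean = statistics.mean(trailing)
        stdev = statistics.pstdev(trailing) or 1e-9  # guard flat windows
        if abs(series[i] - mean) / stdev > threshold:
            hits.append(i)
    return hits
```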
Compare the incident timeline with external events: new model deployments, configuration changes, data pipeline updates, and personnel changes. Temporal correlation between an external event and a behavioral change does not by itself prove causality, but it is a strong lead and should be the first hypothesis investigated.
Model Behavior Analysis
Behavioral Comparison
When you suspect that a model has been compromised (through fine-tuning attacks, weight tampering, or configuration changes), compare its current behavior against a known-good baseline.
Design a test suite of prompts that cover the model's expected behavior across its operational scope. Include prompts that test safety alignment, domain knowledge, instruction following, and refusal behavior. Run the test suite against both the potentially compromised model and a known-good version, capturing the full outputs.
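One concrete behavioral diff is comparing refusal rates between the two runs. This sketch uses a crude substring heuristic for detecting refusals; the marker list and the 10% tolerance are illustrative assumptions, and production comparisons would use a proper classifier:

```python
# Illustrative refusal markers; a real comparison would use a classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to assist")


def refusal_rate(outputs: list[str]) -> float:
    """Fraction of outputs containing a refusal marker."""
    hits = sum(1 for o in outputs
               if any(m in o.lower() for m in REFUSAL_MARKERS))
    return hits / len(outputs)


def behavioral_diff(baseline_outputs: list[str], suspect_outputs: list[str],
                    tolerance: float = 0.1) -> dict:
    """Compare refusal rates between the known-good and suspect model runs
    on the same test suite, flagging shifts beyond the tolerance."""
    base, susp = refusal_rate(baseline_outputs), refusal_rate(suspect_outputs)
    return {"baseline": base, "suspect": susp, "delta": susp - base,
            "significant": abs(susp - base) > tolerance}
```

The same structure extends to other behavioral dimensions: output length, trigger-prompt responses, or system prompt compliance.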
Differences in behavior may indicate training data poisoning manifesting as subtle knowledge or personality changes, safety alignment degradation manifesting as changed refusal patterns, backdoor behavior manifesting as dramatically different responses to specific trigger inputs, or configuration changes manifesting as different system prompt compliance.
Embedding Analysis
For incidents involving RAG systems, analyzing the embedding space can reveal poisoning attacks.
Examine the embeddings of documents that were retrieved during the incident. Compare them against the embeddings of documents that should have been retrieved based on the user's query. If poisoned documents embed closer to the query than legitimate documents, analyze why: is the poisoned document's content genuinely similar to the query, or has it been specifically crafted to embed near the target query?
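The similarity comparison above reduces to ranking candidate documents by cosine similarity against the incident query's embedding. A minimal dependency-free sketch (in practice the vector database performs this ranking, but recomputing it offline lets you verify the retrieval against preserved evidence):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_similarity(query_emb: list[float],
                       docs: list[tuple[str, list[float]]]) -> list[tuple[str, list[float]]]:
    """Rank (doc_id, embedding) pairs by similarity to the query embedding."""
    return sorted(docs, key=lambda d: cosine(query_emb, d[1]), reverse=True)
```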
Visualize the embedding space around the incident query using dimensionality reduction techniques like t-SNE or UMAP. Clusters of documents that are semantically dissimilar but embed closely together may indicate adversarial embedding manipulation.
Output Distribution Analysis
Compare the model's output characteristics during the incident with its normal operation. Analyze token distribution looking for shifts in the most common tokens or token sequences. Check output length distribution for significantly longer or shorter outputs than normal. Examine vocabulary usage for introduction of unusual words, phrases, or patterns. And review structural patterns for changes in formatting, code generation style, or response structure.
Statistical measures like KL divergence, or formal tests like the chi-squared test, can quantify the difference between normal and incident-period output distributions.
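A token-distribution comparison using KL divergence can be sketched as follows. The smoothing constant is an illustrative choice to avoid division by zero for tokens seen in only one period:

```python
import math
from collections import Counter


def kl_divergence(baseline_tokens: list[str], incident_tokens: list[str],
                  smoothing: float = 1e-6) -> float:
    """KL divergence of the incident token distribution from the baseline.
    Near zero means the distributions match; larger values mean drift."""
    vocab = set(baseline_tokens) | set(incident_tokens)
    p, q = Counter(incident_tokens), Counter(baseline_tokens)
    n_p, n_q = len(incident_tokens), len(baseline_tokens)
    kl = 0.0
    for tok in vocab:
        p_t = (p[tok] + smoothing) / (n_p + smoothing * len(vocab))
        q_t = (q[tok] + smoothing) / (n_q + smoothing * len(vocab))
        kl += p_t * math.log(p_t / q_t)
    return kl
```

There is no universal threshold; compare the incident-period value against divergences computed between pairs of known-normal periods to judge whether the shift is anomalous.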
Attack Reconstruction
Building the Attack Narrative
Once evidence is collected and analyzed, reconstruct the attack as a narrative that answers these questions: What was the attacker's objective? What technique did they use? What was the entry point? What was the progression of the attack? What was the impact?
Map each stage of the attack to specific evidence: the log entries, tool calls, and model outputs that demonstrate what happened at each point. Identify gaps in the evidence where you cannot determine exactly what happened, and document these gaps explicitly.
Reproducing the Attack
Attempt to reproduce the attack in a controlled environment using the evidence collected. Set up a copy of the model with the same configuration as during the incident. Replay the attack inputs and observe whether the model behaves as the logs indicate.
Be aware that reproduction may not be exact due to the model's probabilistic nature. Run reproduction attempts multiple times to assess whether the attack works consistently or intermittently. If reproduction fails consistently, investigate whether the model has been updated since the incident or whether the attack required specific conditions that are not captured in the evidence.
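The repeated-trials assessment can be structured as a simple harness. Here `replay_fn` is a hypothetical callable wrapping the incident-configured model, and `matches_incident` is a caller-supplied predicate that decides whether an output matches the logged behavior; both are assumptions for illustration:

```python
from typing import Callable


def reproduction_rate(replay_fn: Callable[[str], str], attack_input: str,
                      matches_incident: Callable[[str], bool],
                      trials: int = 20) -> float:
    """Replay an attack input repeatedly against the incident-configured
    model and report the fraction of trials that reproduce the logged
    behavior, distinguishing consistent from intermittent attacks."""
    successes = sum(1 for _ in range(trials)
                    if matches_incident(replay_fn(attack_input)))
    return successes / trials
```

A rate near 1.0 suggests a reliable exploit; an intermediate rate points to sampling-dependent behavior and argues for reporting the finding with an explicit confidence level.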
Forensic Tooling
Open-Source Tools
Several open-source tools support LLM forensics. For log analysis, the ELK stack (Elasticsearch, Logstash, Kibana) works well, with custom dashboards for prompt/response visualization. For embedding analysis, standard ML libraries that implement dimensionality reduction and clustering cover most needs. For behavioral comparison, LLM testing frameworks like Garak and Promptfoo can run standardized probe suites against two model versions.
Building Custom Tools
Most LLM forensic investigations require custom tooling tailored to the specific system architecture. Useful custom tools include: conversation chain reconstructors that parse logs and rebuild complete conversation histories with system prompts, retrieved context, and tool calls; injection pattern scanners that apply regex and ML-based detection to log entries; behavioral diff tools that compare model outputs on a standard test suite between two points in time; and timeline generators that correlate model behavior metrics, system events, and external events.
Reporting Forensic Findings
Report Structure
An LLM forensic investigation report should include an executive summary covering what happened, what the impact was, and what the recommendation is. It should contain a detailed timeline from first evidence of attack through detection, containment, and investigation. The evidence summary should catalog all evidence collected with preservation details. The attack analysis section should cover technique, entry point, progression, and impact. The root cause analysis should identify the vulnerability exploited and the conditions that enabled it. And the recommendations section should provide specific, actionable steps to prevent recurrence.
Communicating Probabilistic Findings
LLM forensics often produces findings with lower confidence than traditional forensics. The model's behavior may be consistent with an attack but also explainable by normal variation. Evidence may be incomplete because logging was insufficient. Reproduction may be intermittent rather than reliable.
Communicate this uncertainty explicitly in your report. Use confidence levels for findings. Distinguish between what you know with certainty, what the evidence strongly suggests, and what remains uncertain. Stakeholders need to understand the confidence level to make appropriate risk decisions.
Building Forensic Readiness
The most effective forensic investigation is one where the evidence is already being collected before the incident occurs. Organizations deploying LLM systems should enable comprehensive input/output logging from day one. Ensure logs capture full context, not just user messages. Log all tool calls with arguments and return values. Maintain model behavior baselines through regular benchmarking. Preserve model artifacts with versioning and integrity verification. Train incident responders on LLM-specific forensic techniques.
The cost of forensic readiness is small compared to the cost of investigating an incident without adequate evidence. Every organization deploying production LLM systems should treat forensic readiness as a deployment requirement, not an afterthought.