AI System Log Analysis
AI system logging architecture for forensic investigation: inference logs, prompt and completion logs, tool call traces, embedding query logs, and logging infrastructure requirements.
AI system logs are fundamentally different from traditional application logs. While a web server log records URLs, status codes, and timestamps, an AI system log must capture the full semantic content of inputs and outputs -- because the "exploit" lives in the natural language, not in the request metadata. This page covers the logging architecture required for AI forensics and introduces the analysis techniques detailed in the subsections.
AI Logging Architecture
A forensically useful AI logging architecture captures data at four layers, each producing different evidence types.
The Four Log Layers
| Layer | What It Captures | Forensic Value | Example Systems |
|---|---|---|---|
| Inference logs | Model request/response metadata, latency, token counts, parameters | Detect anomalous patterns, timeline reconstruction | vLLM, TGI, Triton metrics, cloud provider logs |
| Prompt/completion logs | Full text of system prompts, user messages, and model responses | Attack payload analysis, impact assessment | Application-level logging, prompt management platforms |
| Tool call logs | Agent tool invocations, parameters, results, and authorization context | Detect unauthorized actions, trace lateral movement | LangChain callbacks, agent framework logs |
| Embedding/retrieval logs | Vector queries, similarity scores, retrieved document IDs and content | Detect RAG manipulation, identify data exposure scope | Vector database query logs, retrieval pipeline logs |
Logging Architecture Diagram
A typical AI system has logging opportunities at multiple points in the request flow:
User Request
│
├─→ [API Gateway Log] ─── authentication, rate limits, IP
│
├─→ [Application Log] ─── session context, user identity
│
├─→ [Prompt Assembly Log] ─── system prompt + user input + RAG context
│
├─→ [Safety Classifier Log] ─── input classification, scores
│
├─→ [Inference Log] ─── model version, parameters, latency, tokens
│
├─→ [Output Classifier Log] ─── output classification, scores
│
├─→ [Tool Call Log] ─── tool name, parameters, results
│
└─→ [Completion Log] ─── final response delivered to user
Each point captures different evidence. A forensically complete architecture logs at all of these points.
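The diagram above can be made concrete with a small logging helper. The sketch below is illustrative, not a reference implementation: it assumes JSON-lines structured logging, and the layer names and fields mirror the diagram. The key property is that every record carries the same request_id so the layers can be joined later.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("ai_pipeline")

def log_event(layer: str, request_id: str, **fields) -> dict:
    """Emit one structured record for a single pipeline point.

    Every record carries the same request_id so records from
    different layers can be correlated during an investigation.
    """
    record = {
        "layer": layer,
        "request_id": request_id,
        "timestamp": time.time(),
        **fields,
    }
    log.info(json.dumps(record))
    return record

# One request flowing through several of the points in the diagram.
request_id = f"req_{uuid.uuid4().hex[:8]}"
log_event("api_gateway", request_id, client_ip="203.0.113.7")
log_event("prompt_assembly", request_id, prompt_tokens=1203)
log_event("inference", request_id, model_id="model-v2.1", latency_ms=412)
log_event("completion", request_id, output_tokens=847)
```

In practice each layer would emit to its own sink, but the invariant is the same: the request_id minted at the gateway is propagated unchanged through every subsequent record.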
Inference Logs
Inference logs are produced by the model serving layer and contain the metadata of each model call.
Key Fields
| Field | Description | Forensic Use |
|---|---|---|
| request_id | Unique identifier for the inference request | Correlate across log layers |
| timestamp | Time of request and response | Timeline reconstruction |
| model_id | Exact model version serving the request | Determine which model was active during the incident |
| input_tokens | Number of input tokens | Detect unusually large inputs (stuffing attacks) |
| output_tokens | Number of output tokens | Detect unusually large outputs (data exfiltration) |
| latency_ms | Time from request to first/last token | Detect anomalous processing (adversarial inputs can cause latency spikes) |
| temperature | Sampling temperature used | Determine reproducibility expectations |
| top_p / top_k | Sampling parameters | Assess output distribution characteristics |
| finish_reason | Why generation stopped (length, stop token, content filter) | Detect content filter interventions |
| status_code | Success or error | Identify failed requests that may indicate attack attempts |
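A first-pass anomaly check over these fields can be sketched as a simple z-score filter. This assumes inference records have been parsed into dicts with the fields above; the threshold and field name are illustrative, and a real pipeline would use per-endpoint baselines rather than a single global distribution.

```python
from statistics import mean, stdev

def flag_token_anomalies(records, field="output_tokens", threshold=3.0):
    """Return request_ids whose token count deviates more than
    `threshold` standard deviations from the mean -- a crude first
    pass for spotting stuffing-sized inputs or exfiltration-sized
    outputs."""
    values = [r[field] for r in records]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [
        r["request_id"]
        for r in records
        if abs(r[field] - mu) / sigma > threshold
    ]

# 30 ordinary responses plus one suspiciously large one.
logs = [{"request_id": f"req_{i}", "output_tokens": 500} for i in range(30)]
logs.append({"request_id": "req_big", "output_tokens": 90_000})
print(flag_token_anomalies(logs))  # → ['req_big']
```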
See Inference Log Analysis for detailed analysis techniques.
Prompt and Completion Logs
Prompt and completion logs capture the actual content exchanged between users and the model. These are the most forensically valuable and the most sensitive logs in the system.
Content Structure
A complete prompt/completion log entry should capture:
{
  "request_id": "req_abc123",
  "timestamp": "2026-03-15T14:32:07Z",
  "session_id": "sess_xyz789",
  "user_id": "user_42",
  "messages": [
    {
      "role": "system",
      "content": "[full system prompt text]",
      "hash": "sha256:a1b2c3..."
    },
    {
      "role": "user",
      "content": "[full user message]"
    },
    {
      "role": "assistant",
      "content": "[full model response]"
    }
  ],
  "completion_tokens": 847,
  "prompt_tokens": 1203,
  "model": "model-name-v2.1",
  "safety_scores": {
    "input": { "category": "safe", "score": 0.02 },
    "output": { "category": "safe", "score": 0.05 }
  }
}

Privacy and Compliance Considerations
Logging full prompt and completion content creates significant privacy obligations:
| Concern | Mitigation | Trade-off |
|---|---|---|
| PII in user inputs | PII detection and redaction pipeline before storage | Redacted content may lose forensic value |
| Regulatory compliance | Data retention policies, access controls, encryption at rest | Adds infrastructure complexity |
| Storage costs | Tiered storage with hot/warm/cold layers; retention limits | Older data may not be available for investigation |
| Access control | Role-based access; audit logging of log access | Adds operational overhead |
| User consent | Clear terms of service regarding logging | Legal review required |
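The redaction-before-storage mitigation can be sketched as below. The patterns are deliberately minimal and illustrative; a production pipeline would use a dedicated PII-detection service rather than hand-written regexes. Typed placeholders are used so the log still records *what kind* of data was present, which preserves some forensic value after redaction.

```python
import re

# Illustrative patterns only -- real PII detection needs far more
# coverage (names, addresses, phone numbers, credit cards, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before the
    prompt/completion text is written to storage."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("Contact alice@example.com, SSN 123-45-6789"))
# → Contact [REDACTED:EMAIL], SSN [REDACTED:SSN]
```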
See Prompt Log Forensics for investigation techniques.
Tool Call Logs
In agentic AI systems, the model invokes external tools -- search engines, databases, APIs, code executors, file systems. Tool call logs are critical for determining what real-world actions the model took during an incident.
Key Fields
| Field | Description | Forensic Use |
|---|---|---|
| tool_name | Which tool was invoked | Identify unauthorized tool usage |
| parameters | Arguments passed to the tool | Detect parameter manipulation or data exfiltration attempts |
| result | Tool output returned to the model | Determine what information the model received |
| authorization | Whether the call was authorized | Identify permission bypass |
| execution_time | How long the tool call took | Detect anomalous execution patterns |
| parent_request_id | The inference request that triggered the call | Correlate tool calls with prompts |
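The parent_request_id field is what makes chain reconstruction possible. A minimal sketch, assuming tool call records parsed into dicts with the fields above, groups each request's tool invocations in execution order so an investigator can see what a single prompt caused the agent to do:

```python
from collections import defaultdict

def tool_calls_by_request(tool_logs):
    """Group tool call records under the inference request that
    triggered them, preserving invocation order."""
    chains = defaultdict(list)
    for entry in sorted(tool_logs, key=lambda e: e["timestamp"]):
        chains[entry["parent_request_id"]].append(
            (entry["tool_name"], entry.get("authorization"))
        )
    return dict(chains)

# Hypothetical records: one request reads a file then attempts an
# outbound POST (a classic exfiltration shape), another just searches.
logs = [
    {"parent_request_id": "req_1", "timestamp": 2,
     "tool_name": "file_read", "authorization": "granted"},
    {"parent_request_id": "req_1", "timestamp": 3,
     "tool_name": "http_post", "authorization": "denied"},
    {"parent_request_id": "req_2", "timestamp": 1,
     "tool_name": "search", "authorization": "granted"},
]
print(tool_calls_by_request(logs))
```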
See Tool Call Forensics for investigation techniques.
Embedding and Retrieval Logs
RAG systems generate query logs when they search vector databases for relevant documents. These logs reveal what information the model had access to and what data may have been exposed.
Key Fields
| Field | Description | Forensic Use |
|---|---|---|
| query_embedding | The vector representation of the query | Detect adversarial embedding manipulation |
| query_text | The text that was embedded for retrieval | Understand what the model was looking for |
| retrieved_doc_ids | IDs of documents returned | Determine what data was in the model's context |
| similarity_scores | Relevance scores for each result | Detect anomalous retrieval (low-relevance documents being returned) |
| collection | Which document collection was queried | Identify cross-tenant retrieval in multi-tenant systems |
| filters_applied | Any metadata filters used | Determine if access controls were enforced |
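Two of the checks these fields enable -- low-relevance retrieval and cross-tenant access -- can be sketched as a single pass over the retrieval log. The record shape, score threshold, and collection names below are illustrative assumptions, not a fixed schema:

```python
def flag_anomalous_retrievals(retrieval_logs, min_score=0.5,
                              allowed_collections=("tenant_a",)):
    """Flag retrieval events whose best similarity score is
    suspiciously low (possible embedding manipulation pulling in
    irrelevant documents) or that touched a collection outside
    the caller's tenant."""
    findings = []
    for entry in retrieval_logs:
        if max(entry["similarity_scores"], default=0.0) < min_score:
            findings.append((entry["request_id"], "low_relevance"))
        if entry["collection"] not in allowed_collections:
            findings.append((entry["request_id"], "cross_tenant"))
    return findings

events = [
    {"request_id": "req_1", "similarity_scores": [0.12, 0.08],
     "collection": "tenant_a"},
    {"request_id": "req_2", "similarity_scores": [0.91, 0.85],
     "collection": "tenant_b"},
]
print(flag_anomalous_retrievals(events))
# → [('req_1', 'low_relevance'), ('req_2', 'cross_tenant')]
```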
Logging Gaps to Watch For
Common logging gaps that prevent forensic investigation:
| Gap | Impact | Remediation |
|---|---|---|
| No content logging | Cannot determine what was asked or answered | Implement prompt/completion logging with privacy controls |
| No system prompt versioning | Cannot determine what instructions the model was following | Version control all system prompts with timestamps |
| No tool call logging | Cannot determine what actions the model took | Add logging callbacks to agent frameworks |
| No safety classifier scores | Cannot determine if safety systems detected the incident | Log classifier outputs even for "safe" classifications |
| No RAG retrieval logging | Cannot determine what data the model accessed | Log vector database queries with results |
| Short retention periods | Evidence may be deleted before investigation begins | Implement minimum 90-day retention for AI logs |
| No correlation IDs | Cannot link logs across layers | Propagate a request ID through all logging layers |
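When a correlation ID is propagated, closing that last gap lets an investigator merge all layers into one view. A minimal sketch, assuming each layer's records are dicts sharing request_id and timestamp fields (the layer names are illustrative):

```python
def build_timeline(request_id, *log_sources):
    """Merge records from several log layers into a single
    chronologically ordered timeline for one request_id."""
    events = [
        rec
        for source in log_sources
        for rec in source
        if rec.get("request_id") == request_id
    ]
    return sorted(events, key=lambda e: e["timestamp"])

# Hypothetical per-layer records for two requests.
gateway = [{"request_id": "req_1", "timestamp": 1, "layer": "gateway"}]
inference = [{"request_id": "req_1", "timestamp": 2, "layer": "inference"},
             {"request_id": "req_2", "timestamp": 2, "layer": "inference"}]
tools = [{"request_id": "req_1", "timestamp": 3, "layer": "tool_call"}]

timeline = build_timeline("req_1", gateway, inference, tools)
print([e["layer"] for e in timeline])  # → ['gateway', 'inference', 'tool_call']
```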
Log Storage and Retention
| Log Type | Recommended Retention | Storage Tier | Access Control |
|---|---|---|---|
| Inference metadata | 1 year | Warm (indexed, queryable) | Engineering + Security |
| Prompt/completion content | 90 days hot, 1 year cold | Hot → Cold migration | Security + Legal only |
| Tool call logs | 1 year | Warm | Engineering + Security |
| Embedding/retrieval logs | 90 days | Warm | Engineering + Security |
| Safety classifier logs | 1 year | Warm | Security |
Section Overview
| Subsection | Focus | Key Questions Answered |
|---|---|---|
| Inference Log Analysis | Detecting anomalies in inference metadata | Are there unusual patterns in token counts, latency, or request volume? |
| Prompt Log Forensics | Investigating prompt and completion content | What attack was used? How did the model respond? What was the attack chain? |
| Tool Call Forensics | Investigating agent tool usage | Did the model take unauthorized actions? Was data exfiltrated through tools? |
Related Topics
- Evidence Preservation -- preserving logs as forensic evidence
- Triage Procedures -- initial log review during triage
- Recon & Tradecraft -- understanding what attackers look for in AI systems