Inference Log Analysis
Analyzing inference logs for AI forensic investigation: detecting anomalous patterns, identifying jailbreak attempts through metadata, token-level analysis, and latency anomaly detection.
Inference logs capture the metadata of every model invocation -- token counts, latency, parameters, finish reasons, and error codes. While they do not contain the actual content of prompts and responses, they reveal patterns that signal attack activity, data exfiltration, and model manipulation. This page covers systematic analysis techniques for inference log data.
Establishing a Baseline
Before you can detect anomalies, you need to know what "normal" looks like. Build baseline profiles for each model endpoint covering the following metrics.
Baseline Metrics
| Metric | What to Measure | Typical Baseline Window |
|---|---|---|
| Input token distribution | Mean, median, P95, P99 of input tokens per request | 7-14 days |
| Output token distribution | Mean, median, P95, P99 of output tokens per request | 7-14 days |
| Input/output ratio | Ratio of input tokens to output tokens | 7-14 days |
| Latency profile | Mean, P50, P95, P99 latency per request | 7 days |
| Request volume | Requests per minute/hour by user, session, and endpoint | 14-30 days |
| Finish reason distribution | Percentage of requests ending in stop token, length limit, or content filter | 14 days |
| Error rate | Percentage of failed requests by error type | 14 days |
| Temperature distribution | If user-configurable, the distribution of temperature values used | 14 days |
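These metrics can be derived directly from a log export. A minimal sketch, assuming the logs are loaded into a pandas DataFrame with `input_tokens`, `output_tokens`, `latency_ms`, and `finish_reason` columns (illustrative names):

```python
import pandas as pd

def build_baseline(df: pd.DataFrame) -> dict:
    """Compute a baseline profile from a DataFrame of inference log rows."""
    def token_stats(series: pd.Series) -> dict:
        return {
            "mean": series.mean(),
            "median": series.median(),
            "std": series.std(),
            "p95": series.quantile(0.95),
            "p99": series.quantile(0.99),
        }

    # Input/output ratio, guarding against zero-output requests
    ratio = df["input_tokens"] / df["output_tokens"].clip(lower=1)
    return {
        "input_tokens": token_stats(df["input_tokens"]),
        "output_tokens": token_stats(df["output_tokens"]),
        "io_ratio": {"mean": ratio.mean(), "std": ratio.std()},
        "latency_ms": {
            "mean": df["latency_ms"].mean(),
            "p50": df["latency_ms"].quantile(0.50),
            "p95": df["latency_ms"].quantile(0.95),
            "p99": df["latency_ms"].quantile(0.99),
        },
        # Fraction of requests per finish reason
        "finish_reasons": df["finish_reason"].value_counts(normalize=True).to_dict(),
    }
```

Rebuilding the profile on a schedule (e.g. weekly, over the windows in the table) keeps the baseline current as traffic patterns drift.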
Baseline Example
```python
# Example baseline profile for a customer service chatbot
# (illustrative "std" values included; the z-score detector below needs them)
baseline = {
    "input_tokens": {"mean": 180, "median": 150, "std": 120, "p95": 450, "p99": 800},
    "output_tokens": {"mean": 220, "median": 180, "std": 160, "p95": 600, "p99": 1200},
    "io_ratio": {"mean": 0.82, "std": 0.35},
    "latency_ms": {"mean": 850, "p50": 720, "p95": 2100, "p99": 4500},
    "finish_reasons": {"stop": 0.89, "length": 0.08, "content_filter": 0.03},
    "requests_per_hour_per_user": {"mean": 8, "p95": 25, "p99": 45},
}
```

Detecting Anomalous Patterns
Pattern 1: Token Count Anomalies
Abnormal token counts are one of the strongest signals in inference logs.
| Anomaly | Potential Indication | Investigation Action |
|---|---|---|
| Unusually high input tokens | Prompt stuffing, context window exploitation, many-shot jailbreaking | Review prompt content for injected instructions |
| Unusually high output tokens | Data exfiltration, model producing content beyond normal bounds | Review output content for leaked data |
| Sudden drop in output tokens | Content filter blocking, model refusal | Check finish reason and classifier logs |
| Input/output ratio spike | Short input producing very long output (possible extraction) | High-priority content review |
| Consistent max-length outputs | Automated extraction, hitting generation limits repeatedly | Check for automated access patterns |
```python
# Detect token count anomalies using z-scores against the baseline
def detect_token_anomalies(log_entries, baseline, z_threshold=3.0):
    anomalies = []
    for entry in log_entries:
        input_z = (entry.input_tokens - baseline["input_tokens"]["mean"]) / baseline["input_tokens"]["std"]
        output_z = (entry.output_tokens - baseline["output_tokens"]["mean"]) / baseline["output_tokens"]["std"]
        if abs(input_z) > z_threshold or abs(output_z) > z_threshold:
            anomalies.append({
                "request_id": entry.request_id,
                "input_z": input_z,
                "output_z": output_z,
                "timestamp": entry.timestamp,
                "user_id": entry.user_id,
            })
    return anomalies
```

Pattern 2: Jailbreak Indicators in Metadata
Even without content, certain metadata patterns suggest jailbreak attempts.
| Metadata Pattern | Jailbreak Indicator | Confidence |
|---|---|---|
| Multiple requests with increasing input tokens from same user | Multi-turn context building for gradual escalation | Medium |
| Content filter triggered, followed by successful request | Attacker refining payload to bypass filter | High |
| Rapid succession of similar-length requests | Automated jailbreak testing | High |
| Very high input tokens with very low output tokens | Stuffing attack followed by model refusal | Medium |
| Request with finish_reason: content_filter followed by same user reducing input tokens | Attacker identified the triggering content and is removing it | High |
| Unusual temperature or parameter values | Attempting to increase output randomness for jailbreak reliability | Low-Medium |
Pattern 3: Exfiltration Patterns
Data exfiltration through model outputs has distinct metadata signatures.
| Pattern | Description | Detection Query |
|---|---|---|
| Low-input/high-output ratio | Short prompts producing long outputs | WHERE input_tokens < 50 AND output_tokens > 1000 |
| Sequential extraction | Series of requests with monotonically increasing context | Group by session, check for increasing input tokens |
| Hit-rate anomaly | Spike in requests with specific output token counts | Histogram of output tokens shows unusual clustering |
| Session length anomaly | Sessions with far more turns than typical | WHERE turn_count > baseline.p99_turns |
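The sequential-extraction pattern lends itself to a simple per-session monotonicity check. A minimal sketch with illustrative field names (`session_id`, `timestamp`, `input_tokens`):

```python
def find_sequential_extraction(entries, min_turns=5):
    """Flag sessions whose input token counts increase strictly turn over
    turn -- a signature of progressive context building for extraction."""
    # Collect each session's input token counts in chronological order
    sessions = {}
    for e in sorted(entries, key=lambda e: e["timestamp"]):
        sessions.setdefault(e["session_id"], []).append(e["input_tokens"])

    flagged = []
    for sid, tokens in sessions.items():
        monotonic = all(b > a for a, b in zip(tokens, tokens[1:]))
        if len(tokens) >= min_turns and monotonic:
            flagged.append({
                "session_id": sid,
                "turns": len(tokens),
                "token_growth": tokens[-1] - tokens[0],
            })
    return flagged
```

A strict monotonicity test is deliberately conservative; relaxing it (e.g. requiring growth in 80% of turns) trades false negatives for false positives.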
Pattern 4: Latency Anomalies
Latency measurements can reveal adversarial activity that is invisible in other metrics.
| Anomaly | Potential Cause | Investigation Action |
|---|---|---|
| Unusually high latency for normal token counts | Adversarial input causing more complex computation | Review input for adversarial tokens or encodings |
| Latency spikes correlated with specific users | User sending computationally expensive inputs | Profile the user's request patterns |
| Gradual latency increase over time | Model degradation or increasing context sizes | Check for context window exhaustion patterns |
| First-token latency spike | Unusually complex prompt processing | May indicate prompt injection with complex formatting |
| Consistent low latency with high output | Cached or pre-computed responses (possible model compromise) | Verify model integrity; check for output caching bugs |
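One way to operationalize the first row -- latency that token counts do not explain -- is to normalize latency by output tokens and flag statistical outliers. A minimal sketch using only the standard library; the field names are illustrative:

```python
import statistics

def latency_anomalies(entries, z_threshold=3.0):
    """Flag requests whose latency per output token is an upper outlier --
    high token-normalized latency can indicate adversarial inputs that are
    unusually expensive to process."""
    # Milliseconds per generated token, guarding against zero-output requests
    per_token = [e["latency_ms"] / max(e["output_tokens"], 1) for e in entries]
    mean = statistics.mean(per_token)
    std = statistics.pstdev(per_token) or 1.0  # avoid divide-by-zero
    return [
        {"request_id": e["request_id"], "z": (v - mean) / std}
        for e, v in zip(entries, per_token)
        if (v - mean) / std > z_threshold  # one-sided: only slow outliers
    ]
```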
Time-Series Analysis
Analyzing inference logs as time-series data reveals patterns invisible in individual-request analysis.
Rolling Window Analysis
```python
# Detect shifts in model behavior using rolling statistics
import pandas as pd

def rolling_behavior_analysis(df, baseline_filter_rate, baseline_output_p95,
                              baseline_error_rate, window="1h"):
    # Requires a datetime "timestamp" column; time-based rolling
    # windows need a sorted datetime index
    rolling = df.sort_values("timestamp").set_index("timestamp").rolling(window)
    metrics = pd.DataFrame({
        "mean_output_tokens": rolling["output_tokens"].mean(),
        "mean_input_tokens": rolling["input_tokens"].mean(),
        "content_filter_rate": rolling["content_filter_triggered"].mean(),
        "error_rate": rolling["is_error"].mean(),
        "mean_latency": rolling["latency_ms"].mean(),
    })
    # Flag windows where metrics deviate significantly from baseline
    alerts = metrics[
        (metrics["content_filter_rate"] > baseline_filter_rate * 3)
        | (metrics["mean_output_tokens"] > baseline_output_p95)
        | (metrics["error_rate"] > baseline_error_rate * 5)
    ]
    return alerts
```

Correlation Analysis
Cross-correlating metrics reveals patterns that single-metric analysis misses:
| Correlation | Signal |
|---|---|
| High input tokens + content filter triggers | Prompt injection attempts with safety bypass payloads |
| Low latency + high output tokens | Possible cached or pre-computed responses (anomalous) |
| Increasing error rate + decreasing request volume | Attacker hitting errors and backing off |
| User session length increase + output token increase | Progressive extraction over extended sessions |
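A correlation matrix over windowed aggregates is a quick way to surface these compound signals. A sketch assuming a pandas DataFrame with a `timestamp` column plus the metric columns used earlier on this page (illustrative names):

```python
import pandas as pd

def metric_correlations(df, freq="1h"):
    """Correlate per-window aggregates to surface compound signals,
    e.g. input tokens rising together with content filter triggers."""
    windows = (
        df.set_index("timestamp")
          .resample(freq)
          .agg({
              "input_tokens": "mean",
              "output_tokens": "mean",
              "latency_ms": "mean",
              "content_filter_triggered": "mean",  # 0/1 flag -> trigger rate
          })
          .dropna()
    )
    # Pearson correlation between every pair of windowed metrics
    return windows.corr()
```

Large positive or negative entries in the matrix point at metric pairs worth a dedicated detection rule; correlation alone is a triage signal, not evidence.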
Automated Detection Queries
SQL-Based Detection
For organizations storing inference logs in a queryable database:
```sql
-- Detect potential jailbreak attempts: content filter triggers
-- followed by successful requests from the same user
SELECT
    a.user_id,
    a.timestamp AS blocked_at,
    b.timestamp AS succeeded_at,
    b.output_tokens,
    EXTRACT(EPOCH FROM b.timestamp - a.timestamp) AS seconds_between
FROM inference_logs a
JOIN inference_logs b ON a.user_id = b.user_id
    AND b.timestamp > a.timestamp
    AND b.timestamp < a.timestamp + INTERVAL '10 minutes'
WHERE a.finish_reason = 'content_filter'
  AND b.finish_reason = 'stop'
  AND b.output_tokens > 500
ORDER BY a.timestamp DESC;

-- Detect potential data exfiltration: low input, high output
SELECT
    user_id,
    session_id,
    COUNT(*) AS request_count,
    AVG(input_tokens) AS avg_input,
    AVG(output_tokens) AS avg_output,
    AVG(output_tokens::float / NULLIF(input_tokens, 0)) AS avg_ratio
FROM inference_logs
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY user_id, session_id
HAVING AVG(output_tokens::float / NULLIF(input_tokens, 0)) > 10
   AND COUNT(*) > 5
ORDER BY avg_ratio DESC;
```

Limitations of Metadata-Only Analysis
Inference log analysis has inherent limitations that must be acknowledged:
| Limitation | Impact | Mitigation |
|---|---|---|
| No content visibility | Cannot determine if outputs contain sensitive data | Combine with prompt/completion log analysis |
| False positives | Legitimate long conversations look like extraction | Correlate with content and user behavior analysis |
| Sophisticated attackers | Can mimic normal token distributions | Layer multiple detection methods |
| Model variability | Output length varies naturally, creating noise | Use per-endpoint baselines with sufficient history |
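One way to apply the "layer multiple detection methods" mitigation is to combine per-detector hit counts into a single triage score per user. A minimal sketch; the detector names and weights are hypothetical:

```python
def composite_score(user_flags, weights=None):
    """Combine per-detector hit counts for one user into a triage score;
    layering detectors mitigates any single method's blind spots."""
    # Hypothetical weights: high-confidence signals count for more
    weights = weights or {
        "token_anomaly": 1.0,
        "filter_then_success": 3.0,   # high-confidence jailbreak signal
        "sequential_extraction": 2.0,
        "latency_anomaly": 1.5,
    }
    return sum(weights.get(name, 1.0) * hits for name, hits in user_flags.items())
```

Users above a chosen score threshold go to manual review; tuning the weights against labeled incidents is left to the deployment.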
Related Topics
- Prompt Log Forensics -- content-level analysis that complements metadata analysis
- Tool Call Forensics -- investigating agent actions
- Severity Framework -- scoring incidents discovered through log analysis
Review Question
A user sends 12 requests in 5 minutes. The first 3 trigger content filters. The next 9 succeed with progressively shorter input tokens and consistently long output tokens. What does this pattern most likely indicate?