Inference Log Analysis
Analyzing inference logs for AI forensic investigation: detecting anomalous patterns, identifying jailbreak attempts through metadata, token-level analysis, and latency anomaly detection.
Inference logs capture the metadata of every model invocation -- token counts, latency, parameters, finish reasons, and error codes. While they do not contain the actual content of prompts and responses, they reveal patterns that signal attack activity, data exfiltration, and model manipulation. This page covers systematic analysis techniques for inference log data.
Establishing a Baseline
Before you can detect anomalies, you need to know what "normal" looks like. Build baseline profiles for each model endpoint covering the following metrics.
Baseline Metrics
| Metric | What to Measure | Typical Baseline Window |
|---|---|---|
| Input token distribution | Mean, median, P95, P99 of input tokens per request | 7-14 days |
| Output token distribution | Mean, median, P95, P99 of output tokens per request | 7-14 days |
| Input/output ratio | Ratio of input tokens to output tokens | 7-14 days |
| Latency profile | Mean, P50, P95, P99 latency per request | 7 days |
| Request volume | Requests per minute/hour by user, session, and endpoint | 14-30 days |
| Finish reason distribution | Percentage of requests ending in stop token, length limit, or content filter | 14 days |
| Error rate | Percentage of failed requests by error type | 14 days |
| Temperature distribution | If user-configurable, the distribution of temperature values used | 14 days |
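These metrics can be derived directly from a log export. A minimal sketch, assuming the logs are loaded into a pandas DataFrame with `input_tokens`, `output_tokens`, `latency_ms`, and `finish_reason` columns (illustrative names):

```python
import pandas as pd

def build_baseline(df: pd.DataFrame) -> dict:
    """Compute a baseline profile from a DataFrame of inference log rows."""
    def token_stats(series: pd.Series) -> dict:
        return {
            "mean": series.mean(),
            "median": series.median(),
            "std": series.std(),
            "p95": series.quantile(0.95),
            "p99": series.quantile(0.99),
        }

    # Input/output ratio, guarding against zero-output requests
    ratio = df["input_tokens"] / df["output_tokens"].clip(lower=1)
    return {
        "input_tokens": token_stats(df["input_tokens"]),
        "output_tokens": token_stats(df["output_tokens"]),
        "io_ratio": {"mean": ratio.mean(), "std": ratio.std()},
        "latency_ms": {
            "mean": df["latency_ms"].mean(),
            "p50": df["latency_ms"].quantile(0.50),
            "p95": df["latency_ms"].quantile(0.95),
            "p99": df["latency_ms"].quantile(0.99),
        },
        # Fraction of requests per finish reason
        "finish_reasons": df["finish_reason"].value_counts(normalize=True).to_dict(),
    }
```

Rebuilding the profile on a schedule (e.g. weekly, over the windows in the table) keeps the baseline current as traffic patterns drift.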
Baseline Example
```python
# Example baseline profile for a customer service chatbot
# (illustrative "std" values included; the z-score detector below needs them)
baseline = {
    "input_tokens": {"mean": 180, "median": 150, "std": 120, "p95": 450, "p99": 800},
    "output_tokens": {"mean": 220, "median": 180, "std": 160, "p95": 600, "p99": 1200},
    "io_ratio": {"mean": 0.82, "std": 0.35},
    "latency_ms": {"mean": 850, "p50": 720, "p95": 2100, "p99": 4500},
    "finish_reasons": {"stop": 0.89, "length": 0.08, "content_filter": 0.03},
    "requests_per_hour_per_user": {"mean": 8, "p95": 25, "p99": 45},
}
```

Detecting Anomalous Patterns
Pattern 1: Token Count Anomalies
Abnormal token counts are one of the strongest signals in inference logs.
| Anomaly | Potential Indication | Investigation Action |
|---|---|---|
| Unusually high input tokens | Prompt stuffing, context window exploitation, many-shot jailbreaking | Review prompt content for injected instructions |
| Unusually high output tokens | Data exfiltration, model producing content beyond normal bounds | Review output content for leaked data |
| Sudden drop in output tokens | Content filter blocking, model refusal | Check finish reason and classifier logs |
| Input/output ratio spike | Short input producing very long output (possible extraction) | High-priority content review |
| Consistent max-length outputs | Automated extraction, hitting generation limits repeatedly | Check for automated access patterns |
```python
# Detect token count anomalies using z-scores against the baseline
def detect_token_anomalies(log_entries, baseline, z_threshold=3.0):
    anomalies = []
    for entry in log_entries:
        input_z = (entry.input_tokens - baseline["input_tokens"]["mean"]) / baseline["input_tokens"]["std"]
        output_z = (entry.output_tokens - baseline["output_tokens"]["mean"]) / baseline["output_tokens"]["std"]
        if abs(input_z) > z_threshold or abs(output_z) > z_threshold:
            anomalies.append({
                "request_id": entry.request_id,
                "input_z": input_z,
                "output_z": output_z,
                "timestamp": entry.timestamp,
                "user_id": entry.user_id,
            })
    return anomalies
```

Pattern 2: Jailbreak Indicators in Metadata
Even without content, certain metadata patterns suggest jailbreak attempts.
| Metadata Pattern | Jailbreak Indicator | Confidence |
|---|---|---|
| Multiple requests with increasing input tokens from same user | Multi-turn context building for gradual escalation | Medium |
| Content filter triggered, followed by successful request | Attacker refining payload to bypass filter | High |
| Rapid succession of similar-length requests | Automated jailbreak testing | High |
| Very high input tokens with very low output tokens | Stuffing attack followed by model refusal | Medium |
| Request with finish_reason: content_filter followed by same user reducing input tokens | Attacker identified the triggering content and is removing it | High |
| Unusual temperature or parameter values | Attempting to increase output randomness for jailbreak reliability | Low-Medium |
Pattern 3: Exfiltration Patterns
Data exfiltration through model outputs has distinct metadata signatures.
| Pattern | Description | Detection Query |
|---|---|---|
| Low-input/high-output ratio | Short prompts producing long outputs | WHERE input_tokens < 50 AND output_tokens > 1000 |
| Sequential extraction | Series of requests with monotonically increasing context | Group by session, check for increasing input tokens |
| Hit-rate anomaly | Spike in requests with specific output token counts | Histogram of output tokens shows unusual clustering |
| Session length anomaly | Sessions with far more turns than typical | WHERE turn_count > baseline.p99_turns |
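The sequential-extraction pattern lends itself to a simple per-session monotonicity check. A minimal sketch with illustrative field names (`session_id`, `timestamp`, `input_tokens`):

```python
def find_sequential_extraction(entries, min_turns=5):
    """Flag sessions whose input token counts increase strictly turn over
    turn -- a signature of progressive context building for extraction."""
    # Collect each session's input token counts in chronological order
    sessions = {}
    for e in sorted(entries, key=lambda e: e["timestamp"]):
        sessions.setdefault(e["session_id"], []).append(e["input_tokens"])

    flagged = []
    for sid, tokens in sessions.items():
        monotonic = all(b > a for a, b in zip(tokens, tokens[1:]))
        if len(tokens) >= min_turns and monotonic:
            flagged.append({
                "session_id": sid,
                "turns": len(tokens),
                "token_growth": tokens[-1] - tokens[0],
            })
    return flagged
```

A strict monotonicity test is deliberately conservative; relaxing it (e.g. requiring growth in 80% of turns) trades false negatives for false positives.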
Pattern 4: Latency Anomalies
Latency measurements can reveal adversarial activity that is invisible in other metrics.
| Anomaly | Potential Cause | Investigation Action |
|---|---|---|
| Unusually high latency for normal token counts | Adversarial input causing more complex computation | Review input for adversarial tokens or encodings |
| Latency spikes correlated with specific users | User sending computationally expensive inputs | Profile the user's request patterns |
| Gradual latency increase over time | Model degradation or increasing context sizes | Check for context window exhaustion patterns |
| First-token latency spike | Unusually complex prompt processing | May indicate prompt injection with complex formatting |
| Consistent low latency with high output | Cached or pre-computed responses (possible model compromise) | Verify model integrity; check for output caching bugs |
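One way to operationalize the first row -- latency that token counts do not explain -- is to normalize latency by output tokens and flag statistical outliers. A minimal sketch using only the standard library; the field names are illustrative:

```python
import statistics

def latency_anomalies(entries, z_threshold=3.0):
    """Flag requests whose latency per output token is an upper outlier --
    high token-normalized latency can indicate adversarial inputs that are
    unusually expensive to process."""
    # Milliseconds per generated token, guarding against zero-output requests
    per_token = [e["latency_ms"] / max(e["output_tokens"], 1) for e in entries]
    mean = statistics.mean(per_token)
    std = statistics.pstdev(per_token) or 1.0  # avoid divide-by-zero
    return [
        {"request_id": e["request_id"], "z": (v - mean) / std}
        for e, v in zip(entries, per_token)
        if (v - mean) / std > z_threshold  # one-sided: only slow outliers
    ]
```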
Time-Series Analysis
Analyzing inference logs as time-series data reveals patterns invisible in individual-request analysis.
Rolling Window Analysis
```python
# Detect shifts in model behavior using rolling statistics
import pandas as pd

def rolling_behavior_analysis(df, baseline_filter_rate, baseline_output_p95,
                              baseline_error_rate, window="1h"):
    # Requires a datetime "timestamp" column; time-based rolling
    # windows need a sorted datetime index
    rolling = df.sort_values("timestamp").set_index("timestamp").rolling(window)
    metrics = pd.DataFrame({
        "mean_output_tokens": rolling["output_tokens"].mean(),
        "mean_input_tokens": rolling["input_tokens"].mean(),
        "content_filter_rate": rolling["content_filter_triggered"].mean(),
        "error_rate": rolling["is_error"].mean(),
        "mean_latency": rolling["latency_ms"].mean(),
    })
    # Flag windows where metrics deviate significantly from baseline
    alerts = metrics[
        (metrics["content_filter_rate"] > baseline_filter_rate * 3)
        | (metrics["mean_output_tokens"] > baseline_output_p95)
        | (metrics["error_rate"] > baseline_error_rate * 5)
    ]
    return alerts
```

Correlation Analysis
Cross-correlating metrics reveals patterns that single-metric analysis misses:
| Correlation | Signal |
|---|---|
| High input tokens + content filter triggers | Prompt injection attempts with safety bypass payloads |
| Low latency + high output tokens | Possible cached or pre-computed responses (anomalous) |
| Increasing error rate + decreasing request volume | Attacker hitting errors and backing off |
| User session length increase + output token increase | Progressive extraction over extended sessions |
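A correlation matrix over windowed aggregates is a quick way to surface these compound signals. A sketch assuming a pandas DataFrame with a `timestamp` column plus the metric columns used earlier on this page (illustrative names):

```python
import pandas as pd

def metric_correlations(df, freq="1h"):
    """Correlate per-window aggregates to surface compound signals,
    e.g. input tokens rising together with content filter triggers."""
    windows = (
        df.set_index("timestamp")
          .resample(freq)
          .agg({
              "input_tokens": "mean",
              "output_tokens": "mean",
              "latency_ms": "mean",
              "content_filter_triggered": "mean",  # 0/1 flag -> trigger rate
          })
          .dropna()
    )
    # Pearson correlation between every pair of windowed metrics
    return windows.corr()
```

Large positive or negative entries in the matrix point at metric pairs worth a dedicated detection rule; correlation alone is a triage signal, not evidence.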
Automated Detection Queries
SQL-Based Detection
For organizations storing inference logs in a queryable database:
```sql
-- Detect potential jailbreak attempts: content filter triggers
-- followed by successful requests from the same user
SELECT
    a.user_id,
    a.timestamp AS blocked_at,
    b.timestamp AS succeeded_at,
    b.output_tokens,
    EXTRACT(EPOCH FROM b.timestamp - a.timestamp) AS seconds_between
FROM inference_logs a
JOIN inference_logs b ON a.user_id = b.user_id
    AND b.timestamp > a.timestamp
    AND b.timestamp < a.timestamp + INTERVAL '10 minutes'
WHERE a.finish_reason = 'content_filter'
  AND b.finish_reason = 'stop'
  AND b.output_tokens > 500
ORDER BY a.timestamp DESC;

-- Detect potential data exfiltration: low input, high output
SELECT
    user_id,
    session_id,
    COUNT(*) AS request_count,
    AVG(input_tokens) AS avg_input,
    AVG(output_tokens) AS avg_output,
    AVG(output_tokens::float / NULLIF(input_tokens, 0)) AS avg_ratio
FROM inference_logs
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY user_id, session_id
HAVING AVG(output_tokens::float / NULLIF(input_tokens, 0)) > 10
   AND COUNT(*) > 5
ORDER BY avg_ratio DESC;
```

Limitations of Metadata-Only Analysis
Inference log analysis has inherent limitations that must be acknowledged:
| Limitation | Impact | Mitigation |
|---|---|---|
| No content visibility | Cannot determine if outputs contain sensitive data | Combine with prompt/completion log analysis |
| False positives | Legitimate long conversations look like extraction | Correlate with content and user behavior analysis |
| Sophisticated attackers | Can mimic normal token distributions | Layer multiple detection methods |
| Model variability | Output length varies naturally, creating noise | Use per-endpoint baselines with sufficient history |
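One way to apply the "layer multiple detection methods" mitigation is to combine per-detector hit counts into a single triage score per user. A minimal sketch; the detector names and weights are hypothetical:

```python
def composite_score(user_flags, weights=None):
    """Combine per-detector hit counts for one user into a triage score;
    layering detectors mitigates any single method's blind spots."""
    # Hypothetical weights: high-confidence signals count for more
    weights = weights or {
        "token_anomaly": 1.0,
        "filter_then_success": 3.0,   # high-confidence jailbreak signal
        "sequential_extraction": 2.0,
        "latency_anomaly": 1.5,
    }
    return sum(weights.get(name, 1.0) * hits for name, hits in user_flags.items())
```

Users above a chosen score threshold go to manual review; tuning the weights against labeled incidents is left to the deployment.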
Related Topics
- Prompt Log Forensics -- content-level analysis that complements metadata analysis
- Tool Call Forensics -- investigating agent actions
- Severity Framework -- scoring incidents discovered through log analysis
Review Question
A user sends 12 requests in 5 minutes. The first 3 trigger content filters. The next 9 succeed with progressively shorter input tokens and consistently long output tokens. What does this pattern most likely indicate?