Runtime Monitoring & Anomaly Detection
Monitoring LLM applications in production for token usage anomalies, output pattern signals, and behavioral drift, using tools like Langfuse, Helicone, and custom logging.
Runtime monitoring is the defense layer that catches what real-time filters miss. While input and output filters make binary block/allow decisions per request, monitoring analyzes patterns across time, users, and sessions to detect slow-burn attacks, automated probing, and post-exploitation activity.
What to Monitor
Token Usage Anomalies
Unusual token patterns are strong indicators of attack activity:
| Signal | Normal Baseline | Attack Indicator | Possible Attack |
|---|---|---|---|
| Input length spike | 50-500 tokens | >4000 tokens | Context window stuffing, attention dilution |
| Output length spike | 100-1000 tokens | >5000 tokens | Data exfiltration, system prompt extraction |
| Input/output ratio | ~1:2 to 1:5 | >1:50 | Model tricked into generating excessive content |
| Token velocity | Steady, human-paced | Burst of rapid requests | Automated tool (Garak, PromptFoo) |
| Repeated similar inputs | Low similarity | >90% cosine similarity | Iterative fuzzing |
```python
# Token anomaly detection against a rolling baseline
class TokenAnomalyDetector:
    def __init__(self, window_size: int = 100):
        self.input_lengths: list[int] = []
        self.output_lengths: list[int] = []
        self.window_size = window_size

    def check(self, input_tokens: int, output_tokens: int) -> list[str]:
        alerts = []
        if len(self.input_lengths) >= self.window_size:
            avg_input = sum(self.input_lengths[-self.window_size:]) / self.window_size
            if input_tokens > avg_input * 5:
                alerts.append(f"Input length anomaly: {input_tokens} vs avg {avg_input:.0f}")
            avg_output = sum(self.output_lengths[-self.window_size:]) / self.window_size
            if output_tokens > avg_output * 10:
                alerts.append(f"Output length anomaly: {output_tokens} vs avg {avg_output:.0f}")
        self.input_lengths.append(input_tokens)
        self.output_lengths.append(output_tokens)
        return alerts
```
Output Pattern Detection
Monitor the model's outputs for patterns that indicate successful exploitation:
| Pattern | Detection Method | Indicates |
|---|---|---|
| Refusal rate drop | Track refusal keyword frequency | Jailbreak success |
| System prompt fragments in output | String matching against known prompt content | System prompt extraction |
| PII patterns in output | Regex for SSN, email, phone patterns | Data leakage |
| Code execution indicators | Detect code blocks with dangerous imports | Tool abuse |
| Repeated output templates | Cluster outputs by embedding similarity | Automated exploitation |
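The regex and keyword rows in the table above can be sketched as a single output scanner. The patterns and refusal markers here are illustrative only; production detectors need broader, locale-aware PII rules:

```python
import re

# Illustrative PII patterns -- real deployments need broader, locale-aware rules
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def scan_output(text: str) -> dict:
    """Return which output-pattern signals fired for a single response."""
    lowered = text.lower()
    return {
        "pii_types": [name for name, pat in PII_PATTERNS.items() if pat.search(text)],
        "refusal": any(marker in lowered for marker in REFUSAL_MARKERS),
    }
```

Feeding every response through a scanner like this also produces the per-request refusal signal that the refusal-rate-drop detector needs.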
Behavioral Drift
Behavioral drift occurs when the model's response patterns change without intentional updates:
- Topic drift -- model starts discussing topics outside its intended scope
- Tone shift -- model becomes more compliant or aggressive over time
- Capability expansion -- model starts using tools or formats it previously avoided
- Safety degradation -- refusal rate decreases across a session or across sessions
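Safety degradation in particular lends itself to a quantitative check: compare the refusal rate in a recent window against an older baseline window. This is a minimal sketch; the window sizes and drop ratio are illustrative, not tuned values:

```python
from collections import deque

class RefusalRateDrift:
    """Flag drift when the recent refusal rate drops well below the baseline rate."""

    def __init__(self, baseline_size: int = 200, recent_size: int = 50,
                 drop_ratio: float = 0.5):
        self.baseline: deque = deque(maxlen=baseline_size)
        self.recent: deque = deque(maxlen=recent_size)
        self.drop_ratio = drop_ratio

    def record(self, refused: bool) -> bool:
        # Responses age out of the recent window into the baseline window
        if len(self.recent) == self.recent.maxlen:
            self.baseline.append(self.recent[0])
        self.recent.append(refused)
        if len(self.recent) < self.recent.maxlen or not self.baseline:
            return False  # Not enough history yet
        baseline_rate = sum(self.baseline) / len(self.baseline)
        recent_rate = sum(self.recent) / len(self.recent)
        # Drift: recent refusal rate fell below drop_ratio * baseline rate
        return baseline_rate > 0 and recent_rate < baseline_rate * self.drop_ratio
```

The same two-window structure works for the other drift signals (topic, tone, tool usage) once each is reduced to a per-response boolean or score.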
Monitoring Tools
Langfuse is an open-source LLM observability platform that captures traces, metrics, and evaluations.
```python
from langfuse import Langfuse

langfuse = Langfuse()

# Trace every LLM interaction
trace = langfuse.trace(name="chat-completion", user_id=user_id)
generation = trace.generation(
    name="main-llm",
    model="gpt-4o",
    input=user_message,
    output=response,
    metadata={
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "refusal_detected": "sorry" in response.lower()[:50],
    },
)

# Score for safety evaluation
trace.score(name="safety", value=safety_score, comment="automated check")
```
Security-relevant features:
- Trace-level metrics for per-request analysis
- Session tracking for multi-turn behavior monitoring
- Custom scores for safety evaluation pipelines
- Dashboard for visualizing anomalies over time
Helicone provides request logging and analytics with a proxy-based integration that requires minimal code changes.
```python
from openai import OpenAI

# Route through Helicone proxy for automatic logging
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
        "Helicone-User-Id": user_id,
        "Helicone-Session-Id": session_id,
        "Helicone-Property-SecurityCheck": str(passed_security),
    },
)
```
Security-relevant features:
- Automatic request/response logging without code changes
- Custom properties for tagging security-relevant metadata
- Cost tracking (anomalous cost = possible exploitation)
- User-level analytics for identifying abusive accounts
For maximum control, build custom logging pipelines:
```python
import json
import hashlib
from datetime import datetime

class SecurityLogger:
    def __init__(self, log_sink):
        self.sink = log_sink

    def log_interaction(
        self,
        user_id: str,
        session_id: str,
        input_text: str,
        output_text: str,
        metadata: dict,
    ):
        record = {
            "timestamp": datetime.utcnow().isoformat(),
            "user_id": user_id,
            "session_id": session_id,
            "input_hash": hashlib.sha256(input_text.encode()).hexdigest()[:16],
            "input_tokens": metadata.get("input_tokens"),
            "output_tokens": metadata.get("output_tokens"),
            "input_flagged": metadata.get("input_flagged", False),
            "output_flagged": metadata.get("output_flagged", False),
            "refusal_detected": metadata.get("refusal_detected", False),
            "latency_ms": metadata.get("latency_ms"),
        }
        self.sink.write(json.dumps(record))
```
When to choose custom logging:
- You need to log to a specific SIEM (Splunk, Elastic, Datadog)
- Compliance requires specific data handling (no third-party logging)
- You need real-time alerting integration with existing incident response
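Records in the shape emitted by `SecurityLogger` can feed a simple downstream alerting filter before reaching the SIEM. This is a sketch; the field names match the record above, and the sample lines are fabricated for illustration:

```python
import json

def flagged_records(log_lines):
    """Yield parsed log records where any security flag fired."""
    for line in log_lines:
        record = json.loads(line)
        if record.get("input_flagged") or record.get("output_flagged"):
            yield record

# Illustrative JSON-lines input, matching the SecurityLogger record shape
lines = [
    json.dumps({"user_id": "u-1", "input_flagged": False, "output_flagged": False}),
    json.dumps({"user_id": "u-2", "input_flagged": True, "output_flagged": False}),
]
hits = list(flagged_records(lines))
```

One record per line (JSON Lines) keeps the log stream friendly to shippers like Filebeat or Fluent Bit.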
Alerting Strategy
Effective alerts must balance sensitivity (catching attacks) with specificity (not overwhelming operators with false positives).
Tiered Alert Model
| Tier | Severity | Trigger | Action |
|---|---|---|---|
| P0 | Critical | Confirmed data exfiltration, system prompt in output | Page on-call, auto-block user |
| P1 | High | Sustained injection attempts, safety bypass detected | Alert security team, flag account |
| P2 | Medium | Token anomaly, refusal rate change | Log for review, no immediate action |
| P3 | Low | Unusual but non-threatening patterns | Dashboard metric, weekly review |
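The table above can be encoded as a classification function plus a dispatch map. The signal keys and handler names here are placeholders, not a real integration:

```python
from enum import Enum

class Tier(Enum):
    P0 = "critical"
    P1 = "high"
    P2 = "medium"
    P3 = "low"

# Placeholder actions -- real handlers would page on-call, call a SOAR API, etc.
ACTIONS = {
    Tier.P0: ["page_oncall", "auto_block_user"],
    Tier.P1: ["alert_security_team", "flag_account"],
    Tier.P2: ["log_for_review"],
    Tier.P3: ["dashboard_metric"],
}

def classify(signal: dict) -> Tier:
    """Map detection signals to an alert tier per the table above."""
    if signal.get("system_prompt_in_output") or signal.get("data_exfiltration"):
        return Tier.P0
    if signal.get("sustained_injection") or signal.get("safety_bypass"):
        return Tier.P1
    if signal.get("token_anomaly") or signal.get("refusal_rate_change"):
        return Tier.P2
    return Tier.P3

def route_alert(tier: Tier) -> list:
    """Return the response actions for an alert at the given tier."""
    return ACTIONS[tier]
```

Keeping classification and routing separate makes it easy to re-tier a signal during tuning without touching the response handlers.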
Alert Tuning
```python
from datetime import datetime, timedelta

# Example: P1 alert for sustained injection attempts
class InjectionRateAlert:
    def __init__(self, threshold: int = 5, window_minutes: int = 10):
        self.threshold = threshold
        self.window_minutes = window_minutes
        self.injection_attempts: dict[str, list[datetime]] = {}

    def record_attempt(self, user_id: str) -> bool:
        now = datetime.utcnow()
        cutoff = now - timedelta(minutes=self.window_minutes)
        attempts = self.injection_attempts.get(user_id, [])
        attempts = [a for a in attempts if a > cutoff]
        attempts.append(now)
        self.injection_attempts[user_id] = attempts
        if len(attempts) >= self.threshold:
            return True  # Trigger P1 alert
        return False
```
Red Team Implications
Understanding monitoring helps red teamers avoid detection:
- Vary payloads -- do not send the same attack string repeatedly; monitoring will cluster them
- Pace requests -- match normal user traffic patterns; bursts trigger velocity alerts
- Use different sessions -- spread attempts across sessions to avoid per-session escalation detection
- Watch token usage -- keep input and output lengths within normal ranges when possible
- Rotate identities -- if possible, use different API keys or accounts to avoid user-level flagging
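The pacing point above can be sketched as randomized, human-like delays between probe requests. The delay bounds are illustrative assumptions, not measured traffic statistics:

```python
import random
import time

def human_paced_delays(n: int, low: float = 4.0, high: float = 25.0, seed=None):
    """Generate n randomized inter-request delays mimicking a human reading/typing cadence."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

def send_probes(probes, send_fn, delays):
    """Send each probe, sleeping a jittered interval between requests."""
    for probe, delay in zip(probes, delays):
        send_fn(probe)
        time.sleep(delay)
```

Uniform jitter alone will not defeat a careful velocity model, but it avoids the obvious burst signature that fixed-interval tooling produces.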
Further Reading
- Defense-in-Depth for LLM Apps -- where monitoring fits in the defense stack
- Rate Limiting, Sandboxing & Execution Controls -- complementary infrastructure controls
- The AI Defense Landscape -- broader defense tool overview
Related Topics
- Defense-in-Depth for LLM Apps - Where monitoring fits in the defense stack
- Rate Limiting, Sandboxing & Execution Controls - Complementary infrastructure controls
- The AI Defense Landscape - Broader defense tool overview including monitoring vendors
- Thinking Like a Defender - Understanding how defenders prioritize alerts and monitoring
References
- "Langfuse Documentation: LLM Observability" - Langfuse (2025) - Reference documentation for the open-source LLM observability platform
- "Helicone: LLM Monitoring and Analytics" - Helicone (2025) - Documentation for proxy-based LLM request logging and analytics
- "Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight" - Lin et al. (2017) - Research on anomaly detection approaches applicable to LLM monitoring
- "MITRE ATT&CK for Enterprise: Detection" - MITRE (2025) - Detection methodology framework adapted for AI system monitoring
A red teamer sends 50 variations of a prompt injection attack from a single user account in 5 minutes. All are blocked by the input filter. What monitoring signal would alert the security team?