Runtime Monitoring & Anomaly Detection
Monitoring LLM applications in production for token usage anomalies, output pattern signals, and behavioral drift, using tools like Langfuse, Helicone, and custom logging.
Runtime monitoring is the defense layer that catches what real-time filters miss. While input and output filters make binary block/allow decisions per request, monitoring analyzes patterns across time, users, and sessions to detect slow-burn attacks, automated probing, and post-exploitation activity.
What to Monitor
Token Usage Anomalies
Unusual token patterns are strong indicators of attack activity:
| Signal | Normal Baseline | Attack Indicator | Possible Attack |
|---|---|---|---|
| Input length spike | 50-500 tokens | >4000 tokens | Context window stuffing, attention dilution |
| Output length spike | 100-1000 tokens | >5000 tokens | Data exfiltration, system prompt extraction |
| Input/output ratio | ~1:2 to 1:5 | >1:50 | Model tricked into generating excessive content |
| Token velocity | Steady, human-paced | Burst of rapid requests | Automated tool (Garak, PromptFoo) |
| Repeated similar inputs | Low similarity | >90% cosine similarity | Iterative fuzzing |
```python
# Token anomaly detection against a rolling baseline
class TokenAnomalyDetector:
    def __init__(self, window_size: int = 100):
        self.input_lengths: list[int] = []
        self.output_lengths: list[int] = []
        self.window_size = window_size

    def check(self, input_tokens: int, output_tokens: int) -> list[str]:
        alerts = []
        if len(self.input_lengths) >= self.window_size:
            avg_input = sum(self.input_lengths[-self.window_size:]) / self.window_size
            if input_tokens > avg_input * 5:
                alerts.append(f"Input length anomaly: {input_tokens} vs avg {avg_input:.0f}")
            avg_output = sum(self.output_lengths[-self.window_size:]) / self.window_size
            if output_tokens > avg_output * 10:
                alerts.append(f"Output length anomaly: {output_tokens} vs avg {avg_output:.0f}")
        self.input_lengths.append(input_tokens)
        self.output_lengths.append(output_tokens)
        return alerts
```
Output Pattern Detection
Monitor the model's outputs for patterns that indicate successful exploitation:
| Pattern | Detection Method | Indicates |
|---|---|---|
| Refusal rate drop | Track refusal keyword frequency | Jailbreak success |
| System prompt fragments in output | String matching against known prompt content | System prompt extraction |
| PII patterns in output | Regex for SSN, email, phone patterns | Data leakage |
| Code execution indicators | Detect code blocks with dangerous imports | Tool abuse |
| Repeated output templates | Cluster outputs by embedding similarity | Automated exploitation |
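The regex and keyword rows in the table above can be sketched as a single output scanner. The patterns and refusal markers here are illustrative only; production detectors need broader, locale-aware PII rules:

```python
import re

# Illustrative PII patterns -- real deployments need broader, locale-aware rules
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")

def scan_output(text: str) -> dict:
    """Return which output-pattern signals fired for a single response."""
    lowered = text.lower()
    return {
        "pii_types": [name for name, pat in PII_PATTERNS.items() if pat.search(text)],
        "refusal": any(marker in lowered for marker in REFUSAL_MARKERS),
    }
```

Feeding every response through a scanner like this also produces the per-request refusal signal that the refusal-rate-drop detector needs.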
Behavioral Drift
Behavioral drift occurs when the model's response patterns change without intentional updates:
- Topic drift -- model starts discussing topics outside its intended scope
- Tone shift -- model becomes more compliant or aggressive over time
- Capability expansion -- model starts using tools or formats it previously avoided
- Safety degradation -- refusal rate decreases across a session or across sessions
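Safety degradation in particular lends itself to a quantitative check: compare the refusal rate in a recent window against an older baseline window. This is a minimal sketch; the window sizes and drop ratio are illustrative, not tuned values:

```python
from collections import deque

class RefusalRateDrift:
    """Flag drift when the recent refusal rate drops well below the baseline rate."""

    def __init__(self, baseline_size: int = 200, recent_size: int = 50,
                 drop_ratio: float = 0.5):
        self.baseline: deque = deque(maxlen=baseline_size)
        self.recent: deque = deque(maxlen=recent_size)
        self.drop_ratio = drop_ratio

    def record(self, refused: bool) -> bool:
        # Responses age out of the recent window into the baseline window
        if len(self.recent) == self.recent.maxlen:
            self.baseline.append(self.recent[0])
        self.recent.append(refused)
        if len(self.recent) < self.recent.maxlen or not self.baseline:
            return False  # Not enough history yet
        baseline_rate = sum(self.baseline) / len(self.baseline)
        recent_rate = sum(self.recent) / len(self.recent)
        # Drift: recent refusal rate fell below drop_ratio * baseline rate
        return baseline_rate > 0 and recent_rate < baseline_rate * self.drop_ratio
```

The same two-window structure works for the other drift signals (topic, tone, tool usage) once each is reduced to a per-response boolean or score.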
Monitoring Tools
Langfuse is an open-source LLM observability platform that captures traces, metrics, and evaluations.
```python
from langfuse import Langfuse

langfuse = Langfuse()

# Trace every LLM interaction
trace = langfuse.trace(name="chat-completion", user_id=user_id)
generation = trace.generation(
    name="main-llm",
    model="gpt-4o",
    input=user_message,
    output=response,
    metadata={
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "refusal_detected": "sorry" in response.lower()[:50],
    },
)

# Score for safety evaluation
trace.score(name="safety", value=safety_score, comment="automated check")
```
Security-relevant features:
- Trace-level metrics for per-request analysis
- Session tracking for multi-turn behavior monitoring
- Custom scores for safety evaluation pipelines
- Dashboard for visualizing anomalies over time
Helicone provides request logging and analytics with a proxy-based integration that requires minimal code changes.
```python
from openai import OpenAI

# Route through Helicone proxy for automatic logging
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {HELICONE_API_KEY}",
        "Helicone-User-Id": user_id,
        "Helicone-Session-Id": session_id,
        "Helicone-Property-SecurityCheck": str(passed_security),
    },
)
```
Security-relevant features:
- Automatic request/response logging without code changes
- Custom properties for tagging security-relevant metadata
- Cost tracking (anomalous cost = possible exploitation)
- User-level analytics for identifying abusive accounts
For maximum control, build custom logging pipelines:
```python
import json
import hashlib
from datetime import datetime

class SecurityLogger:
    def __init__(self, log_sink):
        self.sink = log_sink

    def log_interaction(
        self,
        user_id: str,
        session_id: str,
        input_text: str,
        output_text: str,
        metadata: dict,
    ):
        record = {
            "timestamp": datetime.utcnow().isoformat(),
            "user_id": user_id,
            "session_id": session_id,
            "input_hash": hashlib.sha256(input_text.encode()).hexdigest()[:16],
            "input_tokens": metadata.get("input_tokens"),
            "output_tokens": metadata.get("output_tokens"),
            "input_flagged": metadata.get("input_flagged", False),
            "output_flagged": metadata.get("output_flagged", False),
            "refusal_detected": metadata.get("refusal_detected", False),
            "latency_ms": metadata.get("latency_ms"),
        }
        self.sink.write(json.dumps(record))
```
When to choose custom logging:
- You need to log to a specific SIEM (Splunk, Elastic, Datadog)
- Compliance requires specific data handling (no third-party logging)
- You need real-time alerting integration with existing incident response
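Records in the shape emitted by `SecurityLogger` can feed a simple downstream alerting filter before reaching the SIEM. This is a sketch; the field names match the record above, and the sample lines are fabricated for illustration:

```python
import json

def flagged_records(log_lines):
    """Yield parsed log records where any security flag fired."""
    for line in log_lines:
        record = json.loads(line)
        if record.get("input_flagged") or record.get("output_flagged"):
            yield record

# Illustrative JSON-lines input, matching the SecurityLogger record shape
lines = [
    json.dumps({"user_id": "u-1", "input_flagged": False, "output_flagged": False}),
    json.dumps({"user_id": "u-2", "input_flagged": True, "output_flagged": False}),
]
hits = list(flagged_records(lines))
```

One record per line (JSON Lines) keeps the log stream friendly to shippers like Filebeat or Fluent Bit.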
Alerting Strategy
Effective alerts must balance sensitivity (catching attacks) with specificity (not overwhelming operators with false positives).
Tiered Alert Model
| Tier | Severity | Trigger | Action |
|---|---|---|---|
| P0 | Critical | Confirmed data exfiltration, system prompt in output | Page on-call, auto-block user |
| P1 | High | Sustained injection attempts, safety bypass detected | Alert security team, flag account |
| P2 | Medium | Token anomaly, refusal rate change | Log for review, no immediate action |
| P3 | Low | Unusual but non-threatening patterns | Dashboard metric, weekly review |
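The table above can be encoded as a classification function plus a dispatch map. The signal keys and handler names here are placeholders, not a real integration:

```python
from enum import Enum

class Tier(Enum):
    P0 = "critical"
    P1 = "high"
    P2 = "medium"
    P3 = "low"

# Placeholder actions -- real handlers would page on-call, call a SOAR API, etc.
ACTIONS = {
    Tier.P0: ["page_oncall", "auto_block_user"],
    Tier.P1: ["alert_security_team", "flag_account"],
    Tier.P2: ["log_for_review"],
    Tier.P3: ["dashboard_metric"],
}

def classify(signal: dict) -> Tier:
    """Map detection signals to an alert tier per the table above."""
    if signal.get("system_prompt_in_output") or signal.get("data_exfiltration"):
        return Tier.P0
    if signal.get("sustained_injection") or signal.get("safety_bypass"):
        return Tier.P1
    if signal.get("token_anomaly") or signal.get("refusal_rate_change"):
        return Tier.P2
    return Tier.P3

def route_alert(tier: Tier) -> list:
    """Return the response actions for an alert at the given tier."""
    return ACTIONS[tier]
```

Keeping classification and routing separate makes it easy to re-tier a signal during tuning without touching the response handlers.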
Alert Tuning
```python
from datetime import datetime, timedelta

# Example: P1 alert for sustained injection attempts
class InjectionRateAlert:
    def __init__(self, threshold: int = 5, window_minutes: int = 10):
        self.threshold = threshold
        self.window_minutes = window_minutes
        self.injection_attempts: dict[str, list[datetime]] = {}

    def record_attempt(self, user_id: str) -> bool:
        now = datetime.utcnow()
        cutoff = now - timedelta(minutes=self.window_minutes)
        attempts = self.injection_attempts.get(user_id, [])
        attempts = [a for a in attempts if a > cutoff]
        attempts.append(now)
        self.injection_attempts[user_id] = attempts
        if len(attempts) >= self.threshold:
            return True  # Trigger P1 alert
        return False
```
Red Team Implications
Understanding monitoring helps red teamers avoid detection:
- Vary payloads -- do not send the same attack string repeatedly; monitoring will cluster them
- Pace requests -- match normal user traffic patterns; bursts trigger velocity alerts
- Use different sessions -- spread attempts across sessions to avoid per-session escalation detection
- Watch token usage -- keep input and output lengths within normal ranges when possible
- Rotate identities -- if possible, use different API keys or accounts to avoid user-level flagging
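The pacing point above can be sketched as randomized, human-like delays between probe requests. The delay bounds are illustrative assumptions, not measured traffic statistics:

```python
import random
import time

def human_paced_delays(n: int, low: float = 4.0, high: float = 25.0, seed=None):
    """Generate n randomized inter-request delays mimicking a human reading/typing cadence."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

def send_probes(probes, send_fn, delays):
    """Send each probe, sleeping a jittered interval between requests."""
    for probe, delay in zip(probes, delays):
        send_fn(probe)
        time.sleep(delay)
```

Uniform jitter alone will not defeat a careful velocity model, but it avoids the obvious burst signature that fixed-interval tooling produces.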
Further Reading
- Defense-in-Depth for LLM Apps -- where monitoring fits in the defense stack
- Rate Limiting, Sandboxing & Execution Controls -- complementary infrastructure controls
- The AI Defense Landscape -- broader defense tool overview
Related Topics
- Defense-in-Depth for LLM Apps - Where monitoring fits in the defense stack
- Rate Limiting, Sandboxing & Execution Controls - Complementary infrastructure controls
- The AI Defense Landscape - Broader defense tool overview including monitoring vendors
- Thinking Like a Defender - Understanding how defenders prioritize alerts and monitoring
References
- "Langfuse Documentation: LLM Observability" - Langfuse (2025) - Reference documentation for the open-source LLM observability platform
- "Helicone: LLM Monitoring and Analytics" - Helicone (2025) - Documentation for proxy-based LLM request logging and analytics
- "Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight" - Lin et al. (2017) - Research on anomaly detection approaches applicable to LLM monitoring
- "MITRE ATT&CK for Enterprise: Detection" - MITRE (2025) - Detection methodology framework adapted for AI system monitoring
A red teamer sends 50 variations of a prompt injection attack from a single user account in 5 minutes. All are blocked by the input filter. What monitoring signal would alert the security team?