AI Monitoring and Observability
What to monitor in AI systems, key metrics for detecting abuse and drift, alerting strategies, and observability architecture for LLM applications.
Why AI Monitoring Is Different
Traditional application monitoring focuses on uptime, latency, and error rates. AI monitoring must go further — tracking the behavior of a probabilistic system whose outputs are unpredictable by design. A traditional application either works or it does not. An AI application can "work" (return a 200 status code with a valid response) while producing harmful, incorrect, or manipulated content.
What to Monitor
AI monitoring spans three domains: system metrics, behavioral metrics, and security metrics.
System Metrics
These are the traditional observability metrics adapted for AI workloads:
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Latency (TTFT) | Time to first token | Abnormally high latency may indicate adversarial inputs that cause excessive computation |
| Latency (total) | Total response generation time | Sudden increases may signal prompt injection triggering complex tool chains |
| Token throughput | Tokens processed per second | Drops may indicate resource contention or attack |
| Error rate | Percentage of failed requests | Spikes may indicate automated attack attempts |
| Token usage | Input/output tokens per request | Unusually high usage may indicate extraction attempts or context stuffing |
| Cost | Dollar spend per request/user/period | Cost spikes from high-token requests or excessive tool usage |
| GPU utilization | Compute resource consumption | Sustained high utilization may indicate denial-of-service |
Behavioral Metrics
These metrics track the quality and appropriateness of model outputs:
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Refusal rate | Percentage of requests the model refuses to answer | Sudden drops may indicate successful jailbreaking |
| Topic distribution | Distribution of conversation topics over time | Shifts may indicate systematic exploitation |
| Output similarity | Cosine similarity between outputs and known-harmful templates | Detects outputs that match harmful content patterns |
| Tool call frequency | How often the model invokes tools | Spikes may indicate tool abuse through prompt injection |
| Tool call patterns | Which tools are called and with what arguments | Unexpected tool calls may indicate adversarial manipulation |
| Sentiment drift | Changes in output sentiment over time | Gradual shifts may indicate subtle manipulation |
| Hallucination rate | Percentage of ungrounded claims in output | Increases may indicate poisoned retrieval sources |
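Refusal rate is one of the cheaper behavioral metrics to compute. A minimal rolling-window tracker might look like the following (the class name, window size, and tolerance are illustrative assumptions):

```python
from collections import deque

class RefusalRateTracker:
    """Tracks the fraction of refused responses over the last N requests."""

    def __init__(self, window: int = 1000):
        self.events: deque[bool] = deque(maxlen=window)  # True = refused

    def record(self, refused: bool) -> None:
        self.events.append(refused)

    @property
    def rate(self) -> float:
        if not self.events:
            return 0.0
        return sum(self.events) / len(self.events)

    def sudden_drop(self, baseline: float, tolerance: float = 0.5) -> bool:
        """Flag when the current rate falls below tolerance * baseline.
        A sudden drop in refusals may indicate successful jailbreaking."""
        window_full = len(self.events) == self.events.maxlen
        return window_full and self.rate < baseline * tolerance
```

The same pattern generalizes to other rate-style behavioral metrics such as tool call frequency.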
Security Metrics
These metrics specifically target adversarial activity:
| Metric | What It Measures | Why It Matters |
|---|---|---|
| Injection attempt rate | Requests flagged by prompt injection detectors | Tracks attack volume and trends |
| Guardrail trigger rate | How often each guardrail blocks a request | Changes indicate new attack patterns or guardrail degradation |
| System prompt leakage | Outputs containing system prompt fragments | Indicates successful extraction attempts |
| PII exposure rate | Outputs containing detected PII | Tracks data leakage |
| Compromised key detection rate | How often leaked or abused API keys are detected and rotated | Indicates key management health |
| User anomaly score | Per-user behavioral deviation from baseline | Identifies accounts being used for adversarial testing |
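A per-user anomaly score can be as simple as a z-score against the user's cohort. This sketch (function names are illustrative) computes how far a user's metric, such as requests per hour, deviates from cohort behavior:

```python
import statistics

def anomaly_score(user_value: float, cohort_values: list[float]) -> float:
    """Z-score of a user's metric (e.g. requests/hour) against their cohort.
    Larger absolute scores mean larger deviation from baseline behavior."""
    mean = statistics.fmean(cohort_values)
    stdev = statistics.pstdev(cohort_values)
    if stdev == 0:
        # Degenerate cohort: any deviation at all is maximally anomalous.
        return 0.0 if user_value == mean else float("inf")
    return (user_value - mean) / stdev
```

Production systems typically combine several such features (request rate, token usage, guardrail triggers) into a composite score, but the cohort-relative principle is the same.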
Alerting Strategy
Effective alerting balances detection sensitivity against alert fatigue. For AI systems, this balance is particularly challenging because the system's outputs are inherently variable.
Alert Tiers
Critical (Immediate Response)
Conditions that indicate active exploitation or data breach. Examples: system prompt fully extracted, PII appearing in outputs, tool calls to unauthorized services, cost spike exceeding threshold. Response: page on-call, automatic mitigation (rate limit or block user).
High (Within 1 Hour)
Conditions that suggest ongoing attack or significant drift. Examples: sustained increase in guardrail trigger rate, new jailbreak pattern appearing repeatedly, abnormal tool call patterns. Response: alert security team, investigate within the hour.
Medium (Within 1 Business Day)
Conditions that indicate potential issues requiring investigation. Examples: gradual change in topic distribution, increasing refusal rate (possible over-filtering), new user accounts with unusually high usage. Response: queue for investigation.
Low (Weekly Review)
Trends and patterns for ongoing security posture assessment. Examples: slow drift in output quality metrics, changes in user population behavior, emerging patterns in blocked requests. Response: include in weekly security review.
Dynamic Baselines
Static alert thresholds fail for AI systems because normal behavior varies with usage patterns, model updates, and seasonal changes. Use dynamic baselines:
- Rolling window baselines: Compare current metrics against the same metric over the past 7-30 days
- Percentile-based thresholds: Alert when a metric exceeds the 99th percentile of its historical distribution
- Rate-of-change alerts: Alert when a metric changes by more than N% within a time window
- Cohort comparison: Compare a user's behavior against their cohort rather than a global threshold
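Two of these baseline strategies are simple enough to sketch directly. Assuming a plain list of historical samples (real systems would query a time-series database), a percentile threshold and a rate-of-change check look like:

```python
def percentile_threshold(history: list[float], q: float = 0.99) -> float:
    """Alert threshold at the q-th quantile of the metric's history."""
    ordered = sorted(history)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def rate_of_change_alert(prev: float, current: float,
                         max_pct: float = 50.0) -> bool:
    """Alert when the metric moves more than max_pct percent
    between consecutive windows."""
    if prev == 0:
        return current > 0
    return abs(current - prev) / prev * 100 > max_pct
```

Both are deliberately naive: they ignore seasonality, which is why rolling windows of 7-30 days (capturing weekly cycles) are recommended above.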
Observability Architecture
A complete AI observability stack has four layers:
Layer 1: Collection
Capture all relevant data at the point of generation:
- Request/response logging: Every prompt and completion with metadata (user ID, timestamp, model, parameters)
- Guardrail decisions: Every guardrail evaluation with its score and decision
- Tool call traces: Every tool invocation with arguments, results, and timing
- Infrastructure metrics: GPU utilization, memory, latency, error rates
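A common way to implement the collection layer is one structured JSON log line per request, carrying the prompt, completion, guardrail decisions, and tool call traces together. The field names below are illustrative, not a fixed schema:

```python
import datetime
import json
import uuid

def make_log_record(user_id: str, model: str, prompt: str, completion: str,
                    guardrail_decisions: list[dict],
                    tool_calls: list[dict]) -> str:
    """Serialize one request/response event as a JSON log line."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user_id": user_id,
        "model": model,
        "prompt": prompt,          # consider redacting PII before storage
        "completion": completion,
        # e.g. [{"name": "injection", "score": 0.1, "action": "allow"}]
        "guardrails": guardrail_decisions,
        # e.g. [{"tool": "search", "args": {"q": "..."}, "latency_ms": 120}]
        "tool_calls": tool_calls,
    }
    return json.dumps(record)
```

Keeping all four data types keyed by a shared `request_id` is what makes later incident investigation possible: a suspicious tool call can be traced back to the exact prompt and guardrail scores that allowed it.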
Layer 2: Storage
Store collected data in systems optimized for the access patterns needed:
| Data Type | Storage | Retention | Access Pattern |
|---|---|---|---|
| Metrics | Time-series DB (Prometheus, InfluxDB) | 90 days at full resolution | Dashboard queries, alerting |
| Logs | Log aggregator (Elasticsearch, Loki) | 30-90 days | Full-text search, investigation |
| Traces | Trace storage (Jaeger, Tempo) | 14-30 days | Request flow analysis |
| Conversations | Object store (S3) with metadata index | Per policy (30 days to 7 years) | Incident investigation, compliance |
Layer 3: Analysis
Process stored data to generate insights and detect anomalies:
- Real-time stream processing: Kafka/Flink for immediate pattern detection
- Batch analytics: Periodic analysis of accumulated data for trend detection
- ML-based anomaly detection: Models trained on normal behavior to detect deviations
- Embedding-based similarity: Compare outputs against known-harmful content databases
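The embedding-based similarity check reduces to a cosine comparison against a database of known-harmful template embeddings. A minimal pure-Python sketch (the threshold is an assumption and should be tuned on labeled data):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def matches_harmful_template(output_emb: list[float],
                             harmful_embs: list[list[float]],
                             threshold: float = 0.85) -> bool:
    """Flag an output whose embedding is close to any known-harmful template."""
    return any(cosine_similarity(output_emb, h) >= threshold
               for h in harmful_embs)
```

At scale this linear scan would be replaced by an approximate nearest-neighbor index, but the detection logic is the same.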
Layer 4: Visualization and Response
Present analysis results and enable action:
- Dashboards: Real-time visibility into system health and security posture
- Alert management: Routing, escalation, and tracking of security alerts
- Incident response tools: Ability to block users, revoke keys, and adjust guardrails in response to detected threats
- Reporting: Compliance reports, security posture reports, and trend analysis
AI-Specific Observability Tools
Several tools have emerged specifically for AI observability:
| Tool | Focus | Key Features |
|---|---|---|
| LangSmith | LangChain applications | Trace visualization, prompt versioning, evaluation |
| Langfuse | Open-source LLM observability | Tracing, scoring, prompt management |
| Weights & Biases (W&B) | ML experiment tracking | Training monitoring, model evaluation |
| Arize Phoenix | LLM and ML observability | Embedding drift detection, LLM tracing |
| Helicone | LLM usage analytics | Cost tracking, caching, rate limiting |
| OpenLLMetry | OpenTelemetry for LLMs | Standard instrumentation for LLM calls |
Monitoring as a Red Team Target
From the red team perspective, monitoring is both a constraint and a target:
Evading Monitoring
- Low and slow: Spread attacks across time to avoid rate-based detection
- Mimicking normal behavior: Match your request patterns to legitimate users
- Multiple accounts: Distribute attack attempts across accounts to avoid per-user anomaly detection
- Gradual escalation: Avoid sudden behavioral changes that trigger change-rate alerts
Attacking Monitoring
- Alert flooding: Generate high volumes of low-severity alerts to cause fatigue and mask real attacks
- Log injection: Inject misleading content into logs to complicate incident investigation
- Monitoring blind spots: Identify data that is not captured (e.g., streaming responses not logged, tool call arguments not recorded)
- Retention exploitation: Execute attacks and then wait for log retention to expire before launching the real attack
Related Topics
- Anomaly Detection — detecting jailbreak attempts and unusual patterns
- Logging Architecture — what to capture and how to store it
- Guardrails Architecture — the preventive controls that monitoring complements
- Runtime Monitoring — monitoring as a remediation strategy