AI Logging Architecture
What to capture in AI system logs — prompts, completions, latency, tokens, tool calls — along with storage strategies, retention policies, and privacy considerations.
Why AI Logging Is Different
AI system logging goes beyond traditional application logging. In a standard web application, you log HTTP requests, responses, and errors. In an AI application, the "request" is a natural-language prompt that may contain sensitive information, and the "response" is generated text that may contain PII, hallucinated content, or safety violations. The content of AI logs is qualitatively different from traditional application logs, and this changes everything about how you capture, store, and handle them.
What to Capture
Core Interaction Data
Every AI interaction should capture these fields:
| Field | Description | Security Value |
|---|---|---|
| Request ID | Unique identifier for the interaction | Correlating events across systems |
| Timestamp | When the request was received and response completed | Timeline reconstruction |
| User ID | Authenticated user or session identifier | Attribution, pattern analysis |
| Model | Which model and version handled the request | Identifying model-specific vulnerabilities |
| System prompt | The system instructions in effect (or a hash/version ID) | Verifying system prompt integrity |
| User prompt | The user's input text | Attack forensics, pattern detection |
| Completion | The model's response text | Output analysis, safety audit |
| Parameters | Temperature, max_tokens, top_p, etc. | Understanding attack conditions |
| Token counts | Input tokens, output tokens, total | Cost analysis, extraction detection |
| Latency | TTFT and total generation time | Performance analysis, attack detection |
| Finish reason | Why generation stopped (length, stop token, content filter) | Filter activation tracking |
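The fields above can be sketched as a structured log record. This is a minimal illustration, not a standard schema — field names and the helper function are assumptions:

```python
import json
import uuid
from datetime import datetime, timezone

def build_interaction_record(user_id, model, system_prompt_hash, user_prompt,
                             completion, params, usage, latency_ms, finish_reason):
    """Assemble one AI interaction log record (illustrative schema)."""
    return {
        "request_id": str(uuid.uuid4()),           # unique per interaction
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model": model,
        "system_prompt_hash": system_prompt_hash,  # hash/version ID instead of full text
        "user_prompt": user_prompt,
        "completion": completion,
        "parameters": params,                      # temperature, max_tokens, top_p...
        "token_counts": usage,                     # {"input": n, "output": m, "total": n+m}
        "latency_ms": latency_ms,                  # {"ttft": ..., "total": ...}
        "finish_reason": finish_reason,            # "stop" | "length" | "content_filter"
    }

record = build_interaction_record(
    user_id="u-123", model="example-model-v1",
    system_prompt_hash="sha256:abc123", user_prompt="Hello",
    completion="Hi there!", params={"temperature": 0.2, "max_tokens": 256},
    usage={"input": 12, "output": 8, "total": 20},
    latency_ms={"ttft": 180, "total": 900}, finish_reason="stop",
)
print(json.dumps(record, indent=2))
```

Emitting one such record per interaction, as a single JSON object, keeps the log line self-describing and easy to index downstream.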
Guardrail Decision Data
| Field | Description | Security Value |
|---|---|---|
| Guard name | Which guardrail evaluated the interaction | Understanding which controls activated |
| Decision | Allow, block, or modify | Tracking guardrail effectiveness |
| Score | Confidence or risk score | Threshold tuning, trend analysis |
| Reason | Why the guardrail made its decision | Rule refinement, false positive investigation |
| Latency | Time taken by the guardrail evaluation | Performance impact of security controls |
Tool Call Data
For agentic systems, tool calls require detailed logging:
| Field | Description | Security Value |
|---|---|---|
| Tool name | Which tool was invoked | Tracking tool usage patterns |
| Arguments | The arguments passed to the tool | Detecting injection through tool parameters |
| Result | What the tool returned | Verifying tool behavior |
| Duration | How long the tool call took | Detecting hanging or exploited tools |
| Authorization | Whether the tool call was authorized | Tracking authorization decisions |
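One way to capture all five fields consistently is to wrap each tool behind a logging decorator. A minimal sketch — the `authorize` callback and the `print`-based sink are stand-ins for your real authorization check and log pipeline:

```python
import json
import time

def log_tool_call(tool_fn, authorize=lambda name, kwargs: True):
    """Wrap a tool so every invocation logs name, arguments, result,
    duration, and the authorization decision."""
    def wrapped(**kwargs):
        entry = {"tool_name": tool_fn.__name__, "arguments": kwargs}
        entry["authorized"] = authorize(tool_fn.__name__, kwargs)
        start = time.monotonic()
        try:
            if not entry["authorized"]:
                entry["result"] = None   # blocked: nothing executed
                return None
            result = tool_fn(**kwargs)
            entry["result"] = result
            return result
        finally:
            entry["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
            print(json.dumps(entry))     # ship to your log pipeline instead
    return wrapped

def get_weather(city):
    return f"Sunny in {city}"

logged_weather = log_tool_call(get_weather)
logged_weather(city="Oslo")   # logs name, arguments, result, duration, authorization
```

Logging in a `finally` block ensures the record is written even when the tool raises, which matters for detecting hanging or exploited tools.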
Retrieval Data (RAG Systems)
| Field | Description | Security Value |
|---|---|---|
| Query embedding | The embedding used for retrieval (or a hash) | Retrieval manipulation detection |
| Retrieved documents | Document IDs and relevance scores | Identifying poisoned retrievals |
| Source | Which knowledge base was queried | Tracking data provenance |
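Logging a hash of the query embedding, rather than the raw vector, keeps retrieval records compact while still allowing duplicate queries to be correlated. A sketch, assuming embeddings are lists of floats:

```python
import hashlib
import struct

def embedding_fingerprint(embedding):
    """Deterministic short hash of an embedding vector for logging,
    stored in place of the raw floats."""
    packed = struct.pack(f"{len(embedding)}f", *embedding)
    return "sha256:" + hashlib.sha256(packed).hexdigest()[:16]

# Illustrative retrieval log record for a RAG query
retrieval_log = {
    "query_embedding": embedding_fingerprint([0.12, -0.04, 0.88]),
    "retrieved_documents": [{"doc_id": "kb-42", "score": 0.91},
                            {"doc_id": "kb-17", "score": 0.85}],
    "source": "product-docs",   # which knowledge base was queried
}
```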
Logging Architecture
Architecture Pattern: Dual Pipeline
The recommended architecture uses two parallel pipelines — one for real-time operational data and one for full conversation logging:
┌─────────────────┐
│ AI Application │
└────────┬────────┘
│
┌────┴────┐
│ │
▼ ▼
┌────────┐ ┌──────────────┐
│ Real- │ │ Full Content │
│ time │ │ Pipeline │
│Pipeline│ │ │
│ │ │ Prompts + │
│Metrics,│ │ Completions │
│Counts, │ │ + Tool Calls │
│Latency │ │ │
│ │ │ ┌──────────┐ │
│ │ │ │PII Redact│ │
│ │ │ └────┬─────┘ │
│ │ │ │ │
└───┬────┘ └──────┼───────┘
│ │
▼ ▼
┌────────┐ ┌──────────┐
│Metrics │ │Encrypted │
│Store │ │Log Store │
│(Prom.) │ │(S3/GCS) │
└────────┘ └──────────┘
Real-Time Pipeline
The real-time pipeline captures structured metrics that can be queried with low latency:
- Time-series database (Prometheus, InfluxDB, Datadog) for numeric metrics
- Streaming processor (Kafka, Kinesis) for real-time pattern detection
- Alerting engine (Alertmanager, PagerDuty) for immediate notifications
- Retention: 30-90 days at full resolution, 1 year aggregated
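The real-time side can be sketched as a small in-process aggregator. In production these would be Prometheus counters and histograms; this stdlib-only version is illustrative:

```python
from collections import Counter, defaultdict

class MetricsPipeline:
    """Minimal stand-in for the real-time pipeline: numeric metrics
    only, no prompt or completion text."""
    def __init__(self):
        self.counters = Counter()
        self.latencies = defaultdict(list)

    def record(self, model, finish_reason, latency_ms, total_tokens):
        self.counters[("requests", model)] += 1
        self.counters[("tokens", model)] += total_tokens
        if finish_reason == "content_filter":
            self.counters[("filtered", model)] += 1
        self.latencies[model].append(latency_ms)

    def p95_latency(self, model):
        """Approximate p95 over recorded samples."""
        samples = sorted(self.latencies[model])
        return samples[int(len(samples) * 0.95) - 1] if samples else None

metrics = MetricsPipeline()
for ms in (100, 120, 800):
    metrics.record("example-model-v1", "stop", ms, total_tokens=50)
metrics.record("example-model-v1", "content_filter", 90, total_tokens=10)
```

Because only counts and latencies flow through this path, it can be queried and alerted on without any of the privacy handling the content pipeline requires.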
Full Content Pipeline
The content pipeline captures the full text of interactions for forensic investigation:
- Pre-processing: PII detection and optional redaction before storage
- Encryption: All conversation data encrypted at rest and in transit
- Access control: Strict role-based access to conversation content
- Retention: Per policy, typically 30 days to 7 years depending on regulatory requirements
Storage Strategies
Tiered Storage
Use storage tiers to balance cost and access speed:
| Tier | Storage | Retention | Access Time | Use Case |
|---|---|---|---|---|
| Hot | Elasticsearch, PostgreSQL | 7-30 days | Milliseconds | Active investigation, real-time search |
| Warm | Object store (S3 Standard) | 30-90 days | Seconds | Recent incident investigation |
| Cold | Archive storage (S3 Glacier) | 1-7 years | Hours | Compliance, historical analysis |
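A tiering job can decide placement purely from record age. The boundaries below mirror the table and are illustrative — substitute your own retention policy:

```python
from datetime import datetime, timedelta, timezone

# Illustrative tier boundaries matching the table above
TIERS = [(timedelta(days=30), "hot"),
         (timedelta(days=90), "warm"),
         (timedelta(days=7 * 365), "cold")]

def storage_tier(record_ts, now=None):
    """Return the storage tier for a log record, or None once it is
    past retention and eligible for automated deletion."""
    now = now or datetime.now(timezone.utc)
    age = now - record_ts
    for limit, tier in TIERS:
        if age <= limit:
            return tier
    return None

now = datetime.now(timezone.utc)
storage_tier(now - timedelta(days=5))     # "hot"
storage_tier(now - timedelta(days=60))    # "warm"
storage_tier(now - timedelta(days=400))   # "cold"
```

Running this as a scheduled job that migrates (or deletes) records gives you the automated expiration the retention policy calls for.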
Schema Design
Design log schemas for queryability. Common query patterns in AI security investigations:
-- Find all interactions where the guardrail was triggered
-- but the user eventually received a response
SELECT request_id, user_id, user_prompt, completion
FROM ai_interactions
WHERE guardrail_triggered = true
AND response_delivered = true
AND timestamp > NOW() - INTERVAL '7 days';
-- Find users with unusually high refusal rates
-- (may indicate active jailbreak attempts)
SELECT user_id,
COUNT(*) as total_requests,
SUM(CASE WHEN finish_reason = 'content_filter' THEN 1 ELSE 0 END) as filtered,
ROUND(100.0 * SUM(CASE WHEN finish_reason = 'content_filter' THEN 1 ELSE 0 END)
/ COUNT(*), 2) as filter_rate
FROM ai_interactions
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY user_id
HAVING 100.0 * SUM(CASE WHEN finish_reason = 'content_filter' THEN 1 ELSE 0 END)
       / COUNT(*) > 20  -- column aliases cannot be used in HAVING
ORDER BY filter_rate DESC;
Privacy Considerations
AI logs are uniquely sensitive because they contain the full text of human conversations with AI systems. Privacy handling is not optional — it is a legal and ethical requirement.
Data Classification
| Content Type | Classification | Handling |
|---|---|---|
| System prompts | Business Confidential | Store securely; limit access to security team |
| User prompts | Personal Data (potentially sensitive) | Apply PII redaction policy; encrypt at rest |
| Completions | Generated Content (may contain PII) | Apply PII detection; may need redaction |
| Tool call arguments | Varies (may contain credentials) | Scan for secrets; redact before storage |
| Metrics (numeric) | Internal | Standard access controls |
PII Handling Strategies
Detect PII Before Storage
Run PII detection (NER + regex) on all prompts and completions before they enter the content pipeline. Flag interactions containing PII for enhanced handling.
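The regex half of that detection step can be sketched as follows. The patterns are illustrative and deliberately simple — a production system would pair them with an NER model for names, addresses, and other unstructured PII:

```python
import re

# Illustrative patterns only; real deployments need broader coverage
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def detect_pii(text):
    """Return the PII types found in text, so the interaction record
    can be flagged for enhanced handling before storage."""
    return sorted(t for t, pat in PII_PATTERNS.items() if pat.search(text))

detect_pii("Contact alice@example.com or 555-867-5309")
```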
Decide: Redact, Pseudonymize, or Retain
Based on your data handling policy, either redact PII (replace with [REDACTED]), pseudonymize (replace with consistent fake values), or retain with enhanced access controls.
Encrypt and Restrict Access
Encrypt all conversation logs at rest. Implement strict role-based access controls. Require justification for accessing conversation content (break-glass procedure).
Audit Access
Log all access to conversation content. Include who accessed it, when, why (justification), and what they viewed.
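An audit entry covering those four elements might look like this sketch, where `audit_sink` stands in for whatever destination you ship audit events to (file, queue, SIEM):

```python
import json
from datetime import datetime, timezone

def audit_log_access(accessor, request_ids, justification, audit_sink):
    """Record who viewed conversation content, when, why, and what.
    Refuses to log access without a justification."""
    if not justification.strip():
        raise ValueError("access justification is required")
    entry = {
        "event": "conversation_access",
        "accessor": accessor,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
        "justification": justification,   # the "why" for break-glass review
        "request_ids": request_ids,       # exactly which records were viewed
    }
    audit_sink.append(json.dumps(entry))
    return entry

sink = []
audit_log_access("analyst-7", ["req-001", "req-002"],
                 "Investigating incident INC-1234", sink)
```

Rejecting empty justifications at write time enforces the break-glass procedure mechanically rather than by policy alone.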
Regulatory Requirements
| Regulation | Key Requirement | Impact on Logging |
|---|---|---|
| GDPR | Right to deletion, purpose limitation, data minimization | Must be able to delete specific users' data; only log what is necessary |
| CCPA/CPRA | Right to know, right to delete, right to opt out | Must disclose what is logged; enable deletion on request |
| HIPAA | Protected health information safeguards | If AI handles health data, logs must meet HIPAA security requirements |
| SOC 2 | Security, availability, confidentiality controls | Log access controls and audit trails required |
Logging for Incident Response
When a security incident occurs, your logs must answer these questions within minutes:
- What happened? — Full text of the adversarial interaction
- When did it happen? — Precise timestamps
- Who did it? — User identification and authentication details
- How did it bypass controls? — Guardrail decisions and scores
- What was the impact? — What data was exposed or what harmful content was generated
- Is it ongoing? — Are other users exhibiting similar patterns?
Incident Response Queries
Build pre-written queries for common incident types:
- System prompt extraction: Search for completions containing known system prompt fragments
- PII leakage: Search for completions flagged by PII detection
- Jailbreak campaign: Aggregate guardrail trigger patterns by user, time window, and technique
- Tool abuse: Search for unusual tool call patterns (unexpected tools, high-risk arguments)
- Data exfiltration: Search for interactions with high output token counts and extraction-related patterns
Anti-Patterns to Avoid
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Logging nothing | Impossible to investigate incidents | Log core interaction data with PII redaction |
| Logging everything unredacted | Privacy violation, regulatory risk | Apply PII detection and redaction before storage |
| No retention policy | Unbounded storage costs and liability | Define retention by tier with automated expiration |
| No access controls on logs | Anyone can read user conversations | Implement role-based access with break-glass procedures |
| Only logging errors | Misses successful attacks (200 OK responses) | Log all interactions, not just errors |
| Single storage tier | Either too expensive (all hot) or too slow (all cold) | Implement tiered storage with automated migration |
Related Topics
- AI Monitoring and Observability — the broader monitoring context
- Anomaly Detection — detecting patterns in logged data
- Runtime Monitoring — using monitoring as a security control
- Secure AI Development Lifecycle — where logging fits in the development lifecycle