Conversation Preservation
Preserving AI conversation evidence: interaction log capture, context window reconstruction, multi-turn conversation integrity, tool call chain preservation, and forensic timeline construction.
Conversations are the primary forensic artifact in most AI security incidents. Nearly every jailbreak, data-exfiltration, or model-manipulation attack leaves a conversation record showing exactly what the attacker sent and what the model returned. Properly preserved conversation evidence can demonstrate the attack technique, quantify the data exposure, support or undermine a claim of intent, and establish a timeline of events.
However, conversation preservation is more complex than it appears. A conversation with an AI system is not simply a sequence of text messages — it includes system prompts, context windows with limited capacity, tool calls and their results, multi-modal inputs and outputs, and metadata that may be as forensically valuable as the content itself.
What Constitutes a Complete Conversation Record
Core Components
A forensically complete conversation record includes all of the following:
# evidence_preservation/conversation_record.py
"""
Complete conversation record for forensic preservation.
"""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
@dataclass
class ConversationRecord:
# Session identification
session_id: str
conversation_id: str
user_id: str
user_ip: str
user_agent: str
# Temporal data
session_start: datetime
session_end: Optional[datetime]
# System context (what the model was told to be)
system_prompt: str
system_prompt_version: str
model_id: str
model_version: str
# Messages in order
messages: list = field(default_factory=list)
# Tool interactions
tool_calls: list = field(default_factory=list)
# RAG retrievals (what context was injected)
rag_retrievals: list = field(default_factory=list)
# Guardrail events
guardrail_events: list = field(default_factory=list)
# Metadata
total_input_tokens: int = 0
total_output_tokens: int = 0
api_cost: float = 0.0
@dataclass
class Message:
role: str # "system", "user", "assistant", "tool"
content: str # Message text
timestamp: datetime # When the message was sent/received
sequence_number: int # Position in conversation
token_count: int # Token count for this message
# For user messages
original_input: str = "" # Before any preprocessing
preprocessed_input: str = "" # After guardrail processing
injection_score: float = 0.0
# For assistant messages
raw_output: str = "" # Before output filtering
filtered_output: str = "" # After output filtering
was_filtered: bool = False
filter_reason: str = ""
# For tool messages
tool_name: str = ""
tool_input: str = ""
tool_output: str = ""
@dataclass
class RAGRetrieval:
query: str # The retrieval query
timestamp: datetime
documents_retrieved: list # Document IDs and scores
documents_injected: list # Which documents were added to context
    total_injected_tokens: int
Context Window Reconstruction
The context window is what the model actually sees when generating a response. In a multi-turn conversation, earlier messages may be truncated or summarized to fit within the model's context limit. Forensic analysis must reconstruct the exact context window state at each turn.
# evidence_preservation/context_reconstruction.py
"""
Reconstruct the exact context window at any point in
a conversation.
"""
class ContextWindowReconstructor:
def __init__(self, model_config):
self.max_tokens = model_config.context_window_size
self.tokenizer = model_config.tokenizer
def reconstruct_at_turn(self, conversation: ConversationRecord,
turn_number: int) -> dict:
"""
Reconstruct what the model's context window contained
when generating the response at the specified turn.
"""
# Start with system prompt (always present)
context = [{
"role": "system",
"content": conversation.system_prompt,
"tokens": self._count_tokens(conversation.system_prompt),
}]
total_tokens = context[0]["tokens"]
# Add messages up to (but not including) the target turn
messages_to_include = [
m for m in conversation.messages
if m.sequence_number < turn_number
]
# Check if context window management was applied
# (messages may have been dropped or summarized)
if self._exceeds_window(messages_to_include, total_tokens):
context_managed = self._apply_context_management(
messages_to_include, total_tokens
)
context.extend(context_managed["included_messages"])
return {
"turn_number": turn_number,
"context_window": context,
"total_tokens": context_managed["total_tokens"],
"truncated_messages": context_managed["truncated"],
"summarized": context_managed.get("summary_applied",
False),
"warning": "Context window management was applied. "
"Some earlier messages may not have been "
"visible to the model.",
}
# Add RAG retrievals that were injected at this turn
rag_at_turn = [
r for r in conversation.rag_retrievals
if r.timestamp <= conversation.messages[turn_number - 1].timestamp
]
for rag in rag_at_turn:
for doc in rag.documents_injected:
context.append({
"role": "context",
"content": doc["content"],
"source": doc["source"],
"tokens": doc["token_count"],
})
total_tokens += doc["token_count"]
# Add conversation messages
for msg in messages_to_include:
context.append({
"role": msg.role,
"content": msg.content,
"tokens": msg.token_count,
"original_input": msg.original_input
if msg.role == "user" else None,
})
total_tokens += msg.token_count
return {
"turn_number": turn_number,
"context_window": context,
"total_tokens": total_tokens,
"window_utilization": total_tokens / self.max_tokens,
"truncated_messages": [],
        }
Multi-Turn Conversation Integrity
Preserving Message Ordering
Message ordering is forensically critical. In a multi-turn attack, the sequence of messages establishes the attack methodology — which probes came first, how the attacker adapted to model responses, and when the safety bypass occurred.
# evidence_preservation/ordering_verification.py
"""
Verify and enforce message ordering integrity.
"""
def verify_message_ordering(conversation: ConversationRecord) -> dict:
"""
Verify that messages are in correct chronological and
sequential order. Flag any anomalies.
"""
issues = []
for i in range(1, len(conversation.messages)):
prev = conversation.messages[i - 1]
curr = conversation.messages[i]
# Check chronological ordering
if curr.timestamp < prev.timestamp:
issues.append({
"type": "timestamp_reversal",
"position": i,
"prev_timestamp": prev.timestamp.isoformat(),
"curr_timestamp": curr.timestamp.isoformat(),
"severity": "high",
})
# Check sequence number continuity
if curr.sequence_number != prev.sequence_number + 1:
issues.append({
"type": "sequence_gap",
"position": i,
"expected_seq": prev.sequence_number + 1,
"actual_seq": curr.sequence_number,
"severity": "critical" if (
curr.sequence_number - prev.sequence_number > 1
) else "medium",
})
# Check role alternation (user/assistant)
expected_roles = {
"user": {"assistant", "tool"},
"assistant": {"user", "tool"},
"tool": {"assistant"},
}
if curr.role not in expected_roles.get(prev.role, set()):
# Not necessarily an error — tool calls break alternation
if curr.role != "tool" and prev.role != "tool":
issues.append({
"type": "unexpected_role_sequence",
"position": i,
"prev_role": prev.role,
"curr_role": curr.role,
"severity": "medium",
})
return {
"messages_checked": len(conversation.messages),
"issues_found": len(issues),
"issues": issues,
"integrity": "verified" if not issues else "issues_found",
    }
Gap Detection
Missing messages in a conversation log can indicate log tampering, system errors, or intentional deletion. Detect gaps by analyzing:
- Sequence number continuity — Missing sequence numbers indicate dropped messages
- Timestamp gaps — Unusually long gaps between messages may indicate missing interactions
- Token count consistency — If recorded cumulative token totals exceed the sum of token counts across the preserved messages, content is missing from the log
- Context window analysis — If the model's behavior suggests it saw content not in the log, the log is incomplete
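The first two checks above can be sketched as a standalone pass over (sequence number, timestamp) pairs. The 30-minute timestamp-gap threshold is an illustrative assumption, not a forensic standard, and should be tuned per deployment:

```python
from datetime import timedelta

def detect_gaps(messages, max_gap=timedelta(minutes=30)):
    """Flag sequence-number gaps and unusually long timestamp gaps.

    `messages` is a list of (sequence_number, timestamp) tuples,
    sorted by sequence number. The default threshold is illustrative.
    """
    gaps = []
    for (prev_seq, prev_ts), (curr_seq, curr_ts) in zip(messages, messages[1:]):
        if curr_seq != prev_seq + 1:
            gaps.append({
                "type": "sequence_gap",
                "after_seq": prev_seq,
                "missing_count": curr_seq - prev_seq - 1,
            })
        if curr_ts - prev_ts > max_gap:
            gaps.append({
                "type": "timestamp_gap",
                "after_seq": prev_seq,
                "gap_seconds": (curr_ts - prev_ts).total_seconds(),
            })
    return gaps
```

Either finding alone is only an indicator; a sequence gap plus a matching timestamp gap at the same position is much stronger evidence of deleted messages than either signal by itself.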
Tool Call Chain Preservation
Capturing Complete Tool Interactions
For agentic AI systems, tool calls are critical forensic evidence. A tool call chain shows what actions the AI took, what data it accessed, and what side effects it produced.
# evidence_preservation/tool_chain.py
"""
Preserve complete tool call chains for forensic analysis.
"""
@dataclass
class ToolCallRecord:
call_id: str
timestamp: datetime
sequence_in_conversation: int
# What the model requested
tool_name: str
tool_arguments: dict
model_reasoning: str # Why the model chose this tool
# What happened
tool_result: str
execution_duration_ms: int
success: bool
error_message: str = ""
# Side effects
side_effects: list = field(default_factory=list)
# e.g., "wrote file X", "sent email to Y", "queried database Z"
# Authorization
was_authorized: bool = True
authorization_check: str = ""
# Context
preceding_message: str = "" # User message that led to this call
    following_response: str = ""  # Model response after tool result
Preserving Side Effects
Tool calls may produce side effects that exist outside the conversation log — files created, emails sent, database queries executed, APIs called. These side effects must be documented as part of the conversation evidence:
import json

def document_side_effects(tool_call: ToolCallRecord) -> list:
    """
    Document side effects of a tool call for forensic records.
    hash_content() and extract_tables() are deployment-specific
    helpers: content hashing and SQL table extraction, respectively.
    """
    side_effects = []
if tool_call.tool_name == "file_write":
side_effects.append({
"type": "file_created",
"path": tool_call.tool_arguments.get("path"),
"content_hash": hash_content(
tool_call.tool_arguments.get("content", "")
),
"timestamp": tool_call.timestamp,
})
elif tool_call.tool_name == "database_query":
side_effects.append({
"type": "database_query",
"query": tool_call.tool_arguments.get("query"),
"rows_returned": len(
json.loads(tool_call.tool_result)
) if tool_call.success else 0,
"tables_accessed": extract_tables(
tool_call.tool_arguments.get("query", "")
),
})
elif tool_call.tool_name == "send_email":
side_effects.append({
"type": "email_sent",
"recipient": tool_call.tool_arguments.get("to"),
"subject": tool_call.tool_arguments.get("subject"),
"timestamp": tool_call.timestamp,
})
    return side_effects
Forensic Timeline Construction
Correlating Conversation Events with System Events
A forensic timeline combines conversation events with system events (log entries, alerts, configuration changes) to create a complete narrative of the incident.
# evidence_preservation/timeline.py
"""
Forensic timeline construction from multiple evidence sources.
"""
class ForensicTimeline:
def __init__(self):
self.events = []
def add_conversation_events(self,
conversation: ConversationRecord):
"""Add conversation messages to the timeline."""
for msg in conversation.messages:
self.events.append({
"timestamp": msg.timestamp,
"source": "conversation",
"type": f"{msg.role}_message",
"content_preview": msg.content[:200],
"session_id": conversation.session_id,
"details": {
"injection_score": msg.injection_score,
"was_filtered": msg.was_filtered,
"token_count": msg.token_count,
},
})
def add_guardrail_events(self,
conversation: ConversationRecord):
"""Add guardrail triggers to the timeline."""
for event in conversation.guardrail_events:
self.events.append({
"timestamp": event["timestamp"],
"source": "guardrail",
"type": event["action"],
"content_preview": event.get("reason", ""),
"session_id": conversation.session_id,
})
def add_system_events(self, system_logs: list):
"""Add system-level events (alerts, config changes)."""
for log in system_logs:
self.events.append({
"timestamp": log["timestamp"],
"source": "system",
"type": log["event_type"],
"content_preview": log.get("message", ""),
})
def build(self) -> list:
"""Return chronologically sorted timeline."""
return sorted(self.events, key=lambda e: e["timestamp"])
def export_for_report(self) -> str:
"""Export timeline in a format suitable for incident reports."""
timeline = self.build()
lines = ["# Forensic Timeline", ""]
for event in timeline:
ts = event["timestamp"].strftime("%Y-%m-%d %H:%M:%S.%f UTC")
source = event["source"].upper()
etype = event["type"]
preview = event["content_preview"][:100]
lines.append(f"**{ts}** [{source}] {etype}")
if preview:
lines.append(f"> {preview}")
lines.append("")
        return "\n".join(lines)
Preservation Best Practices
Capture before and after guardrail processing
Preserve both the original user input (before guardrails) and the preprocessed input (after guardrails). Similarly, preserve both the raw model output and the filtered output. The difference between pre- and post-processing reveals what the guardrails caught.
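One minimal way to surface what the guardrails changed is a unified diff of the raw and filtered text. This sketch uses Python's standard difflib; the output format is an assumption, not a prescribed evidence format:

```python
import difflib

def guardrail_diff(raw_output: str, filtered_output: str) -> str:
    """Return a unified diff showing what output filtering removed
    or altered. An empty result means filtering made no changes."""
    diff = difflib.unified_diff(
        raw_output.splitlines(keepends=True),
        filtered_output.splitlines(keepends=True),
        fromfile="raw_output",
        tofile="filtered_output",
    )
    return "".join(diff)
```

Attaching this diff to the preserved message makes the guardrail's intervention reviewable at a glance, without forcing an analyst to compare the two full texts manually.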
Include RAG context
If the conversation includes RAG-retrieved context, preserve which documents were retrieved, their similarity scores, and the exact text injected into the context window. RAG context is often where the attack payload originates.
Preserve metadata alongside content
Timestamps, token counts, injection scores, user IPs, and session identifiers are as forensically valuable as the conversation content itself. Never strip metadata during preservation.
Hash at collection time
Compute and record cryptographic hashes of conversation records at the moment of collection. Verify hashes before any analysis to confirm the evidence has not been modified.
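One way to implement hash-at-collection is to serialize the record canonically (sorted keys, fixed separators) and record a SHA-256 digest. The serialization choices below are assumptions; whatever scheme is chosen must stay byte-stable, or later re-verification will fail on unchanged evidence:

```python
import hashlib
import json

def hash_record(record: dict) -> str:
    """Compute a SHA-256 digest over a canonical JSON serialization.

    Sorted keys and fixed separators make the serialization
    deterministic, so an identical record always yields the same
    hash. default=str covers non-JSON types such as datetimes.
    """
    canonical = json.dumps(record, sort_keys=True,
                           separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_record(record: dict, recorded_hash: str) -> bool:
    """Re-hash the record and compare against the digest recorded
    at collection time. Any modification changes the digest."""
    return hash_record(record) == recorded_hash
```

The collection-time digest should itself be stored separately from the evidence (for example, in a write-once log), so that an attacker who can alter the conversation record cannot also alter the hash it is checked against.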
Further Reading
- Evidence Preservation Overview — The broader evidence framework
- Model Snapshots — Companion evidence type
- Prompt Log Forensics — Analyzing preserved conversations
- Tool Call Forensics — Analyzing preserved tool interactions