Conversation Preservation
Preserving AI conversation evidence: interaction log capture, context window reconstruction, multi-turn conversation integrity, tool call chain preservation, and forensic timeline construction.
Conversations are the primary forensic artifact in most AI security incidents. Every jailbreak, data exfiltration, and model manipulation attack produces a conversation record that shows exactly what the attacker sent and what the model returned. Properly preserved conversation evidence can demonstrate the attack technique, quantify the data exposure, prove or disprove intent, and establish a timeline of events.
However, conversation preservation is more complex than it appears. A conversation with an AI system is not simply a sequence of text messages: it includes system prompts, context windows with limited capacity, tool calls and their results, multi-modal inputs and outputs, and metadata that may be as forensically valuable as the content itself.
What Constitutes a Complete Conversation Record
Core Components
A forensically complete conversation record includes all of the following:
# evidence_preservation/conversation_record.py
"""
Complete conversation record for forensic preservation.
"""
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
@dataclass
class ConversationRecord:
# Session identification
session_id: str
conversation_id: str
user_id: str
user_ip: str
user_agent: str
# Temporal data
session_start: datetime
session_end: Optional[datetime]
# System context (what the model was told to be)
system_prompt: str
system_prompt_version: str
model_id: str
model_version: str
# Messages in order
messages: list = field(default_factory=list)
# Tool interactions
tool_calls: list = field(default_factory=list)
# RAG retrievals (what context was injected)
rag_retrievals: list = field(default_factory=list)
# Guardrail events
guardrail_events: list = field(default_factory=list)
# Metadata
total_input_tokens: int = 0
total_output_tokens: int = 0
api_cost: float = 0.0
@dataclass
class Message:
role: str # "system", "user", "assistant", "tool"
content: str # Message text
timestamp: datetime # When the message was sent/received
sequence_number: int # Position in conversation
token_count: int # Token count for this message
# For user messages
original_input: str = "" # Before any preprocessing
preprocessed_input: str = "" # After guardrail processing
injection_score: float = 0.0
# For assistant messages
raw_output: str = "" # Before output filtering
filtered_output: str = "" # After output filtering
was_filtered: bool = False
filter_reason: str = ""
# For tool messages
tool_name: str = ""
tool_input: str = ""
tool_output: str = ""
@dataclass
class RAGRetrieval:
query: str # The retrieval query
timestamp: datetime
documents_retrieved: list # Document IDs and scores
documents_injected: list # Which documents were added to context
total_injected_tokens: int

Context Window Reconstruction
The context window is what the model actually sees when generating a response. In a multi-turn conversation, earlier messages may be truncated or summarized to fit within the model's context limit. Forensic analysis must reconstruct the exact context window state at each turn.
# evidence_preservation/context_reconstruction.py
"""
Reconstruct the exact context window at any point in
a conversation.
"""
class ContextWindowReconstructor:
def __init__(self, model_config):
self.max_tokens = model_config.context_window_size
self.tokenizer = model_config.tokenizer
def reconstruct_at_turn(self, conversation: ConversationRecord,
turn_number: int) -> dict:
"""
Reconstruct what the model's context window contained
when generating the response at the specified turn.
"""
# Start with the system prompt (always present)
context = [{
"role": "system",
"content": conversation.system_prompt,
"tokens": self._count_tokens(conversation.system_prompt),
}]
total_tokens = context[0]["tokens"]
# Add messages up to (but not including) the target turn
messages_to_include = [
m for m in conversation.messages
if m.sequence_number < turn_number
]
# Check if context window management was applied
# (messages may have been dropped or summarized)
if self._exceeds_window(messages_to_include, total_tokens):
context_managed = self._apply_context_management(
messages_to_include, total_tokens
)
context.extend(context_managed["included_messages"])
return {
"turn_number": turn_number,
"context_window": context,
"total_tokens": context_managed["total_tokens"],
"truncated_messages": context_managed["truncated"],
"summarized": context_managed.get("summary_applied",
False),
"warning": "Context window management was applied. "
"Some earlier messages may not have been "
"visible to the model.",
}
# Add RAG retrievals that were injected at this turn
rag_at_turn = [
r for r in conversation.rag_retrievals
if r.timestamp <= conversation.messages[turn_number - 1].timestamp
]
for rag in rag_at_turn:
for doc in rag.documents_injected:
context.append({
"role": "context",
"content": doc["content"],
"source": doc["source"],
"tokens": doc["token_count"],
})
total_tokens += doc["token_count"]
# Add conversation messages
for msg in messages_to_include:
context.append({
"role": msg.role,
"content": msg.content,
"tokens": msg.token_count,
"original_input": msg.original_input
if msg.role == "user" else None,
})
total_tokens += msg.token_count
return {
"turn_number": turn_number,
"context_window": context,
"total_tokens": total_tokens,
"window_utilization": total_tokens / self.max_tokens,
"truncated_messages": [],
}

Multi-Turn Conversation Integrity
Preserving Message Ordering
Message ordering is forensically critical. In a multi-turn attack, the sequence of messages establishes the attack methodology: which probes came first, how the attacker adapted to model responses, and when the security bypass occurred.
# evidence_preservation/ordering_verification.py
"""
Verify and enforce message ordering integrity.
"""
def verify_message_ordering(conversation: ConversationRecord) -> dict:
"""
Verify that messages are in correct chronological and
sequential order. Flag any anomalies.
"""
issues = []
for i in range(1, len(conversation.messages)):
prev = conversation.messages[i - 1]
curr = conversation.messages[i]
# Check chronological ordering
if curr.timestamp < prev.timestamp:
issues.append({
"type": "timestamp_reversal",
"position": i,
"prev_timestamp": prev.timestamp.isoformat(),
"curr_timestamp": curr.timestamp.isoformat(),
"severity": "high",
})
# Check sequence number continuity
if curr.sequence_number != prev.sequence_number + 1:
issues.append({
"type": "sequence_gap",
"position": i,
"expected_seq": prev.sequence_number + 1,
"actual_seq": curr.sequence_number,
"severity": "critical" if (
curr.sequence_number - prev.sequence_number > 1
) else "medium",
})
# Check role alternation (user/assistant)
expected_roles = {
"user": {"assistant", "tool"},
"assistant": {"user", "tool"},
"tool": {"assistant"},
}
if curr.role not in expected_roles.get(prev.role, set()):
# Not necessarily an error — tool calls break alternation
if curr.role != "tool" and prev.role != "tool":
issues.append({
"type": "unexpected_role_sequence",
"position": i,
"prev_role": prev.role,
"curr_role": curr.role,
"severity": "medium",
})
return {
"messages_checked": len(conversation.messages),
"issues_found": len(issues),
"issues": issues,
"integrity": "verified" if not issues else "issues_found",
}

Gap Detection
Missing messages in a conversation log can indicate log tampering, system errors, or intentional deletion. Detect gaps by analyzing:
- Sequence number continuity — Missing sequence numbers indicate dropped messages
- Timestamp gaps — Unusually long gaps between messages may indicate missing interactions
- Token count consistency — If logged per-turn token totals exceed the sum of the preserved messages' token counts, content is missing from the log
- Context window analysis — If the model's behavior suggests it saw content not in the log, the log is incomplete
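The first two checks can be sketched as a standalone detection pass. The 30-minute threshold and the minimal message shape (dicts with `seq` and `ts` keys) are illustrative assumptions, not part of the preserved-record schema:

```python
from datetime import datetime, timedelta

def detect_gaps(messages, max_gap=timedelta(minutes=30)):
    """Flag sequence-number gaps and unusually long timestamp gaps."""
    gaps = []
    for prev, curr in zip(messages, messages[1:]):
        # Missing sequence numbers indicate dropped messages
        if curr["seq"] != prev["seq"] + 1:
            gaps.append({"type": "sequence_gap",
                         "after_seq": prev["seq"],
                         "missing": curr["seq"] - prev["seq"] - 1})
        # Long silent periods may indicate missing interactions
        delta = curr["ts"] - prev["ts"]
        if delta > max_gap:
            gaps.append({"type": "timestamp_gap",
                         "after_seq": prev["seq"],
                         "gap_seconds": delta.total_seconds()})
    return gaps
```

What counts as an "unusually long" gap is workload-dependent; the threshold should be calibrated against normal inter-message latency for the application.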
Tool Call Chain Preservation
Capturing Complete Tool Interactions
For agentic AI systems, tool calls are critical forensic evidence. A tool call chain shows what actions the AI took, what data it accessed, and what side effects it produced.
# evidence_preservation/tool_chain.py
"""
Preserve complete tool call chains for forensic analysis.
"""
@dataclass
class ToolCallRecord:
call_id: str
timestamp: datetime
sequence_in_conversation: int
# What the model requested
tool_name: str
tool_arguments: dict
model_reasoning: str # Why the model chose this tool
# What happened
tool_result: str
execution_duration_ms: int
success: bool
error_message: str = ""
# Side effects
side_effects: list = field(default_factory=list)
# e.g., "wrote file X", "sent email to Y", "queried database Z"
# Authorization
was_authorized: bool = True
authorization_check: str = ""
# Context
preceding_message: str = "" # User message that led to this call
following_response: str = "" # Model response after tool result

Preserving Side Effects
Tool calls may produce side effects that exist outside the conversation log — files created, emails sent, database queries executed, APIs called. These side effects must be documented as part of the conversation evidence:
# (json is required below; hash_content and extract_tables are
# helper functions assumed to be defined elsewhere in the module)
import json
def document_side_effects(tool_call: ToolCallRecord) -> list:
"""
Document side effects of a tool call for forensic records.
"""
side_effects = []
if tool_call.tool_name == "file_write":
side_effects.append({
"type": "file_created",
"path": tool_call.tool_arguments.get("path"),
"content_hash": hash_content(
tool_call.tool_arguments.get("content", "")
),
"timestamp": tool_call.timestamp,
})
elif tool_call.tool_name == "database_query":
side_effects.append({
"type": "database_query",
"query": tool_call.tool_arguments.get("query"),
"rows_returned": len(
json.loads(tool_call.tool_result)
) if tool_call.success else 0,
"tables_accessed": extract_tables(
tool_call.tool_arguments.get("query", "")
),
})
elif tool_call.tool_name == "send_email":
side_effects.append({
"type": "email_sent",
"recipient": tool_call.tool_arguments.get("to"),
"subject": tool_call.tool_arguments.get("subject"),
"timestamp": tool_call.timestamp,
})
return side_effects

Forensic Timeline Construction
Correlating Conversation Events with System Events
A forensic timeline combines conversation events with system events (log entries, alerts, configuration changes) to create a complete narrative of the incident.
# evidence_preservation/timeline.py
"""
Forensic timeline construction from multiple evidence sources.
"""
class ForensicTimeline:
def __init__(self):
self.events = []
def add_conversation_events(self,
conversation: ConversationRecord):
"""Add conversation messages to the timeline."""
for msg in conversation.messages:
self.events.append({
"timestamp": msg.timestamp,
"source": "conversation",
"type": f"{msg.role}_message",
"content_preview": msg.content[:200],
"session_id": conversation.session_id,
"details": {
"injection_score": msg.injection_score,
"was_filtered": msg.was_filtered,
"token_count": msg.token_count,
},
})
def add_guardrail_events(self,
conversation: ConversationRecord):
"""Add guardrail triggers to the timeline."""
for event in conversation.guardrail_events:
self.events.append({
"timestamp": event["timestamp"],
"source": "guardrail",
"type": event["action"],
"content_preview": event.get("reason", ""),
"session_id": conversation.session_id,
})
def add_system_events(self, system_logs: list):
"""Add system-level events (alerts, config changes)."""
for log in system_logs:
self.events.append({
"timestamp": log["timestamp"],
"source": "system",
"type": log["event_type"],
"content_preview": log.get("message", ""),
})
def build(self) -> list:
"""Return chronologically sorted timeline."""
return sorted(self.events, key=lambda e: e["timestamp"])
def export_for_report(self) -> str:
"""Export timeline in a format suitable for incident reports."""
timeline = self.build()
lines = ["# Forensic Timeline", ""]
for event in timeline:
ts = event["timestamp"].strftime("%Y-%m-%d %H:%M:%S.%f UTC")
source = event["source"].upper()
etype = event["type"]
preview = event["content_preview"][:100]
lines.append(f"**{ts}** [{source}] {etype}")
if preview:
lines.append(f"> {preview}")
lines.append("")
return "\n".join(lines)

Preservation Best Practices
Capture before and after guardrail processing
Preserve both the original user input (before guardrails) and the preprocessed input (after guardrails). Similarly, preserve both the raw model output and the filtered output. The difference between pre- and post-processing reveals what the guardrails caught.
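One way to make that difference visible during analysis is a word-level diff between the preserved pre- and post-processing fields. This is an illustrative sketch (the parameter names mirror the `Message` fields above; the word-level granularity is a choice, not a prescribed API):

```python
import difflib

def guardrail_diff(original_input: str, preprocessed_input: str) -> list:
    """Return the guardrail's edits as (tag, original, processed) tuples."""
    matcher = difflib.SequenceMatcher(
        None, original_input.split(), preprocessed_input.split()
    )
    return [
        (tag,
         " ".join(original_input.split()[i1:i2]),
         " ".join(preprocessed_input.split()[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only insertions, deletions, replacements
    ]
```

Applied to a preserved message pair, the output shows exactly which spans the guardrail rewrote or removed, which is often the fastest way to see what a filter caught.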
Include RAG context
If the conversation includes RAG-retrieved context, preserve which documents were retrieved, their similarity scores, and the exact text injected into the context window. RAG context is often where the attack payload originates.
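A minimal sketch of preserving one injected document with a content hash, so the injected text can later be verified against the source corpus (the dict loosely mirrors the RAGRetrieval fields above; the field names here are illustrative):

```python
import hashlib
from datetime import datetime, timezone

def preserve_injected_doc(doc_id: str, content: str, score: float) -> dict:
    """Record an injected RAG document with a verifiable content hash."""
    return {
        "doc_id": doc_id,
        "score": score,
        "content": content,  # the exact text injected into the context
        "content_sha256": hashlib.sha256(
            content.encode("utf-8")
        ).hexdigest(),
        "preserved_at": datetime.now(timezone.utc).isoformat(),
    }
```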
Preserve metadata alongside content
Timestamps, token counts, injection scores, user IPs, and session identifiers are as forensically valuable as the conversation content itself. Never strip metadata during preservation.
Hash at collection time
Compute and record cryptographic hashes of conversation records at the moment of collection. Verify hashes before any analysis to confirm the evidence has not been modified.
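A minimal hash-at-collection sketch: serializing to canonical JSON (sorted keys, compact separators) makes the hash reproducible across runs. The record dict here stands in for a fully serialized ConversationRecord:

```python
import hashlib
import json

def hash_record(record: dict) -> str:
    """Compute a reproducible SHA-256 over a canonical JSON serialization."""
    canonical = json.dumps(record, sort_keys=True,
                           separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_record(record: dict, expected_hash: str) -> bool:
    """Re-hash and compare before any analysis touches the evidence."""
    return hash_record(record) == expected_hash
```

The `default=str` fallback lets datetime objects serialize; any change to the record, however small, produces a different digest.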
Further Reading
- Evidence Preservation Overview — The broader evidence framework
- Model Snapshots — Companion evidence type
- Prompt Log Forensics — Analyzing preserved conversations
- Tool Call Forensics — Analyzing preserved tool interactions