# AI Incident Triage Procedures
First-response procedures for AI security incidents: what to preserve immediately, what to isolate, initial assessment checklists, and escalation criteria.
The first 30 minutes of an AI incident determine whether you preserve critical evidence or lose it, contain the damage or let it spread, and identify the right responders or waste time with the wrong team. This page defines the first-response procedures specific to AI systems.
## The First 30 Minutes
AI incidents have a narrow evidence preservation window. Unlike traditional systems where disk images and network captures persist, AI evidence is often ephemeral: inference logs may rotate, model versions may update, and conversation context windows are cleared when sessions end.
## Triage Workflow
### 1. Acknowledge and timestamp
Record the exact time (UTC) of detection, the source of the alert (automated system, user report, internal discovery), and your identity as the triage responder. This begins the incident timeline.
```
Incident ID:       AI-IR-2026-0042
Detected:          2026-03-15T14:32:07Z
Source:            Safety classifier alert (Llama Guard)
Triage Responder:  [Name]
```

### 2. Preserve volatile evidence
Before taking any other action, capture evidence that may be lost. In order of volatility:
- Active conversation state -- if the incident involves an ongoing conversation, capture the full conversation history including system prompt, all user messages, all assistant responses, and any tool call records
- Current model configuration -- record the exact model version, system prompt hash, temperature, sampling parameters, and any active adapters or plugins
- Inference logs -- ensure inference logging is not about to rotate; extend retention if possible
- RAG index state -- if retrieval is involved, snapshot the current document index and recent retrieval logs
- Tool call state -- capture any pending or recent tool invocations, their parameters, and results
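The volatility ordering above can be baked into the evidence directory layout so captures land in the right order. A minimal sketch (directory and file names are illustrative, not a standard):

```shell
# Create a per-incident evidence directory with one subdirectory per
# volatility tier, so the capture order above is explicit on disk
init_evidence_dir() {
  local incident_id="$1" tier
  for tier in 1-conversation 2-model-config 3-inference-logs 4-rag-index 5-tool-calls; do
    mkdir -p "evidence/$incident_id/$tier"
  done
  # Record the UTC collection start time for the incident timeline
  date -u +%Y-%m-%dT%H:%M:%SZ > "evidence/$incident_id/collected_at.txt"
}
```

Creating the skeleton first also means a responder under pressure does not have to invent a layout mid-incident.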
See Evidence Preservation for detailed preservation procedures.
### 3. Assess scope and containment need
Determine whether the incident is isolated (single conversation, single user) or systemic (affects all users, exploits a flaw in the model or system prompt).
| Question | If Yes → Implication |
|---|---|
| Can any user trigger this? | Systemic -- consider immediate containment |
| Does it require specific prior context? | May be isolated -- continue assessment |
| Is the model actively being exploited? | Urgent containment needed |
| Did the model take real-world actions (tool calls, API requests)? | Assess downstream impact immediately |
| Is sensitive data being exposed? | Data breach procedures may apply |

### 4. Implement initial containment
Based on scope assessment, take the minimum containment action that stops the bleeding without destroying evidence.
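One way to make this decision repeatable across responders is to encode it as a small helper driven by the scope answers from the previous step. The mapping below is an illustrative sketch, not a prescribed policy:

```shell
# Pick the minimum containment action from two triage answers ("yes"/"no").
# The mapping is a sketch; tune it to your own containment table.
containment_action() {
  local any_user_can_trigger="$1" actively_exploited="$2"
  if [ "$actively_exploited" = yes ] && [ "$any_user_can_trigger" = yes ]; then
    echo "take endpoint offline"
  elif [ "$actively_exploited" = yes ]; then
    echo "terminate session and add input filter"
  elif [ "$any_user_can_trigger" = yes ]; then
    echo "switch to fallback model"
  else
    echo "enable enhanced logging"
  fi
}

containment_action no no   # prints: enable enhanced logging
```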
| Containment Action | When to Use | Evidence Impact |
|---|---|---|
| Terminate specific session | Isolated to one conversation | Preserves all other sessions |
| Add input filter for known payload | Specific attack pattern identified | Low impact; attacker may adapt |
| Enable enhanced logging | Need more data to assess scope | No negative impact |
| Reduce model capabilities | Tool abuse detected | Limits functionality but preserves model |
| Switch to fallback model | Systemic vulnerability in primary model | Preserves primary model for analysis |
| Take endpoint offline | Active data exfiltration or harmful outputs | Maximum disruption, maximum containment |

### 5. Perform initial severity assessment
Apply the Severity Framework with the information available. An initial score based on partial information is better than no score -- it drives escalation decisions.
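Partial scoring can be mechanized. The sketch below assumes, hypothetically, that the overall level is the highest scored dimension on a 1-4 scale and that unscored dimensions are simply skipped; substitute your framework's actual aggregation rule:

```shell
# Derive a preliminary severity level from whatever dimension scores (1-4)
# are available; pass "unknown" for dimensions still being assessed.
preliminary_severity() {
  local max=0 s
  for s in "$@"; do
    case "$s" in unknown|Unknown) continue ;; esac
    if [ "$s" -gt "$max" ]; then max="$s"; fi
  done
  case "$max" in
    4) echo Critical ;;
    3) echo High ;;
    2) echo Medium ;;
    1) echo Low ;;
    *) echo Unscored ;;
  esac
}

preliminary_severity 3 unknown 2 unknown 3   # prints: High
```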
Record your initial assessment even if incomplete:
```
## Initial Severity Assessment (Preliminary)
- Model Integrity: [score] - [brief justification]
- Data Exposure: [score or "Unknown - assessing"]
- Blast Radius: [score] - [brief justification]
- Reversibility: [score or "Unknown - assessing"]
- Exploitability: [score] - [brief justification]
- Preliminary Severity: [level]
```

### 6. Classify and escalate
Apply the Incident Classification taxonomy and follow Escalation Paths based on severity and category.
## What to Preserve Immediately
AI systems produce evidence types that do not exist in traditional IR. Missing any of these during triage can make the incident uninvestigable.
### Critical Evidence Checklist
| Evidence Type | Where to Find It | Why It Matters | Volatility |
|---|---|---|---|
| Full conversation history | Chat application database, API logs | Contains the actual attack payload and model responses | High -- sessions may expire |
| System prompt (exact version) | Application configuration, prompt management system | Determines what the model was instructed to do | Medium -- may be updated |
| Model version identifier | Deployment configuration, model registry | Required for reproduction attempts | Medium -- may be updated in deployment |
| Inference parameters | API request logs, application configuration | Temperature, top_p, etc. affect reproducibility | Low -- typically stable |
| Safety classifier outputs | Classifier service logs | Shows whether safety systems detected the incident | Medium -- logs may rotate |
| RAG retrieval results | Vector database query logs | Shows what context the model had access to | High -- queries may not be logged by default |
| Tool call records | Agent framework logs, tool service logs | Shows what external actions the model took | Medium -- depends on logging configuration |
| User identity and session data | Authentication system, session store | Required for determining authorization and scoping | Low -- typically persistent |
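A quick completeness pass over this checklist, plus an integrity fingerprint of everything captured, can be scripted. Filenames below are illustrative, and `sha256sum` assumes GNU coreutils:

```shell
# Flag missing checklist items, then hash every captured artifact so later
# tampering or accidental edits are detectable (verify with `sha256sum -c`)
verify_evidence() {
  local dir="$1" missing=0 item
  for item in conversation_history system_prompt model_version \
              inference_params classifier_outputs rag_retrievals \
              tool_calls session_data; do
    ls "$dir" | grep "$item" >/dev/null || { echo "MISSING: $item"; missing=1; }
  done
  ( cd "$dir" && sha256sum * > SHA256SUMS )
  return "$missing"
}
```

Running this immediately after capture gives you both a gap report and a tamper-evidence baseline before anyone else touches the files.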
### Evidence Preservation Commands
For common AI deployment patterns, these commands capture critical volatile state:
```shell
# Capture current model deployment state
kubectl get deployment ai-model-service -o yaml > evidence/deployment_state.yaml
kubectl logs deploy/ai-model-service --since=1h > evidence/inference_logs.txt

# Snapshot the system prompt from config
kubectl get configmap ai-system-prompt -o jsonpath='{.data}' > evidence/system_prompt.json

# Export recent conversation logs (application-specific)
# Adjust the query to your conversation storage
psql -c "COPY (SELECT * FROM conversations
         WHERE created_at > NOW() - INTERVAL '2 hours')
         TO STDOUT WITH CSV HEADER" > evidence/recent_conversations.csv

# Capture model version info
curl -s http://model-service:8080/health | jq '.model_version' > evidence/model_version.json
```

## Common Triage Mistakes
### Mistake 1: Attempting to Reproduce Before Preserving
Sending prompts to the model to "test" the vulnerability before preserving evidence can:
- Modify the conversation state in stateful systems
- Trigger log rotation if the system has volume-based rotation
- Alert the attacker if they are monitoring the system
- Change the model's behavior in few-shot or in-context learning setups
### Mistake 2: Updating the System Prompt Immediately
The instinct to "fix" the system prompt immediately is understandable but premature:
- You lose the version that was exploited
- You cannot assess whether the fix actually works without the original
- Other attack paths may exist that the prompt update does not address
- The updated prompt may introduce new vulnerabilities
### Mistake 3: Treating a Systemic Issue as Isolated
A single user report of a jailbreak may indicate a vulnerability that all users can exploit. Before classifying an incident as isolated:
- Search logs for similar patterns from other users
- Test whether the attack requires conversation-specific context
- Check whether the vulnerability is in the system prompt (systemic) or the conversation flow (potentially isolated)
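The log search in the first bullet can be very simple. The sketch below assumes a hypothetical whitespace-separated log layout of `<timestamp> <user_id> <prompt...>`; adapt the field positions to your logging format:

```shell
# List the distinct user IDs whose prompts contain a suspected payload,
# to distinguish an isolated report from a systemic issue
users_matching_pattern() {
  local pattern="$1" logfile="$2"
  grep -i "$pattern" "$logfile" | awk '{print $2}' | sort -u
}
```

If this returns more than one user, treat the incident as potentially systemic until proven otherwise.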
### Mistake 4: Ignoring Downstream Effects
If the AI model called external tools, wrote to databases, sent emails, or made API requests during the incident, the blast radius extends beyond the model itself. Triage must include:
- Identifying all tool calls made during the incident window
- Assessing whether downstream systems acted on compromised model outputs
- Determining whether downstream outputs need to be recalled or reverted
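A sketch of the first step, assuming a hypothetical tab-separated agent log of `<iso-timestamp> <tool> <params>` per line (ISO 8601 timestamps compare correctly as plain strings):

```shell
# Print every tool invocation recorded inside the incident window
tool_calls_in_window() {
  local start="$1" end="$2" logfile="$3"
  awk -F'\t' -v s="$start" -v e="$end" '$1 >= s && $1 <= e' "$logfile"
}
```

Each line this returns is a downstream effect to assess: did the recipient system act on it, and can that action be reverted?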
## Initial Assessment Questionnaire
Use this questionnaire to structure the initial triage conversation with the person reporting the incident.
| # | Question | Purpose |
|---|---|---|
| 1 | What did the model do that it should not have? | Classify the incident type |
| 2 | When did this happen? (Exact time if possible) | Set the investigation window |
| 3 | How was this discovered? | Assess detection capability |
| 4 | Is there a screenshot or log of the model's output? | Preserve primary evidence |
| 5 | What was the user trying to do? | Distinguish misuse from exploitation |
| 6 | Has anyone else reported similar behavior? | Assess blast radius |
| 7 | Which model/endpoint/product is affected? | Scope the investigation |
| 8 | Does the model have access to tools or external data? | Assess downstream risk |
| 9 | Have any changes been made since the incident? | Determine evidence integrity |
| 10 | Has this been shared publicly or with media? | Assess urgency and PR risk |
## Triage Decision Matrix
After initial assessment, use this matrix to determine the appropriate response level.
| Severity | Active Exploitation? | Data Exposed? | Response Level |
|---|---|---|---|
| Critical | Yes | Yes | War room, all hands, legal notification |
| Critical | Yes | No | Dedicated team, executive briefing, immediate containment |
| Critical | No | Yes | Data breach procedures, legal review, containment plan |
| High | Yes | Any | Dedicated team, 1-hour response SLA |
| High | No | Any | Standard IR team, 4-hour response SLA |
| Medium | Any | No | Assigned investigator, next business day |
| Low | No | No | Ticket-based tracking, routine investigation |
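To keep triage calls consistent across responders, the matrix can be encoded directly; a sketch:

```shell
# Look up the response level from severity, active exploitation, and data
# exposure ("yes"/"no"), mirroring the decision matrix above
response_level() {
  local sev="$1" exploited="$2" exposed="$3"
  case "$sev:$exploited:$exposed" in
    Critical:yes:yes) echo "War room, all hands, legal notification" ;;
    Critical:yes:no)  echo "Dedicated team, executive briefing, immediate containment" ;;
    Critical:no:yes)  echo "Data breach procedures, legal review, containment plan" ;;
    High:yes:*)       echo "Dedicated team, 1-hour response SLA" ;;
    High:no:*)        echo "Standard IR team, 4-hour response SLA" ;;
    Medium:*:no)      echo "Assigned investigator, next business day" ;;
    Low:no:no)        echo "Ticket-based tracking, routine investigation" ;;
    *)                echo "Unmapped combination -- escalate for manual review" ;;
  esac
}
```

The fallback branch matters: combinations the matrix does not cover (for example, Medium severity with data exposed) should route to a human rather than to a default level.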
## Related Topics
- Severity Framework -- detailed scoring for the initial severity assessment
- Escalation Paths -- where to route after triage
- Evidence Preservation -- detailed evidence preservation procedures
- Log Analysis -- post-triage investigation techniques