# AI Incident Triage Procedures
First-response procedures for AI security incidents: what to preserve immediately, what to isolate, initial assessment checklists, and escalation criteria.
The first 30 minutes of an AI incident determine whether you preserve critical evidence or lose it, contain the damage or let it spread, and identify the right responders or waste time with the wrong team. This page defines the first-response procedures specific to AI systems.
## The First 30 Minutes
AI incidents have a narrow evidence preservation window. Unlike traditional systems where disk images and network captures persist, AI evidence is often ephemeral: inference logs may rotate, model versions may update, and conversation context windows are cleared when sessions end.
## Triage Workflow
### 1. Acknowledge and timestamp
Record the exact time (UTC) of detection, the source of the alert (automated system, user report, internal discovery), and your identity as the triage responder. This begins the incident timeline.
```
Incident ID:       AI-IR-2026-0042
Detected:          2026-03-15T14:32:07Z
Source:            Safety classifier alert (Llama Guard)
Triage Responder:  [Name]
```

### 2. Preserve volatile evidence
Before taking any other action, capture evidence that may be lost. In order of volatility:
- Active conversation state -- if the incident involves an ongoing conversation, capture the full conversation history including system prompt, all user messages, all assistant responses, and any tool call records
- Current model configuration -- record the exact model version, system prompt hash, temperature, sampling parameters, and any active adapters or plugins
- Inference logs -- ensure inference logging is not about to rotate; extend retention if possible
- RAG index state -- if retrieval is involved, snapshot the current document index and recent retrieval logs
- Tool call state -- capture any pending or recent tool invocations, their parameters, and results
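The volatility ordering above can be baked into the evidence directory layout so captures land in the right order. A minimal sketch (directory and file names are illustrative, not a standard):

```shell
# Create a per-incident evidence directory with one subdirectory per
# volatility tier, so the capture order above is explicit on disk
init_evidence_dir() {
  local incident_id="$1" tier
  for tier in 1-conversation 2-model-config 3-inference-logs 4-rag-index 5-tool-calls; do
    mkdir -p "evidence/$incident_id/$tier"
  done
  # Record the UTC collection start time for the incident timeline
  date -u +%Y-%m-%dT%H:%M:%SZ > "evidence/$incident_id/collected_at.txt"
}
```

Creating the skeleton first also means a responder under pressure does not have to invent a layout mid-incident.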
See Evidence Preservation for detailed preservation procedures.
### 3. Assess scope and containment need
Determine whether the incident is isolated (single conversation, single user) or systemic (affects all users, exploits a flaw in the model or system prompt).
| Question | If Yes → Implication |
|---|---|
| Can any user trigger this? | Systemic -- consider immediate containment |
| Does it require specific prior context? | May be isolated -- continue assessment |
| Is the model actively being exploited? | Urgent containment needed |
| Did the model take real-world actions (tool calls, API requests)? | Assess downstream impact immediately |
| Is sensitive data being exposed? | Data breach procedures may apply |

### 4. Implement initial containment
Based on scope assessment, take the minimum containment action that stops the bleeding without destroying evidence.
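One way to make this decision repeatable across responders is to encode it as a small helper driven by the scope answers from the previous step. The mapping below is an illustrative sketch, not a prescribed policy:

```shell
# Pick the minimum containment action from two triage answers ("yes"/"no").
# The mapping is a sketch; tune it to your own containment table.
containment_action() {
  local any_user_can_trigger="$1" actively_exploited="$2"
  if [ "$actively_exploited" = yes ] && [ "$any_user_can_trigger" = yes ]; then
    echo "take endpoint offline"
  elif [ "$actively_exploited" = yes ]; then
    echo "terminate session and add input filter"
  elif [ "$any_user_can_trigger" = yes ]; then
    echo "switch to fallback model"
  else
    echo "enable enhanced logging"
  fi
}

containment_action no no   # prints: enable enhanced logging
```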
| Containment Action | When to Use | Evidence Impact |
|---|---|---|
| Terminate specific session | Isolated to one conversation | Preserves all other sessions |
| Add input filter for known payload | Specific attack pattern identified | Low impact; attacker may adapt |
| Enable enhanced logging | Need more data to assess scope | No negative impact |
| Reduce model capabilities | Tool abuse detected | Limits functionality but preserves model |
| Switch to fallback model | Systemic vulnerability in primary model | Preserves primary model for analysis |
| Take endpoint offline | Active data exfiltration or harmful outputs | Maximum disruption, maximum containment |

### 5. Perform initial severity assessment
Apply the Severity Framework with the information available. An initial score based on partial information is better than no score -- it drives escalation decisions.
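Partial scoring can be mechanized. The sketch below assumes, hypothetically, that the overall level is the highest scored dimension on a 1-4 scale and that unscored dimensions are simply skipped; substitute your framework's actual aggregation rule:

```shell
# Derive a preliminary severity level from whatever dimension scores (1-4)
# are available; pass "unknown" for dimensions still being assessed.
preliminary_severity() {
  local max=0 s
  for s in "$@"; do
    case "$s" in unknown|Unknown) continue ;; esac
    if [ "$s" -gt "$max" ]; then max="$s"; fi
  done
  case "$max" in
    4) echo Critical ;;
    3) echo High ;;
    2) echo Medium ;;
    1) echo Low ;;
    *) echo Unscored ;;
  esac
}

preliminary_severity 3 unknown 2 unknown 3   # prints: High
```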
Record your initial assessment even if incomplete:
```
## Initial Severity Assessment (Preliminary)
- Model Integrity: [score] - [brief justification]
- Data Exposure: [score or "Unknown - assessing"]
- Blast Radius: [score] - [brief justification]
- Reversibility: [score or "Unknown - assessing"]
- Exploitability: [score] - [brief justification]
- Preliminary Severity: [level]
```

### 6. Classify and escalate
Apply the Incident Classification taxonomy and follow Escalation Paths based on severity and category.
## What to Preserve Immediately
AI systems produce evidence types that do not exist in traditional IR. Missing any of these during triage can make the incident uninvestigable.
### Critical Evidence Checklist
| Evidence Type | Where to Find It | Why It Matters | Volatility |
|---|---|---|---|
| Full conversation history | Chat application database, API logs | Contains the actual attack payload and model responses | High -- sessions may expire |
| System prompt (exact version) | Application configuration, prompt management system | Determines what the model was instructed to do | Medium -- may be updated |
| Model version identifier | Deployment configuration, model registry | Required for reproduction attempts | Medium -- may be updated in deployment |
| Inference parameters | API request logs, application configuration | Temperature, top_p, etc. affect reproducibility | Low -- typically stable |
| Safety classifier outputs | Classifier service logs | Shows whether safety systems detected the incident | Medium -- logs may rotate |
| RAG retrieval results | Vector database query logs | Shows what context the model had access to | High -- queries may not be logged by default |
| Tool call records | Agent framework logs, tool service logs | Shows what external actions the model took | Medium -- depends on logging configuration |
| User identity and session data | Authentication system, session store | Required for determining authorization and scoping | Low -- typically persistent |
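A quick completeness pass over this checklist, plus an integrity fingerprint of everything captured, can be scripted. Filenames below are illustrative, and `sha256sum` assumes GNU coreutils:

```shell
# Flag missing checklist items, then hash every captured artifact so later
# tampering or accidental edits are detectable (verify with `sha256sum -c`)
verify_evidence() {
  local dir="$1" missing=0 item
  for item in conversation_history system_prompt model_version \
              inference_params classifier_outputs rag_retrievals \
              tool_calls session_data; do
    ls "$dir" | grep "$item" >/dev/null || { echo "MISSING: $item"; missing=1; }
  done
  ( cd "$dir" && sha256sum * > SHA256SUMS )
  return "$missing"
}
```

Running this immediately after capture gives you both a gap report and a tamper-evidence baseline before anyone else touches the files.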
### Evidence Preservation Commands
For common AI deployment patterns, these commands capture critical volatile state:
```shell
# Capture current model deployment state
kubectl get deployment ai-model-service -o yaml > evidence/deployment_state.yaml
kubectl logs deploy/ai-model-service --since=1h > evidence/inference_logs.txt

# Snapshot the system prompt from config
kubectl get configmap ai-system-prompt -o jsonpath='{.data}' > evidence/system_prompt.json

# Export recent conversation logs (application-specific)
# Adjust the query to your conversation storage
psql -c "COPY (SELECT * FROM conversations
         WHERE created_at > NOW() - INTERVAL '2 hours')
         TO STDOUT WITH CSV HEADER" > evidence/recent_conversations.csv

# Capture model version info
curl -s http://model-service:8080/health | jq '.model_version' > evidence/model_version.json
```

## Common Triage Mistakes
### Mistake 1: Attempting to Reproduce Before Preserving
Sending prompts to the model to "test" the vulnerability before preserving evidence can:
- Modify the conversation state in stateful systems
- Trigger log rotation if the system has volume-based rotation
- Alert the attacker if they are monitoring the system
- Change the model's behavior in few-shot or in-context learning setups
### Mistake 2: Updating the System Prompt Immediately
The instinct to "fix" the system prompt immediately is understandable but premature:
- You lose the version that was exploited
- You cannot assess whether the fix actually works without the original
- Other attack paths may exist that the prompt update does not address
- The updated prompt may introduce new vulnerabilities
### Mistake 3: Treating a Systemic Issue as Isolated
A single user report of a jailbreak may indicate a vulnerability that all users can exploit. Before classifying an incident as isolated:
- Search logs for similar patterns from other users
- Test whether the attack requires conversation-specific context
- Check whether the vulnerability is in the system prompt (systemic) or the conversation flow (potentially isolated)
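The log search in the first bullet can be very simple. The sketch below assumes a hypothetical whitespace-separated log layout of `<timestamp> <user_id> <prompt...>`; adapt the field positions to your logging format:

```shell
# List the distinct user IDs whose prompts contain a suspected payload,
# to distinguish an isolated report from a systemic issue
users_matching_pattern() {
  local pattern="$1" logfile="$2"
  grep -i "$pattern" "$logfile" | awk '{print $2}' | sort -u
}
```

If this returns more than one user, treat the incident as potentially systemic until proven otherwise.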
### Mistake 4: Ignoring Downstream Effects
If the AI model called external tools, wrote to databases, sent emails, or made API requests during the incident, the blast radius extends beyond the model itself. Triage must include:
- Identifying all tool calls made during the incident window
- Assessing whether downstream systems acted on compromised model outputs
- Determining whether downstream outputs need to be recalled or reverted
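A sketch of the first step, assuming a hypothetical tab-separated agent log of `<iso-timestamp> <tool> <params>` per line (ISO 8601 timestamps compare correctly as plain strings):

```shell
# Print every tool invocation recorded inside the incident window
tool_calls_in_window() {
  local start="$1" end="$2" logfile="$3"
  awk -F'\t' -v s="$start" -v e="$end" '$1 >= s && $1 <= e' "$logfile"
}
```

Each line this returns is a downstream effect to assess: did the recipient system act on it, and can that action be reverted?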
## Initial Assessment Questionnaire
Use this questionnaire to structure the initial triage conversation with the person reporting the incident.
| # | Question | Purpose |
|---|---|---|
| 1 | What did the model do that it should not have? | Classify the incident type |
| 2 | When did this happen? (Exact time if possible) | Set the investigation window |
| 3 | How was this discovered? | Assess detection capability |
| 4 | Is there a screenshot or log of the model's output? | Preserve primary evidence |
| 5 | What was the user trying to do? | Distinguish misuse from exploitation |
| 6 | Has anyone else reported similar behavior? | Assess blast radius |
| 7 | Which model/endpoint/product is affected? | Scope the investigation |
| 8 | Does the model have access to tools or external data? | Assess downstream risk |
| 9 | Have any changes been made since the incident? | Determine evidence integrity |
| 10 | Has this been shared publicly or with media? | Assess urgency and PR risk |
## Triage Decision Matrix
After initial assessment, use this matrix to determine the appropriate response level.
| Severity | Active Exploitation? | Data Exposed? | Response Level |
|---|---|---|---|
| Critical | Yes | Yes | War room, all hands, legal notification |
| Critical | Yes | No | Dedicated team, executive briefing, immediate containment |
| Critical | No | Yes | Data breach procedures, legal review, containment plan |
| High | Yes | Any | Dedicated team, 1-hour response SLA |
| High | No | Any | Standard IR team, 4-hour response SLA |
| Medium | Any | No | Assigned investigator, next business day |
| Low | No | No | Ticket-based tracking, routine investigation |
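To keep triage calls consistent across responders, the matrix can be encoded directly; a sketch:

```shell
# Look up the response level from severity, active exploitation, and data
# exposure ("yes"/"no"), mirroring the decision matrix above
response_level() {
  local sev="$1" exploited="$2" exposed="$3"
  case "$sev:$exploited:$exposed" in
    Critical:yes:yes) echo "War room, all hands, legal notification" ;;
    Critical:yes:no)  echo "Dedicated team, executive briefing, immediate containment" ;;
    Critical:no:yes)  echo "Data breach procedures, legal review, containment plan" ;;
    High:yes:*)       echo "Dedicated team, 1-hour response SLA" ;;
    High:no:*)        echo "Standard IR team, 4-hour response SLA" ;;
    Medium:*:no)      echo "Assigned investigator, next business day" ;;
    Low:no:no)        echo "Ticket-based tracking, routine investigation" ;;
    *)                echo "Unmapped combination -- escalate for manual review" ;;
  esac
}
```

The fallback branch matters: combinations the matrix does not cover (for example, Medium severity with data exposed) should route to a human rather than to a default level.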
## Related Topics
- Severity Framework -- detailed scoring for the initial severity assessment
- Escalation Paths -- where to route after triage
- Evidence Preservation -- detailed evidence preservation procedures
- Log Analysis -- post-triage investigation techniques