AI Incident Response Checklist

intermediate14 min readUpdated 2026-03-15

Step-by-step checklist for responding to AI security incidents, from initial detection through containment, investigation, remediation, and post-incident review.

incident-response checklist security-operations containment forensics

AI Incident Response Checklist

AI security incidents differ from traditional cybersecurity events in several fundamental ways. Model behavior is probabilistic rather than deterministic, making root cause analysis harder. Attack evidence may not persist in traditional logs. And the blast radius of a compromised model can extend across every interaction the model handles.

This checklist provides a structured approach to AI incident response organized into six phases: preparation, detection and triage, containment, investigation, remediation, and post-incident review.

Phase 1: Preparation (Before an Incident)

Preparation is the most important phase. Organizations that invest in preparation respond faster and more effectively when incidents occur.

Team Readiness

Identify AI incident response team members — Include ML engineers, security analysts, data scientists, legal counsel, and communications personnel. AI incidents require cross-functional expertise that traditional security teams may lack.
Define roles and responsibilities — Assign an incident commander, technical lead, communications lead, and legal advisor. Ensure each role has a backup.
Establish communication channels — Set up dedicated, secure channels for incident coordination. Avoid using the same AI systems that may be compromised.
Conduct tabletop exercises — Run quarterly exercises covering scenarios like prompt injection at scale, model poisoning discovery, training data breach, and adversarial manipulation of production models.
Create decision trees — Pre-define escalation criteria: when to involve legal, when to notify customers, when to take models offline.

Technical Preparation

Implement comprehensive logging — Log all model inputs and outputs with timestamps, session identifiers, and user attribution. Ensure logs are stored separately from the systems they monitor.
Establish model behavior baselines — Document expected output distributions, latency patterns, token usage, and refusal rates. Anomaly detection requires knowing what normal looks like.
Deploy monitoring and alerting — Set up automated alerts for output distribution drift, unusual token patterns, spike in refusal rates, latency anomalies, and unusual API usage patterns.
Maintain model inventory — Keep an up-to-date catalog of all deployed models, their versions, training data sources, fine-tuning history, and dependencies.
Implement rollback capability — Ensure you can quickly revert to a previous model version or configuration without service disruption.
Secure forensic infrastructure — Prepare isolated environments for analyzing compromised models, poisoned training data, or adversarial inputs without risk of contamination.

Documentation

Document model architectures — Include data flows, API endpoints, authentication mechanisms, and integration points for every deployed model.
Maintain vendor contact lists — For models using third-party APIs (OpenAI, Anthropic, etc.), keep current contact information for their security teams and know their incident reporting procedures.
Create playbooks for common scenarios — Develop specific response procedures for prompt injection, data poisoning, model theft, adversarial attacks, and supply chain compromise.

Phase 2: Detection and Triage

The first hours of an AI incident are critical. Rapid detection and accurate triage determine everything that follows.

Initial Detection Signals

AI incidents are typically detected through one of these channels:

Detection Source	Example Signals	Typical Response Time
Automated monitoring	Output distribution shift, latency spike, refusal rate change	Minutes
User reports	Unexpected model behavior, harmful outputs, data leakage in responses	Hours
Security team detection	Suspicious API patterns, unusual access, anomalous token usage	Hours to days
External notification	Researcher disclosure, vendor advisory, media report	Days to weeks
Audit findings	Training data contamination, configuration drift, access control gaps	Weeks to months

Triage Checklist

Assign incident severity — Use a consistent scale. Suggested AI-specific severity criteria:

Determine scope — Which models are affected? Which environments? How many users have been exposed? What data has the model had access to?
Identify attack vector — Is this a prompt injection, data poisoning, model manipulation, supply chain compromise, or infrastructure attack?
Assess data exposure — Has the model leaked sensitive data? Has training data been compromised? Are there privacy implications?
Notify stakeholders — Alert the incident response team, management chain, and legal counsel based on severity.
Begin incident log — Start documenting every action, finding, and decision with timestamps. This log is critical for post-incident review and potential regulatory reporting.

Evidence Preservation

Capture current model state — Save model weights, configuration, and deployment parameters before any changes.
Preserve logs — Ensure logging systems are writing to immutable storage. Copy relevant logs to a secure forensic environment.
Save adversarial inputs — If specific attack inputs have been identified, preserve them with full context including headers, session data, and preceding conversation.
Screenshot or record anomalous behavior — Visual evidence of unexpected model behavior can be invaluable for investigation and reporting.
Document the timeline — When was the anomaly first observed? When was it reported? Establish a preliminary timeline.

Phase 3: Containment

Containment limits the blast radius. The goal is to stop the incident from getting worse while preserving evidence for investigation.

Immediate Containment (First Hour)

Assess whether to take the model offline — Weigh the impact of downtime against the risk of continued exploitation. For P1 incidents, default to taking the model offline.
Implement input filtering — If the attack vector is known, deploy input filters to block similar payloads. This is a temporary measure, not a fix.
Restrict model access — Reduce the model's tool access, API permissions, or data source connections to minimize potential damage.
Isolate affected systems — If the model has been used to access other systems via tool use or agents, isolate those systems for investigation.
Revoke compromised credentials — If the model has been used to exfiltrate API keys, tokens, or credentials, revoke them immediately.
Enable enhanced logging — Increase logging verbosity on affected systems to capture additional evidence.

Short-Term Containment (First Day)

Deploy a clean model version — If a known-good model version exists, deploy it as a replacement. Verify its integrity before deployment.
Implement additional monitoring — Set up targeted monitoring for the specific attack pattern and related variations.
Review and restrict integrations — Audit all systems the affected model connects to. Disable non-essential integrations.
Communicate with affected users — If users have been exposed to harmful outputs or their data has been compromised, notify them according to your communication plan.
Coordinate with vendors — If the incident involves a third-party model API, report the incident to the vendor's security team.

Phase 4: Investigation

Investigation determines what happened, how it happened, and the full extent of the impact.

Attack Vector Analysis

Reconstruct the attack timeline — Using logs, identify the first malicious input, the progression of the attack, and all affected interactions.
Classify the attack type — Map the attack to known categories:

Attack Category	Investigation Focus
Prompt injection	Input logs, system prompt integrity, output analysis
Data poisoning	Training data provenance, data pipeline integrity, model behavior comparison
Model theft/extraction	API access logs, unusual query patterns, membership inference evidence
Adversarial inputs	Input characteristics, model decision boundaries, transferability testing
Supply chain	Dependency integrity, model weight verification, configuration audit
Infrastructure	Access logs, network traffic, container/VM integrity

Identify the attacker's objectives — Was this data exfiltration, service disruption, reputation damage, or something else? Understanding objectives helps predict what else the attacker may have done.
Determine the entry point — How did the attacker gain access? Was it through the model's public API, a compromised integration, a supply chain vector, or insider access?

Impact Assessment

Quantify data exposure — Determine exactly what data the model may have disclosed. Review output logs for PII, credentials, proprietary information, or training data leakage.
Identify affected users — Build a list of users who interacted with the compromised model during the incident window.
Assess downstream impact — If the model triggered actions via tool use or agent workflows, determine what actions were taken and their consequences.
Evaluate model integrity — For poisoning attacks, assess whether the model's weights, fine-tuning, or RLHF data have been corrupted.
Check for lateral movement — Determine whether the attacker used the AI system as a pivot point to access other systems.

Root Cause Analysis

Identify the vulnerability — What specific weakness did the attacker exploit? Was it a missing input filter, insufficient access control, vulnerable dependency, or architectural flaw?
Determine contributing factors — What environmental conditions enabled the attack? Missing monitoring, inadequate access controls, lack of input validation, or insufficient safety alignment?
Assess detection gap — Why was the incident not detected sooner? What monitoring or alerting would have caught it earlier?

Phase 5: Remediation

Remediation fixes the vulnerability and restores normal operations.

Technical Remediation

Fix the root cause — Implement a permanent fix for the vulnerability, not just a workaround. This may involve model retraining, architecture changes, or infrastructure hardening.
Validate the fix — Test the remediation against the original attack and known variations. Use red team testing to verify the fix is effective.
Deploy the fix through normal change management — Unless urgency requires it, follow standard deployment procedures to avoid introducing new issues.
Update detection rules — Add monitoring and alerting for the attack pattern and its variations.
Retrain if necessary — If training data was poisoned, retrain the model from clean data. Verify data provenance before retraining.
Update safety alignment — If the attack exploited a safety bypass, work with the alignment team to address the gap.
Patch dependencies — If the attack exploited a vulnerability in a dependency such as a model serving framework or vector database, update to patched versions.

Process Remediation

Update incident response playbooks — Incorporate lessons learned into your response procedures.
Revise monitoring coverage — Address detection gaps identified during the investigation.
Strengthen access controls — Tighten permissions based on what the investigation revealed about access patterns.
Update training programs — Ensure the team is trained on the attack techniques encountered.

Recovery

Restore normal operations — Remove temporary containment measures and return to standard operating procedures.
Verify system integrity — Confirm that all affected systems are functioning correctly and securely.
Monitor for recurrence — Maintain enhanced monitoring for a defined period, typically 30 to 90 days, to detect any recurrence.

Phase 6: Post-Incident Review

The post-incident review is where organizations learn and improve. Skip this phase at your peril.

Conduct the Review

Schedule a blameless retrospective — Hold the review within two weeks of incident closure. Include all team members who participated in the response.
Review the timeline — Walk through the incident chronologically, from detection to resolution. Identify what went well and what could improve.
Evaluate response effectiveness — How quickly was the incident detected? How effective was containment? Were the right people involved at the right times?
Identify systemic issues — Look beyond the specific incident for patterns that could lead to future incidents.

Documentation and Reporting

Write an incident report — Document the incident comprehensively, including timeline, impact, root cause, remediation, and lessons learned.
Update the model risk register — Add the identified vulnerability and any new risk factors to your model risk tracking system.
File regulatory reports — If required by regulation such as GDPR or HIPAA, file incident reports within mandated timeframes.
Share findings appropriately — Consider sharing anonymized findings with the broader AI security community to help others defend against similar attacks.
Track remediation actions — Assign owners and deadlines for all remediation items. Review progress in regular security meetings.

Metrics to Track

Metric	Description	Target
Mean Time to Detect (MTTD)	Time from incident start to detection	Less than 1 hour for P1
Mean Time to Contain (MTTC)	Time from detection to containment	Less than 4 hours for P1
Mean Time to Remediate (MTTR)	Time from detection to full remediation	Less than 72 hours for P1
Detection method	How the incident was detected	Automated monitoring preferred
Evidence quality	Was sufficient evidence available for investigation?	Complete logs for all interactions
Stakeholder notification time	Time from detection to stakeholder notification	Less than 1 hour for P1

AI-Specific Incident Scenarios

Scenario: Prompt Injection at Scale

When automated prompt injection is detected affecting multiple users simultaneously:

Immediately enable input rate limiting
Deploy prompt injection detection filters
Review all outputs generated during the attack window
Identify and notify affected users
Analyze attack payloads for data exfiltration attempts
Check whether the model's tool access was exploited

Scenario: Training Data Poisoning Discovery

When poisoned data is discovered in a production model's training set:

Determine when the poisoned data entered the pipeline
Identify all models trained on the affected dataset
Compare model behavior before and after the poisoning
Retrain affected models from verified clean data
Audit the data pipeline for the injection vector
Implement data provenance controls

Scenario: Model Supply Chain Compromise

When a third-party model or dependency is found to be compromised:

Identify all systems using the compromised component
Verify integrity of local copies against known-good checksums
Isolate affected systems
Deploy verified alternatives
Audit actions taken by systems using the compromised component
Report to the affected vendor and relevant ISACs

Quick Reference: First 60 Minutes

For teams that need a rapid-start guide, here are the critical first-hour actions:

Minute 0-5: Confirm the incident is real and not a false positive
Minute 5-10: Assign severity, notify incident commander
Minute 10-15: Begin evidence preservation including logs and model state
Minute 15-30: Implement immediate containment by restricting access and filtering inputs
Minute 30-45: Assess scope and identify affected users and systems
Minute 45-60: Brief stakeholders, begin investigation

The first hour sets the trajectory for the entire response. Move quickly but methodically, and document everything.

AI Incident Response Checklist

Related articles

AI Incident Response Checklist

Related articles