AI Incident Response Checklist
Step-by-step checklist for responding to AI security incidents, from initial detection through containment, investigation, remediation, and post-incident review.
AI Incident Response Checklist
AI security incidents differ from traditional cybersecurity events in several fundamental ways. Model behavior is probabilistic rather than deterministic, making root cause analysis harder. Attack evidence may not persist in traditional logs. And the blast radius of a compromised model can extend across every interaction the model handles.
This checklist provides a structured approach to AI incident response organized into six phases: preparation, detection and triage, containment, investigation, remediation, and post-incident review.
Phase 1: Preparation (Before an Incident)
Preparation is the most important phase. Organizations that invest in preparation respond faster and more effectively when incidents occur.
Team Readiness
- Identify AI incident response team members — Include ML engineers, security analysts, data scientists, legal counsel, and communications personnel. AI incidents require cross-functional expertise that traditional security teams may lack.
- Define roles and responsibilities — Assign an incident commander, technical lead, communications lead, and legal advisor. Ensure each role has a backup.
- Establish communication channels — Set up dedicated, secure channels for incident coordination. Avoid using the same AI systems that may be compromised.
- Conduct tabletop exercises — Run quarterly exercises covering scenarios like prompt injection at scale, model poisoning discovery, training data breach, and adversarial manipulation of production models.
- Create decision trees — Pre-define escalation criteria: when to involve legal, when to notify customers, when to take models offline.
Technical Preparation
- Implement comprehensive logging — Log all model inputs and outputs with timestamps, session identifiers, and user attribution. Ensure logs are stored separately from the systems they monitor.
- Establish model behavior baselines — Document expected output distributions, latency patterns, token usage, and refusal rates. Anomaly detection requires knowing what normal looks like.
- Deploy monitoring and alerting — Set up automated alerts for output distribution drift, unusual token patterns, spike in refusal rates, latency anomalies, and unusual API usage patterns.
- Maintain model inventory — Keep an up-to-date catalog of all deployed models, their versions, training data sources, fine-tuning history, and dependencies.
- Implement rollback capability — Ensure you can quickly revert to a previous model version or configuration without service disruption.
- Secure forensic infrastructure — Prepare isolated environments for analyzing compromised models, poisoned training data, or adversarial inputs without risk of contamination.
Documentation
- Document model architectures — Include data flows, API endpoints, authentication mechanisms, and integration points for every deployed model.
- Maintain vendor contact lists — For models using third-party APIs (OpenAI, Anthropic, etc.), keep current contact information for their security teams and know their incident reporting procedures.
- Create playbooks for common scenarios — Develop specific response procedures for prompt injection, data poisoning, model theft, adversarial attacks, and supply chain compromise.
Phase 2: Detection and Triage
The first hours of an AI incident are critical. Rapid detection and accurate triage determine everything that follows.
Initial Detection Signals
AI incidents are typically detected through one of these channels:
| Detection Source | Example Signals | Typical Response Time |
|---|---|---|
| Automated monitoring | Output distribution shift, latency spike, refusal rate change | Minutes |
| User reports | Unexpected model behavior, harmful outputs, data leakage in responses | Hours |
| Security team detection | Suspicious API patterns, unusual access, anomalous token usage | Hours to days |
| External notification | Researcher disclosure, vendor advisory, media report | Days to weeks |
| Audit findings | Training data contamination, configuration drift, access control gaps | Weeks to months |
Triage Checklist
- Assign incident severity — Use a consistent scale. Suggested AI-specific severity criteria:
- Determine scope — Which models are affected? Which environments? How many users have been exposed? What data has the model had access to?
- Identify attack vector — Is this a prompt injection, data poisoning, model manipulation, supply chain compromise, or infrastructure attack?
- Assess data exposure — Has the model leaked sensitive data? Has training data been compromised? Are there privacy implications?
- Notify stakeholders — Alert the incident response team, management chain, and legal counsel based on severity.
- Begin incident log — Start documenting every action, finding, and decision with timestamps. This log is critical for post-incident review and potential regulatory reporting.
Evidence Preservation
- Capture current model state — Save model weights, configuration, and deployment parameters before any changes.
- Preserve logs — Ensure logging systems are writing to immutable storage. Copy relevant logs to a secure forensic environment.
- Save adversarial inputs — If specific attack inputs have been identified, preserve them with full context including headers, session data, and preceding conversation.
- Screenshot or record anomalous behavior — Visual evidence of unexpected model behavior can be invaluable for investigation and reporting.
- Document the timeline — When was the anomaly first observed? When was it reported? Establish a preliminary timeline.
Phase 3: Containment
Containment limits the blast radius. The goal is to stop the incident from getting worse while preserving evidence for investigation.
Immediate Containment (First Hour)
- Assess whether to take the model offline — Weigh the impact of downtime against the risk of continued exploitation. For P1 incidents, default to taking the model offline.
- Implement input filtering — If the attack vector is known, deploy input filters to block similar payloads. This is a temporary measure, not a fix.
- Restrict model access — Reduce the model's tool access, API permissions, or data source connections to minimize potential damage.
- Isolate affected systems — If the model has been used to access other systems via tool use or agents, isolate those systems for investigation.
- Revoke compromised credentials — If the model has been used to exfiltrate API keys, tokens, or credentials, revoke them immediately.
- Enable enhanced logging — Increase logging verbosity on affected systems to capture additional evidence.
Short-Term Containment (First Day)
- Deploy a clean model version — If a known-good model version exists, deploy it as a replacement. Verify its integrity before deployment.
- Implement additional monitoring — Set up targeted monitoring for the specific attack pattern and related variations.
- Review and restrict integrations — Audit all systems the affected model connects to. Disable non-essential integrations.
- Communicate with affected users — If users have been exposed to harmful outputs or their data has been compromised, notify them according to your communication plan.
- Coordinate with vendors — If the incident involves a third-party model API, report the incident to the vendor's security team.
Phase 4: Investigation
Investigation determines what happened, how it happened, and the full extent of the impact.
Attack Vector Analysis
- Reconstruct the attack timeline — Using logs, identify the first malicious input, the progression of the attack, and all affected interactions.
- Classify the attack type — Map the attack to known categories:
| Attack Category | Investigation Focus |
|---|---|
| Prompt injection | Input logs, system prompt integrity, output analysis |
| Data poisoning | Training data provenance, data pipeline integrity, model behavior comparison |
| Model theft/extraction | API access logs, unusual query patterns, membership inference evidence |
| Adversarial inputs | Input characteristics, model decision boundaries, transferability testing |
| Supply chain | Dependency integrity, model weight verification, configuration audit |
| Infrastructure | Access logs, network traffic, container/VM integrity |
- Identify the attacker's objectives — Was this data exfiltration, service disruption, reputation damage, or something else? Understanding objectives helps predict what else the attacker may have done.
- Determine the entry point — How did the attacker gain access? Was it through the model's public API, a compromised integration, a supply chain vector, or insider access?
Impact Assessment
- Quantify data exposure — Determine exactly what data the model may have disclosed. Review output logs for PII, credentials, proprietary information, or training data leakage.
- Identify affected users — Build a list of users who interacted with the compromised model during the incident window.
- Assess downstream impact — If the model triggered actions via tool use or agent workflows, determine what actions were taken and their consequences.
- Evaluate model integrity — For poisoning attacks, assess whether the model's weights, fine-tuning, or RLHF data have been corrupted.
- Check for lateral movement — Determine whether the attacker used the AI system as a pivot point to access other systems.
Root Cause Analysis
- Identify the vulnerability — What specific weakness did the attacker exploit? Was it a missing input filter, insufficient access control, vulnerable dependency, or architectural flaw?
- Determine contributing factors — What environmental conditions enabled the attack? Missing monitoring, inadequate access controls, lack of input validation, or insufficient safety alignment?
- Assess detection gap — Why was the incident not detected sooner? What monitoring or alerting would have caught it earlier?
Phase 5: Remediation
Remediation fixes the vulnerability and restores normal operations.
Technical Remediation
- Fix the root cause — Implement a permanent fix for the vulnerability, not just a workaround. This may involve model retraining, architecture changes, or infrastructure hardening.
- Validate the fix — Test the remediation against the original attack and known variations. Use red team testing to verify the fix is effective.
- Deploy the fix through normal change management — Unless urgency requires it, follow standard deployment procedures to avoid introducing new issues.
- Update detection rules — Add monitoring and alerting for the attack pattern and its variations.
- Retrain if necessary — If training data was poisoned, retrain the model from clean data. Verify data provenance before retraining.
- Update safety alignment — If the attack exploited a safety bypass, work with the alignment team to address the gap.
- Patch dependencies — If the attack exploited a vulnerability in a dependency such as a model serving framework or vector database, update to patched versions.
Process Remediation
- Update incident response playbooks — Incorporate lessons learned into your response procedures.
- Revise monitoring coverage — Address detection gaps identified during the investigation.
- Strengthen access controls — Tighten permissions based on what the investigation revealed about access patterns.
- Update training programs — Ensure the team is trained on the attack techniques encountered.
Recovery
- Restore normal operations — Remove temporary containment measures and return to standard operating procedures.
- Verify system integrity — Confirm that all affected systems are functioning correctly and securely.
- Monitor for recurrence — Maintain enhanced monitoring for a defined period, typically 30 to 90 days, to detect any recurrence.
Phase 6: Post-Incident Review
The post-incident review is where organizations learn and improve. Skip this phase at your peril.
Conduct the Review
- Schedule a blameless retrospective — Hold the review within two weeks of incident closure. Include all team members who participated in the response.
- Review the timeline — Walk through the incident chronologically, from detection to resolution. Identify what went well and what could improve.
- Evaluate response effectiveness — How quickly was the incident detected? How effective was containment? Were the right people involved at the right times?
- Identify systemic issues — Look beyond the specific incident for patterns that could lead to future incidents.
Documentation and Reporting
- Write an incident report — Document the incident comprehensively, including timeline, impact, root cause, remediation, and lessons learned.
- Update the model risk register — Add the identified vulnerability and any new risk factors to your model risk tracking system.
- File regulatory reports — If required by regulation such as GDPR or HIPAA, file incident reports within mandated timeframes.
- Share findings appropriately — Consider sharing anonymized findings with the broader AI security community to help others defend against similar attacks.
- Track remediation actions — Assign owners and deadlines for all remediation items. Review progress in regular security meetings.
Metrics to Track
| Metric | Description | Target |
|---|---|---|
| Mean Time to Detect (MTTD) | Time from incident start to detection | Less than 1 hour for P1 |
| Mean Time to Contain (MTTC) | Time from detection to containment | Less than 4 hours for P1 |
| Mean Time to Remediate (MTTR) | Time from detection to full remediation | Less than 72 hours for P1 |
| Detection method | How the incident was detected | Automated monitoring preferred |
| Evidence quality | Was sufficient evidence available for investigation? | Complete logs for all interactions |
| Stakeholder notification time | Time from detection to stakeholder notification | Less than 1 hour for P1 |
AI-Specific Incident Scenarios
Scenario: Prompt Injection at Scale
When automated prompt injection is detected affecting multiple users simultaneously:
- Immediately enable input rate limiting
- Deploy prompt injection detection filters
- Review all outputs generated during the attack window
- Identify and notify affected users
- Analyze attack payloads for data exfiltration attempts
- Check whether the model's tool access was exploited
Scenario: Training Data Poisoning Discovery
When poisoned data is discovered in a production model's training set:
- Determine when the poisoned data entered the pipeline
- Identify all models trained on the affected dataset
- Compare model behavior before and after the poisoning
- Retrain affected models from verified clean data
- Audit the data pipeline for the injection vector
- Implement data provenance controls
Scenario: Model Supply Chain Compromise
When a third-party model or dependency is found to be compromised:
- Identify all systems using the compromised component
- Verify integrity of local copies against known-good checksums
- Isolate affected systems
- Deploy verified alternatives
- Audit actions taken by systems using the compromised component
- Report to the affected vendor and relevant ISACs
Quick Reference: First 60 Minutes
For teams that need a rapid-start guide, here are the critical first-hour actions:
- Minute 0-5: Confirm the incident is real and not a false positive
- Minute 5-10: Assign severity, notify incident commander
- Minute 10-15: Begin evidence preservation including logs and model state
- Minute 15-30: Implement immediate containment by restricting access and filtering inputs
- Minute 30-45: Assess scope and identify affected users and systems
- Minute 45-60: Brief stakeholders, begin investigation
The first hour sets the trajectory for the entire response. Move quickly but methodically, and document everything.