Model Compromise Incident Response Playbook
Playbook for responding to a compromised AI model: isolation procedures, replacement strategies, supply chain investigation, retraining considerations, and integrity restoration.
This playbook covers incidents where the AI model itself -- its weights, adapters, tokenizer, or training pipeline -- has been compromised. Model compromise is among the most severe AI incidents because the model is the core component trusted to produce safe, accurate outputs. A compromised model can affect every user and every conversation until it is replaced.
Trigger Criteria
Activate this playbook when:
- Behavioral analysis shows model changes not explained by application-layer configuration
- File integrity checks reveal unauthorized modifications to model artifacts
- Supply chain alert indicates compromise of a model provider, hosting platform, or distribution channel
- Backdoor behavior is detected through trigger scanning or activation analysis
- A third-party model provenance investigation reveals tampering
Immediate Actions (First 30 Minutes)
Acknowledge and assign an expanded team
Model compromise typically requires a broader response team than other AI incidents:
```
Incident ID: AI-IR-[YYYY]-[NNNN]
Type: Model Compromise
Detected: [UTC timestamp]
IC: [Name]
AI Investigator: [Name]           ← ML engineer with model internals expertise
Infrastructure Lead: [Name]       ← For deployment and isolation
Supply Chain Investigator: [Name] ← For provenance investigation
Evidence Custodian: [Name]
```
Preserve the compromised model
Before isolation or replacement, preserve the compromised model as evidence:
- Snapshot all model weight files with SHA-256 checksums
- Capture all adapter files (LoRA, QLoRA)
- Capture tokenizer files and configuration
- Record the model serving configuration (quantization, batching, etc.)
- Capture the deployment manifest showing when this model version was deployed
- Record the model's provenance: where it was obtained, when, by whom
- Preserve fine-tuning logs and training data references if applicable
- Store everything in isolated, read-only evidence storage
```bash
# Preserve model artifacts
EVIDENCE_DIR="/evidence/AI-IR-$(date +%Y)-NNNN/model"
mkdir -p "$EVIDENCE_DIR"

# Copy model files with integrity verification
cp -r /models/production/current/* "$EVIDENCE_DIR/"
sha256sum "$EVIDENCE_DIR"/* > "$EVIDENCE_DIR/checksums.sha256"

# Record deployment state
kubectl get deployment model-service -o yaml > "$EVIDENCE_DIR/deployment.yaml"
kubectl describe pod -l app=model-service > "$EVIDENCE_DIR/pod_describe.txt"
```
Isolate the compromised model
Remove the compromised model from production while maintaining service if possible:
| Isolation Strategy | When to Use | Impact |
|---|---|---|
| Hot swap to fallback model | Fallback model available and tested | Minimal downtime; fallback may have different capabilities |
| Roll back to previous version | Previous version known-good and available | Requires version to be cached; may lose legitimate updates |
| Graceful shutdown | No fallback available; severity justifies downtime | Service interruption; clear containment |
| Feature flag disable | AI features can be disabled while non-AI features continue | Partial service disruption |

Decision framework:
- If the model is producing actively harmful outputs → immediate shutdown or swap
- If the model has a backdoor that has not yet been triggered → swap with urgency but plan the transition
- If the compromise is suspected but unconfirmed → enable enhanced monitoring while investigating
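The decision framework above can be sketched as a small helper. This is illustrative only: the function name, input flags, and strategy labels are assumptions, not part of any real tooling.

```bash
# Sketch of the isolation decision framework (illustrative, not real tooling).
# Inputs describe the situation; output names the isolation strategy to apply.
choose_isolation_strategy() {
  harmful_outputs="$1"   # yes/no: model is producing actively harmful outputs
  confirmed="$2"         # yes/no: compromise confirmed (vs. merely suspected)
  fallback_ready="$3"    # yes/no: a tested fallback model is available

  if [ "$harmful_outputs" = "yes" ]; then
    # Actively harmful: swap immediately if possible, otherwise shut down.
    if [ "$fallback_ready" = "yes" ]; then echo "hot-swap"; else echo "shutdown"; fi
  elif [ "$confirmed" = "yes" ]; then
    # Confirmed but dormant backdoor: urgent but planned transition.
    if [ "$fallback_ready" = "yes" ]; then echo "planned-swap"; else echo "rollback"; fi
  else
    # Suspected only: keep serving under enhanced monitoring.
    echo "enhanced-monitoring"
  fi
}

choose_isolation_strategy yes no yes   # prints "hot-swap"
choose_isolation_strategy no no no     # prints "enhanced-monitoring"
```

Encoding the framework this way also makes the decision auditable after the fact: the inputs recorded at triage time reproduce the isolation choice.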
Assess blast radius
Determine the scope of impact:
| Question | Investigation Method | Implication |
|---|---|---|
| How long was the compromised model in production? | Deployment logs, version history | Defines the investigation time window |
| How many users interacted with it? | Request logs, user counts | Defines the affected population |
| Did the model take any harmful actions? | Tool call logs, output review | Determines whether damage extends beyond the model |
| Were model outputs used by downstream systems? | Integration logs, data flow maps | May need to recall downstream outputs |
| Is the same compromised artifact used by other systems? | Deployment inventory | Other systems may also be affected |
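The first two blast-radius questions can often be answered directly from request logs. The sketch below assumes a simple one-line-per-request format (`<ISO timestamp> <user_id> <endpoint>`); the format, paths, and window dates are illustrative.

```bash
# Sketch: quantify blast radius from request logs (assumed log format:
# "<ISO timestamp> <user_id> <endpoint>"). All values are illustrative.
LOG=$(mktemp); IN=$(mktemp)
cat > "$LOG" <<'EOF'
2025-06-01T10:00:00Z user-17 /chat
2025-06-01T11:30:00Z user-42 /chat
2025-06-02T09:15:00Z user-17 /chat
2025-06-03T14:00:00Z user-99 /tools
EOF

# Compromise window, taken from deployment history.
WINDOW_START="2025-06-01T00:00:00Z"
WINDOW_END="2025-06-02T23:59:59Z"

# ISO-8601 UTC timestamps sort lexically, so string comparison suffices.
awk -v s="$WINDOW_START" -v e="$WINDOW_END" '$1 >= s && $1 <= e' "$LOG" > "$IN"
REQS=$(wc -l < "$IN" | tr -d ' ')
USERS=$(awk '{print $2}' "$IN" | sort -u | wc -l | tr -d ' ')
echo "requests in window: $REQS"
echo "distinct users affected: $USERS"
rm -f "$LOG" "$IN"
```

The lexical-comparison trick only works because the timestamps are zero-padded ISO 8601 in a single timezone; mixed-format logs need real date parsing.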
Investigation
Supply Chain Investigation
Map the model's provenance chain
Document every step from the model's creation to your deployment:
```
## Model Provenance Chain

1. **Base model**: [provider, model name, version, release date]
   - Source: [URL or registry]
   - Downloaded by: [person/system]
   - Downloaded on: [date]
   - Original checksums verified: [yes/no]
2. **Fine-tuning**: [if applicable]
   - Training data: [source, description]
   - Training infrastructure: [platform, who had access]
   - Training job ID: [ID]
   - Output checksums: [hashes]
3. **Post-processing**: [quantization, conversion, etc.]
   - Process: [description]
   - Tools used: [versions]
   - Who performed: [person/system]
   - Output checksums: [hashes]
4. **Deployment**:
   - Deployed by: [person/CI system]
   - Deployed on: [date]
   - Deployment method: [description]
   - Runtime configuration: [description]
```
Identify the compromise point
Compare checksums at each step of the provenance chain to identify where the modification occurred:
| Compromise Point | Indicators | Investigation Focus |
|---|---|---|
| Provider compromise | Checksums mismatch from source | Contact provider; check for broader industry impact |
| Download tampering | Checksums match source but not your stored copy | Investigate network security, man-in-the-middle |
| Fine-tuning compromise | Pre-fine-tuning checksums OK, post-fine-tuning differ from expected | Investigate training data and training infrastructure |
| Post-processing tampering | Modification after quantization or format conversion | Investigate processing pipeline access controls |
| Deployment compromise | Stored model is clean, deployed model differs | Investigate deployment pipeline, container security |
| Runtime modification | Deployed model changed after deployment | Investigate host security, write access to model files |

Investigate the compromise vector
Based on the compromise point, investigate the specific attack vector:
| Vector | Investigation Steps |
|---|---|
| Malicious training data | Audit training data for poisoned examples; check data source integrity |
| Compromised fine-tuning pipeline | Review pipeline access logs; check for unauthorized modifications |
| Malicious adapter | Inspect adapter provenance, rank, and target modules |
| Compromised model registry | Audit registry access; check for unauthorized pushes |
| Container compromise | Investigate container image history; scan for malware |
| Insider threat | Review access logs for model files and training infrastructure |
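The checksum comparison used to identify the compromise point can be sketched as below. The stage names and sample files are illustrative; in a real investigation each hash comes from the records preserved at that stage of the provenance chain.

```bash
# Sketch: localize the compromise point by comparing the artifact's SHA-256
# at each recorded stage of the provenance chain. Stage names and the sample
# files are illustrative stand-ins for real provenance records.
WORK=$(mktemp -d)
echo "clean weights"    > "$WORK/provider_copy"   # hash published by provider
echo "clean weights"    > "$WORK/stored_copy"     # hash of your stored artifact
echo "tampered weights" > "$WORK/deployed_copy"   # hash of what is serving now

hash_of() { sha256sum "$1" | awk '{print $1}'; }

PROVIDER=$(hash_of "$WORK/provider_copy")
STORED=$(hash_of "$WORK/stored_copy")
DEPLOYED=$(hash_of "$WORK/deployed_copy")

# Walk the chain in order: the first link where hashes diverge is the
# compromise point (later divergences are downstream of it).
if [ "$PROVIDER" != "$STORED" ]; then
  POINT="download tampering"
elif [ "$STORED" != "$DEPLOYED" ]; then
  POINT="deployment or runtime modification"
else
  POINT="no divergence found in this chain"
fi
echo "compromise point: $POINT"
rm -rf "$WORK"
```

A fuller version would include the fine-tuning and post-processing stages as additional links in the same first-divergence walk.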
Behavioral Forensics
While investigating the supply chain, perform behavioral analysis of the compromised model.
| Analysis | Objective | Method |
|---|---|---|
| Safety benchmark | Quantify safety degradation | Run standard safety eval suite; compare to known-good baseline |
| Trigger scanning | Identify backdoor triggers | Apply Backdoor Detection procedures |
| Behavior diffing | Identify all behavioral changes | Apply Behavior Diffing procedures |
| Output audit | Review historical outputs for harmful content | Sample and review outputs from the compromise window |
| Tool call audit | Check for unauthorized actions | Review all tool calls during the compromise window |
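At its crudest, behavior diffing compares paired response samples from the known-good and suspect models on the same prompt set. The sketch below assumes one normalized response per line, in matching prompt order; file contents are illustrative.

```bash
# Sketch: crude behavior diffing over paired response samples (one normalized
# response per line, same prompt order in both files). Data is illustrative.
BASE=$(mktemp); CURR=$(mktemp)
printf 'refuse\nanswer-a\nanswer-b\nrefuse\n'  > "$BASE"   # known-good model
printf 'refuse\nanswer-a\nanswer-X\ncomply\n'  > "$CURR"   # suspect model

# Count prompts where the two models disagree (paste joins lines with a tab).
DIFFS=$(paste "$BASE" "$CURR" | awk -F'\t' '$1 != $2' | wc -l | tr -d ' ')
TOTAL=$(wc -l < "$BASE" | tr -d ' ')
echo "disagreement: $DIFFS/$TOTAL prompts"
rm -f "$BASE" "$CURR"
```

Exact-match comparison over-counts benign paraphrase differences; a real pipeline would compare semantic or safety-relevant features rather than raw strings, but a high exact-match disagreement rate is still a useful first signal.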
Remediation
Model Replacement Decision Matrix
| Factor | Replace with Previous Version | Replace with New Model | Retrain |
|---|---|---|---|
| Previous version available and clean | Preferred | If previous version inadequate | If custom model required |
| Compromise in fine-tuning data | N/A | If base model is clean | Required with clean data |
| Provider-level compromise | If earlier version unaffected | Switch providers | Switch providers and retrain |
| Performance requirements | If previous version meets requirements | If upgrade needed | If custom capabilities needed |
| Timeline | Hours | Hours to days | Days to weeks |
Retraining Considerations
If retraining is required:
| Consideration | Action |
|---|---|
| Training data audit | Review all training data for poisoned examples; remove and replace |
| Training infrastructure security | Audit and harden the training environment before retraining |
| Training process isolation | Use isolated, audited compute for the retraining job |
| Validation protocol | Define comprehensive behavioral validation before deployment |
| Incremental verification | Checkpoint and evaluate during training, not just at the end |
| Output comparison | Compare retrained model against both the compromised and known-good versions |
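The incremental-verification row above can be sketched as a gate that evaluates each checkpoint against the known-good baseline and flags the first regression instead of waiting for the final model. The checkpoint names, scores, and 5-point tolerance are illustrative assumptions.

```bash
# Sketch: incremental verification during retraining. After each checkpoint,
# compare its safety score to the known-good baseline and flag the first
# regression. Names, scores, and the 5-point tolerance are illustrative.
BASELINE=90

check_checkpoints() {
  for entry in "$@"; do              # entries look like "ckpt-1000:91"
    name=${entry%%:*}
    score=${entry##*:}
    if [ "$score" -lt $((BASELINE - 5)) ]; then
      echo "regression at $name (score $score)"
      return 0                       # stop at the first regressing checkpoint
    fi
  done
  echo "all checkpoints within tolerance"
}

check_checkpoints ckpt-1000:91 ckpt-2000:88 ckpt-3000:76
```

Catching the regression at `ckpt-3000` narrows the audit to the data seen between the last clean checkpoint and the flagged one, rather than the whole training run.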
Verification
Integrity Verification
| Check | Procedure | Pass Criteria |
|---|---|---|
| File integrity | Compare all model file checksums against known-good values | 100% match |
| Architecture validation | Verify layer counts, hidden sizes, vocabulary size | Matches specification |
| Adapter audit | Verify only authorized adapters are loaded | No unauthorized adapters |
| Configuration verification | Verify all serving configuration parameters | Matches approved configuration |
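The file-integrity check reduces to verifying the deployed files against a checksum manifest recorded at deployment time. This sketch uses sample files and illustrative names; in practice the manifest must come from trusted, write-once storage, not from the host being verified.

```bash
# Sketch: verify deployed model files against a known-good checksum manifest.
# File names are illustrative; the manifest should live in trusted storage.
MODEL_DIR=$(mktemp -d)
echo "weights-v3" > "$MODEL_DIR/model.safetensors"
echo "tokenizer"  > "$MODEL_DIR/tokenizer.json"

# Known-good manifest, recorded at deployment time.
( cd "$MODEL_DIR" && sha256sum model.safetensors tokenizer.json > manifest.sha256 )

# Later: the pass criterion is a 100% match, i.e. `sha256sum -c` exits 0.
if ( cd "$MODEL_DIR" && sha256sum -c --quiet manifest.sha256 ); then
  RESULT="PASS"
else
  RESULT="FAIL"
fi
echo "file integrity: $RESULT"
rm -rf "$MODEL_DIR"
```

Keeping the manifest outside the serving host matters: an attacker with write access to the model files can usually also rewrite a manifest stored next to them.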
Behavioral Verification
| Check | Procedure | Pass Criteria |
|---|---|---|
| Safety benchmarks | Run full safety evaluation suite | Scores within 5% of known-good baseline |
| Capability benchmarks | Run standard capability evaluations | Scores within expected range |
| Trigger scanning | Run known backdoor trigger patterns | No anomalous responses |
| Jailbreak resistance | Run jailbreak evaluation suite | Refusal rates within baseline |
| Production smoke test | Sample of representative production queries | Normal behavior confirmed |
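The "within 5% of baseline" pass criterion is a simple relative-tolerance check. The helper below sketches it with awk arithmetic; the function name and score values are illustrative.

```bash
# Sketch: check that a benchmark score is within N% of the known-good
# baseline (the safety-benchmark pass criterion). Values are illustrative.
within_tolerance() {
  # args: candidate_score baseline_score tolerance_percent
  awk -v c="$1" -v b="$2" -v t="$3" 'BEGIN {
    d = (c > b) ? c - b : b - c          # absolute difference
    r = (d <= b * t / 100) ? "PASS" : "FAIL"
    print r
  }'
}

within_tolerance 87.1 89.0 5   # small drop, inside 5% of baseline
within_tolerance 80.0 89.0 5   # large drop, outside tolerance
```

Note the comparison is symmetric: a score far above baseline also fails, which is deliberate, since an unexplained jump in any benchmark is itself a behavioral anomaly worth investigating.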
Post-Mortem Checklist
| # | Item | Status |
|---|---|---|
| 1 | Compromise point in supply chain identified | |
| 2 | Attack vector determined and documented | |
| 3 | Blast radius quantified (users, time, actions) | |
| 4 | All outputs during compromise window reviewed or sampled | |
| 5 | Compromised model preserved as evidence | |
| 6 | Replacement model deployed and verified | |
| 7 | Supply chain controls strengthened at the compromise point | |
| 8 | Model integrity monitoring implemented or enhanced | |
| 9 | Other models/systems checked for similar compromise | |
| 10 | Vendor notification (if third-party model) completed | |
| 11 | Downstream system outputs reviewed and corrected if needed | |
| 12 | Post-mortem shared with team and stakeholders |
Related Topics
- Model Forensics -- detailed model investigation techniques
- Backdoor Detection -- backdoor scanning procedures
- Tampering Detection -- file integrity verification
- Model Snapshots -- preserving model evidence
- Infrastructure & Supply Chain -- supply chain attack vectors