Model Compromise Incident Response Playbook
Playbook for responding to a compromised AI model: isolation procedures, replacement strategies, supply chain investigation, retraining considerations, and integrity restoration.
This playbook covers incidents where the AI model itself -- its weights, adapters, tokenizer, or training pipeline -- has been compromised. Model compromise is among the most severe AI incidents because the model is the core component trusted to produce safe, accurate outputs. A compromised model can affect every user and every conversation until it is replaced.
Trigger Criteria
Activate this playbook when:
- Behavioral analysis shows model changes not explained by application-layer configuration
- File integrity checks reveal unauthorized modifications to model artifacts
- A supply chain alert indicates compromise of a model provider, hosting platform, or distribution channel
- Backdoor behavior is detected through trigger scanning or activation analysis
- A third-party model provenance investigation reveals tampering
Immediate Actions (First 30 Minutes)
Acknowledge and assign with expanded team
Model compromise typically requires a broader response team than other AI incidents:
```
Incident ID: AI-IR-[YYYY]-[NNNN]
Type: Model Compromise
Detected: [UTC timestamp]
IC: [Name]
AI Investigator: [Name]            ← ML engineer with model internals expertise
Infrastructure Lead: [Name]        ← For deployment and isolation
Supply Chain Investigator: [Name]  ← For provenance investigation
Evidence Custodian: [Name]
```
Preserve the compromised model
Before isolation or replacement, preserve the compromised model as evidence:
- Snapshot all model weight files with SHA-256 checksums
- Capture all adapter files (LoRA, QLoRA)
- Capture tokenizer files and configuration
- Record the model serving configuration (quantization, batching, etc.)
- Capture the deployment manifest showing when this model version was deployed
- Record the model's provenance: where it was obtained, when, and by whom
- Preserve fine-tuning logs and training data references if applicable
- Store everything in isolated, read-only evidence storage
```bash
# Preserve model artifacts
EVIDENCE_DIR="/evidence/AI-IR-$(date +%Y)-NNNN/model"
mkdir -p "$EVIDENCE_DIR"

# Copy model files with integrity verification
cp -r /models/production/current/* "$EVIDENCE_DIR/"
sha256sum "$EVIDENCE_DIR"/* > "$EVIDENCE_DIR/checksums.sha256"

# Record deployment state
kubectl get deployment model-service -o yaml > "$EVIDENCE_DIR/deployment.yaml"
kubectl describe pod -l app=model-service > "$EVIDENCE_DIR/pod_describe.txt"
```
Isolate the compromised model
Remove the compromised model from production while maintaining service if possible:
| Isolation Strategy | When to Use | Impact |
|---|---|---|
| Hot swap to fallback model | Fallback model available and tested | Minimal downtime; fallback may have different capabilities |
| Roll back to previous version | Previous version known-good and available | Requires version to be cached; may lose legitimate updates |
| Graceful shutdown | No fallback available; severity justifies downtime | Service interruption; clear containment |
| Feature flag disable | AI features can be disabled while non-AI features continue | Partial service disruption |

Decision framework:
- If the model is producing actively harmful outputs → immediate shutdown or swap
- If the model has a backdoor that has not yet been triggered → swap with urgency but plan the transition
- If the compromise is suspected but unconfirmed → enable enhanced monitoring while investigating
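The hot-swap and rollback strategies above can be sketched with kubectl. This is a minimal sketch reusing the `model-service` deployment name from the evidence-capture snippet; the container name `model` and the fallback image are assumptions, not standard values:

```bash
#!/usr/bin/env bash
# Sketch of the "hot swap" and "roll back" isolation strategies.
# Deployment name (model-service), container name (model), and image
# references are illustrative -- substitute your own.
set -euo pipefail

swap_to_fallback() {
  local deployment="$1" fallback_image="$2"
  # Repoint the serving container at the tested fallback model image.
  kubectl set image "deployment/$deployment" "model=$fallback_image"
  # Block until the rollout completes (or fails) so containment is confirmed.
  kubectl rollout status "deployment/$deployment" --timeout=300s
}

# Roll back instead when the previous version is known-good and still cached:
#   kubectl rollout undo deployment/model-service
```

Blocking on `kubectl rollout status` matters here: containment is not complete until the old pods serving the compromised weights have actually been replaced.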
Assess blast radius
Determine the scope of impact:
| Question | Investigation Method | Implication |
|---|---|---|
| How long was the compromised model in production? | Deployment logs, version history | Defines the investigation time window |
| How many users interacted with it? | Request logs, user counts | Defines the affected population |
| Did the model take any harmful actions? | Tool call logs, output review | Determines whether damage extends beyond the model |
| Were model outputs used by downstream systems? | Integration logs, data flow maps | May need to recall downstream outputs |
| Is the same compromised artifact used by other systems? | Deployment inventory | Other systems may also be affected |
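The affected-population question can be answered mechanically once the compromise window is known. A minimal sketch, assuming a tab-separated request log of `timestamp<TAB>user_id<TAB>endpoint` (the log format and path are hypothetical):

```bash
#!/usr/bin/env bash
# Count distinct users whose requests fall inside the compromise window.
# ISO-8601 UTC timestamps compare correctly as plain strings.
set -euo pipefail

blast_radius_users() {
  local log="$1" window_start="$2" window_end="$3"
  awk -F'\t' -v s="$window_start" -v e="$window_end" '
    $1 >= s && $1 <= e && !($2 in seen) { seen[$2] = 1; n++ }
    END { print n + 0 }' "$log"
}

# Example (hypothetical log path):
#   blast_radius_users /var/log/model/requests.tsv \
#       "2025-03-01T00:00:00Z" "2025-03-04T23:59:59Z"
```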
Investigation
Supply Chain Investigation
Map the model's provenance chain
Document every step from the model's creation to your deployment:
```markdown
## Model Provenance Chain

1. **Base model**: [provider, model name, version, release date]
   - Source: [URL or registry]
   - Downloaded by: [person/system]
   - Downloaded on: [date]
   - Original checksums verified: [yes/no]
2. **Fine-tuning**: [if applicable]
   - Training data: [source, description]
   - Training infrastructure: [platform, who had access]
   - Training job ID: [ID]
   - Output checksums: [hashes]
3. **Post-processing**: [quantization, conversion, etc.]
   - Process: [description]
   - Tools used: [versions]
   - Who performed: [person/system]
   - Output checksums: [hashes]
4. **Deployment**:
   - Deployed by: [person/CI system]
   - Deployed on: [date]
   - Deployment method: [description]
   - Runtime configuration: [description]
```
Identify the compromise point
Compare checksums at each step of the provenance chain to identify where the modification occurred:
| Compromise Point | Indicators | Investigation Focus |
|---|---|---|
| Provider compromise | Checksums mismatch from source | Contact provider; check for broader industry impact |
| Download tampering | Checksums match source but not your stored copy | Investigate network security, man-in-the-middle |
| Fine-tuning compromise | Pre-fine-tuning checksums OK, post-fine-tuning differ from expected | Investigate training data and training infrastructure |
| Post-processing tampering | Modification after quantization or format conversion | Investigate processing pipeline access controls |
| Deployment compromise | Stored model is clean, deployed model differs | Investigate deployment pipeline, container security |
| Runtime modification | Deployed model changed after deployment | Investigate host security, write access to model files |

Investigate the compromise vector
Based on the compromise point, investigate the specific attack vector:
| Vector | Investigation Steps |
|---|---|
| Malicious training data | Audit training data for poisoned examples; check data source integrity |
| Compromised fine-tuning pipeline | Review pipeline access logs; check for unauthorized modifications |
| Malicious adapter | Inspect adapter provenance, rank, and target modules |
| Compromised model registry | Audit registry access; check for unauthorized pushes |
| Container compromise | Investigate container image history; scan for malware |
| Insider threat | Review access logs for model files and training infrastructure |
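Locating the compromise point described above largely reduces to diffing the checksum manifests recorded at successive provenance stages. A minimal sketch; each manifest is plain `sha256sum` output, and the file names are illustrative:

```bash
#!/usr/bin/env bash
# Report files whose hashes differ between two provenance stages, e.g. the
# provider's published manifest vs. the manifest of your stored copy.
set -euo pipefail

compare_stage() {
  local baseline="$1" candidate="$2"
  # Join the two manifests on filename (field 2 of sha256sum output),
  # then print any entry whose hashes disagree.
  join -j 2 <(sort -k2 "$baseline") <(sort -k2 "$candidate") \
    | awk '$2 != $3 { print "MISMATCH: " $1 }'
}

# Example (hypothetical evidence layout):
#   compare_stage provider.sha256 downloaded.sha256   # download tampering?
#   compare_stage downloaded.sha256 deployed.sha256   # deployment tampering?
```

Running the comparison pairwise down the chain pinpoints the first stage at which a mismatch appears, which is the row to focus on in the table above.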
Behavioral Forensics
While investigating the supply chain, perform behavioral analysis of the compromised model.
| Analysis | Objective | Method |
|---|---|---|
| Safety benchmark | Quantify safety degradation | Run standard safety eval suite; compare to known-good baseline |
| Trigger scanning | Identify backdoor triggers | Apply Backdoor Detection procedures |
| Behavior diffing | Identify all behavioral changes | Apply Behavior Diffing procedures |
| Output audit | Review historical outputs for harmful content | Sample and review outputs from the compromise window |
| Tool call audit | Check for unauthorized actions | Review all tool calls during the compromise window |
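The trigger-scanning step can be wired up as a small harness. A minimal sketch: `QUERY_CMD` is any command that sends a prompt to the model and prints the response (for example a curl wrapper); the command interface and the keyword-based refusal heuristic are assumptions, not a standard API:

```bash
#!/usr/bin/env bash
# Run each candidate backdoor trigger through the model and flag responses
# that are not refusals. The refusal heuristic here is deliberately crude;
# a real scan would score responses with the safety eval suite.
set -euo pipefail

scan_triggers() {
  local query_cmd="$1" trigger_file="$2" anomalies=0
  while IFS= read -r trigger; do
    response="$("$query_cmd" "$trigger")"
    case "$response" in
      *refuse*|*cannot*|*"can't"*) : ;;   # expected refusal
      *) echo "ANOMALY: compliance on trigger: $trigger"
         anomalies=$((anomalies + 1)) ;;
    esac
  done < "$trigger_file"
  [ "$anomalies" -eq 0 ]
}
```

A non-zero exit from `scan_triggers` means at least one candidate trigger elicited compliance and the model should be treated as backdoored pending deeper analysis.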
Remediation
Model Replacement Decision Matrix
| Factor | Replace with Previous Version | Replace with New Model | Retrain |
|---|---|---|---|
| Previous version available and clean | Preferred | If previous version inadequate | If custom model required |
| Compromise in fine-tuning data | N/A | If base model is clean | Required with clean data |
| Provider-level compromise | If earlier version unaffected | Switch providers | Switch providers and retrain |
| Performance requirements | If previous version meets requirements | If upgrade needed | If custom capabilities needed |
| Timeline | Hours | Hours to days | Days to weeks |
Retraining Considerations
If retraining is required:
| Consideration | Action |
|---|---|
| Training data audit | Review all training data for poisoned examples; remove and replace |
| Training infrastructure security | Audit and harden the training environment before retraining |
| Training process isolation | Use isolated, audited compute for the retraining job |
| Validation protocol | Define comprehensive behavioral validation before deployment |
| Incremental verification | Checkpoint and evaluate during training, not just at the end |
| Output comparison | Compare the retrained model against both the compromised and known-good versions |
Verification
Integrity Verification
| Check | Procedure | Pass Criteria |
|---|---|---|
| File integrity | Compare all model file checksums against known-good values | 100% match |
| Architecture validation | Verify layer counts, hidden sizes, vocabulary size | Matches specification |
| Adapter audit | Verify only authorized adapters are loaded | No unauthorized adapters |
| Configuration verification | Verify all serving configuration parameters | Matches approved configuration |
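The file-integrity check above can be automated with `sha256sum --check`. A minimal sketch; the model directory and manifest locations are illustrative, and the manifest is assumed to have been produced with relative paths at approval time:

```bash
#!/usr/bin/env bash
# Verify every model file against a known-good checksum manifest.
# Pass the manifest as an absolute path, since the check cd's into model_dir.
set -euo pipefail

verify_model_integrity() {
  local model_dir="$1" manifest="$2"
  # -c exits non-zero on any mismatched or missing file; --strict also
  # fails on malformed manifest lines; --quiet suppresses per-file OK lines.
  if (cd "$model_dir" && sha256sum --check --strict --quiet "$manifest"); then
    echo "PASS: all model files match known-good checksums"
  else
    echo "FAIL: integrity mismatch -- do not deploy" >&2
    return 1
  fi
}

# Example (hypothetical paths):
#   verify_model_integrity /models/production/current /secure/known-good.sha256
```

Wiring this into the deployment pipeline as a hard gate gives the 100%-match pass criterion teeth: a non-zero exit blocks the rollout.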
Behavioral Verification
| Check | Procedure | Pass Criteria |
|---|---|---|
| Safety benchmarks | Run the full safety evaluation suite | Scores within 5% of known-good baseline |
| Capability benchmarks | Run standard capability evaluations | Scores within expected range |
| Trigger scanning | Run known backdoor trigger patterns | No anomalous responses |
| Jailbreak resistance | Run the jailbreak evaluation suite | Refusal rates within baseline |
| Production smoke test | Sample of representative production queries | Normal behavior confirmed |
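The "within 5% of baseline" pass criterion can be enforced with a small helper. A sketch; the score scale is whatever your eval suite reports, and the 5% tolerance is the figure from the table above:

```bash
#!/usr/bin/env bash
# Exit 0 when a benchmark score is within PCT percent of the known-good
# baseline, non-zero otherwise. Uses awk for floating-point arithmetic.
set -euo pipefail

within_tolerance() {
  local score="$1" baseline="$2" pct="$3"
  awk -v s="$score" -v b="$baseline" -v p="$pct" '
    BEGIN {
      d = s - b; if (d < 0) d = -d
      exit !(d <= b * p / 100)    # awk exit 0 maps to shell success
    }'
}

# Example: gate deployment on the safety benchmark score.
#   within_tolerance "$new_score" "$baseline_score" 5 || echo "BLOCK deployment"
```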
Post-Mortem Checklist
| # | Item | Status |
|---|---|---|
| 1 | Compromise point in supply chain identified | |
| 2 | Attack vector determined and documented | |
| 3 | Blast radius quantified (users, time, actions) | |
| 4 | All outputs during compromise window reviewed or sampled | |
| 5 | Compromised model preserved as evidence | |
| 6 | Replacement model deployed and verified | |
| 7 | Supply chain controls strengthened at the compromise point | |
| 8 | Model integrity monitoring implemented or enhanced | |
| 9 | Other models/systems checked for similar compromise | |
| 10 | Vendor notification (if third-party model) completed | |
| 11 | Downstream system outputs reviewed and corrected if needed | |
| 12 | Post-mortem shared with team and stakeholders | |
Related Topics
- Model Forensics -- detailed model investigation techniques
- Backdoor Detection -- backdoor scanning procedures
- Tampering Detection -- file integrity verification
- Model Snapshots -- preserving model evidence
- Infrastructure & Supply Chain -- supply chain attack vectors
References
- "Model Supply Chain Security" - MITRE ATLAS (2025) - AI supply chain attack framework
- "Securing the ML Pipeline" - Google (2024) - ML pipeline security best practices
- "SLSA: Supply-chain Levels for Software Artifacts" - OpenSSF (2024) - supply chain security framework applicable to model artifacts
- "Backdoor Attacks and Defenses in Deep Learning" - IEEE S&P (2025) - comprehensive backdoor attack and defense survey