Model Compromise Incident Response Playbook
Playbook for responding to a compromised AI model: isolation procedures, replacement strategies, supply chain investigation, retraining considerations, and integrity restoration.
This playbook covers incidents where the AI model itself -- its weights, adapters, tokenizer, or training pipeline -- has been compromised. Model compromise is among the most severe AI incidents because the model is the core component trusted to produce safe, accurate outputs. A compromised model can affect every user and every conversation until it is replaced.
Trigger Criteria
Activate this playbook when:
- Behavioral analysis shows model changes not explained by application-layer configuration
- File integrity checks reveal unauthorized modifications to model artifacts
- Supply chain alert indicates compromise of a model provider, hosting platform, or distribution channel
- Backdoor behavior is detected through trigger scanning or activation analysis
- A third-party model provenance investigation reveals tampering
Immediate Actions (First 30 Minutes)
Acknowledge and assign an expanded team
Model compromise typically requires a broader response team than other AI incidents:
```
Incident ID: AI-IR-[YYYY]-[NNNN]
Type: Model Compromise
Detected: [UTC timestamp]
IC: [Name]
AI Investigator: [Name]           ← ML engineer with model internals expertise
Infrastructure Lead: [Name]       ← For deployment and isolation
Supply Chain Investigator: [Name] ← For provenance investigation
Evidence Custodian: [Name]
```
Preserve the compromised model
Before isolation or replacement, preserve the compromised model as evidence:
- Snapshot all model weight files with SHA-256 checksums
- Capture all adapter files (LoRA, QLoRA)
- Capture tokenizer files and configuration
- Record the model serving configuration (quantization, batching, etc.)
- Capture the deployment manifest showing when this model version was deployed
- Record the model's provenance: where it was obtained, when, by whom
- Preserve fine-tuning logs and training data references if applicable
- Store everything in isolated, read-only evidence storage
```bash
# Preserve model artifacts
EVIDENCE_DIR="/evidence/AI-IR-$(date +%Y)-NNNN/model"
mkdir -p "$EVIDENCE_DIR"

# Copy model files with integrity verification
cp -r /models/production/current/* "$EVIDENCE_DIR/"
sha256sum "$EVIDENCE_DIR"/* > "$EVIDENCE_DIR/checksums.sha256"

# Record deployment state
kubectl get deployment model-service -o yaml > "$EVIDENCE_DIR/deployment.yaml"
kubectl describe pod -l app=model-service > "$EVIDENCE_DIR/pod_describe.txt"
```
Isolate the compromised model
Remove the compromised model from production while maintaining service if possible:
| Isolation Strategy | When to Use | Impact |
|---|---|---|
| Hot swap to fallback model | Fallback model available and tested | Minimal downtime; fallback may have different capabilities |
| Roll back to previous version | Previous version known-good and available | Requires version to be cached; may lose legitimate updates |
| Graceful shutdown | No fallback available; severity justifies downtime | Service interruption; clear containment |
| Feature flag disable | AI features can be disabled while non-AI features continue | Partial service disruption |

Decision framework:
- If the model is producing actively harmful outputs → immediate shutdown or swap
- If the model has a backdoor that has not yet been triggered → swap with urgency but plan the transition
- If the compromise is suspected but unconfirmed → enable enhanced monitoring while investigating
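The decision framework above can be sketched as a small helper. This is illustrative only: the function name, input flags, and strategy labels are assumptions, not part of any real tooling.

```bash
# Sketch of the isolation decision framework (illustrative, not real tooling).
# Inputs describe the situation; output names the isolation strategy to apply.
choose_isolation_strategy() {
  harmful_outputs="$1"   # yes/no: model is producing actively harmful outputs
  confirmed="$2"         # yes/no: compromise confirmed (vs. merely suspected)
  fallback_ready="$3"    # yes/no: a tested fallback model is available

  if [ "$harmful_outputs" = "yes" ]; then
    # Actively harmful: swap immediately if possible, otherwise shut down.
    if [ "$fallback_ready" = "yes" ]; then echo "hot-swap"; else echo "shutdown"; fi
  elif [ "$confirmed" = "yes" ]; then
    # Confirmed but dormant backdoor: urgent but planned transition.
    if [ "$fallback_ready" = "yes" ]; then echo "planned-swap"; else echo "rollback"; fi
  else
    # Suspected only: keep serving under enhanced monitoring.
    echo "enhanced-monitoring"
  fi
}

choose_isolation_strategy yes no yes   # prints "hot-swap"
choose_isolation_strategy no no no     # prints "enhanced-monitoring"
```

Encoding the framework this way also makes the decision auditable after the fact: the inputs recorded at triage time reproduce the isolation choice.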
Assess blast radius
Determine the scope of impact:
| Question | Investigation Method | Implication |
|---|---|---|
| How long was the compromised model in production? | Deployment logs, version history | Defines the investigation time window |
| How many users interacted with it? | Request logs, user counts | Defines the affected population |
| Did the model take any harmful actions? | Tool call logs, output review | Determines whether damage extends beyond the model |
| Were model outputs used by downstream systems? | Integration logs, data flow maps | May need to recall downstream outputs |
| Is the same compromised artifact used by other systems? | Deployment inventory | Other systems may also be affected |
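The first two blast-radius questions can often be answered directly from request logs. The sketch below assumes a simple one-line-per-request format (`<ISO timestamp> <user_id> <endpoint>`); the format, paths, and window dates are illustrative.

```bash
# Sketch: quantify blast radius from request logs (assumed log format:
# "<ISO timestamp> <user_id> <endpoint>"). All values are illustrative.
LOG=$(mktemp); IN=$(mktemp)
cat > "$LOG" <<'EOF'
2025-06-01T10:00:00Z user-17 /chat
2025-06-01T11:30:00Z user-42 /chat
2025-06-02T09:15:00Z user-17 /chat
2025-06-03T14:00:00Z user-99 /tools
EOF

# Compromise window, taken from deployment history.
WINDOW_START="2025-06-01T00:00:00Z"
WINDOW_END="2025-06-02T23:59:59Z"

# ISO-8601 UTC timestamps sort lexically, so string comparison suffices.
awk -v s="$WINDOW_START" -v e="$WINDOW_END" '$1 >= s && $1 <= e' "$LOG" > "$IN"
REQS=$(wc -l < "$IN" | tr -d ' ')
USERS=$(awk '{print $2}' "$IN" | sort -u | wc -l | tr -d ' ')
echo "requests in window: $REQS"
echo "distinct users affected: $USERS"
rm -f "$LOG" "$IN"
```

The lexical-comparison trick only works because the timestamps are zero-padded ISO 8601 in a single timezone; mixed-format logs need real date parsing.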
Investigation
Supply Chain Investigation
Map the model's provenance chain
Document every step from the model's creation to your deployment:
```
## Model Provenance Chain

1. **Base model**: [provider, model name, version, release date]
   - Source: [URL or registry]
   - Downloaded by: [person/system]
   - Downloaded on: [date]
   - Original checksums verified: [yes/no]
2. **Fine-tuning**: [if applicable]
   - Training data: [source, description]
   - Training infrastructure: [platform, who had access]
   - Training job ID: [ID]
   - Output checksums: [hashes]
3. **Post-processing**: [quantization, conversion, etc.]
   - Process: [description]
   - Tools used: [versions]
   - Who performed: [person/system]
   - Output checksums: [hashes]
4. **Deployment**:
   - Deployed by: [person/CI system]
   - Deployed on: [date]
   - Deployment method: [description]
   - Runtime configuration: [description]
```
Identify the compromise point
Compare checksums at each step of the provenance chain to identify where the modification occurred:
| Compromise Point | Indicators | Investigation Focus |
|---|---|---|
| Provider compromise | Checksums mismatch from source | Contact provider; check for broader industry impact |
| Download tampering | Checksums match source but not your stored copy | Investigate network security, man-in-the-middle |
| Fine-tuning compromise | Pre-fine-tuning checksums OK, post-fine-tuning differ from expected | Investigate training data and training infrastructure |
| Post-processing tampering | Modification after quantization or format conversion | Investigate processing pipeline access controls |
| Deployment compromise | Stored model is clean, deployed model differs | Investigate deployment pipeline, container security |
| Runtime modification | Deployed model changed after deployment | Investigate host security, write access to model files |

Investigate the compromise vector
Based on the compromise point, investigate the specific attack vector:
| Vector | Investigation Steps |
|---|---|
| Malicious training data | Audit training data for poisoned examples; check data source integrity |
| Compromised fine-tuning pipeline | Review pipeline access logs; check for unauthorized modifications |
| Malicious adapter | Inspect adapter provenance, rank, and target modules |
| Compromised model registry | Audit registry access; check for unauthorized pushes |
| Container compromise | Investigate container image history; scan for malware |
| Insider threat | Review access logs for model files and training infrastructure |
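The checksum comparison used to identify the compromise point can be sketched as below. The stage names and sample files are illustrative; in a real investigation each hash comes from the records preserved at that stage of the provenance chain.

```bash
# Sketch: localize the compromise point by comparing the artifact's SHA-256
# at each recorded stage of the provenance chain. Stage names and the sample
# files are illustrative stand-ins for real provenance records.
WORK=$(mktemp -d)
echo "clean weights"    > "$WORK/provider_copy"   # hash published by provider
echo "clean weights"    > "$WORK/stored_copy"     # hash of your stored artifact
echo "tampered weights" > "$WORK/deployed_copy"   # hash of what is serving now

hash_of() { sha256sum "$1" | awk '{print $1}'; }

PROVIDER=$(hash_of "$WORK/provider_copy")
STORED=$(hash_of "$WORK/stored_copy")
DEPLOYED=$(hash_of "$WORK/deployed_copy")

# Walk the chain in order: the first link where hashes diverge is the
# compromise point (later divergences are downstream of it).
if [ "$PROVIDER" != "$STORED" ]; then
  POINT="download tampering"
elif [ "$STORED" != "$DEPLOYED" ]; then
  POINT="deployment or runtime modification"
else
  POINT="no divergence found in this chain"
fi
echo "compromise point: $POINT"
rm -rf "$WORK"
```

A fuller version would include the fine-tuning and post-processing stages as additional links in the same first-divergence walk.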
Behavioral Forensics
While investigating the supply chain, perform behavioral analysis of the compromised model.
| Analysis | Objective | Method |
|---|---|---|
| Safety benchmark | Quantify safety degradation | Run standard safety eval suite; compare to known-good baseline |
| Trigger scanning | Identify backdoor triggers | Apply Backdoor Detection procedures |
| Behavior diffing | Identify all behavioral changes | Apply Behavior Diffing procedures |
| Output audit | Review historical outputs for harmful content | Sample and review outputs from the compromise window |
| Tool call audit | Check for unauthorized actions | Review all tool calls during the compromise window |
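At its crudest, behavior diffing compares paired response samples from the known-good and suspect models on the same prompt set. The sketch below assumes one normalized response per line, in matching prompt order; file contents are illustrative.

```bash
# Sketch: crude behavior diffing over paired response samples (one normalized
# response per line, same prompt order in both files). Data is illustrative.
BASE=$(mktemp); CURR=$(mktemp)
printf 'refuse\nanswer-a\nanswer-b\nrefuse\n'  > "$BASE"   # known-good model
printf 'refuse\nanswer-a\nanswer-X\ncomply\n'  > "$CURR"   # suspect model

# Count prompts where the two models disagree (paste joins lines with a tab).
DIFFS=$(paste "$BASE" "$CURR" | awk -F'\t' '$1 != $2' | wc -l | tr -d ' ')
TOTAL=$(wc -l < "$BASE" | tr -d ' ')
echo "disagreement: $DIFFS/$TOTAL prompts"
rm -f "$BASE" "$CURR"
```

Exact-match comparison over-counts benign paraphrase differences; a real pipeline would compare semantic or safety-relevant features rather than raw strings, but a high exact-match disagreement rate is still a useful first signal.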
Remediation
Model Replacement Decision Matrix
| Factor | Replace with Previous Version | Replace with New Model | Retrain |
|---|---|---|---|
| Previous version available and clean | Preferred | If previous version inadequate | If custom model required |
| Compromise in fine-tuning data | N/A | If base model is clean | Required with clean data |
| Provider-level compromise | If earlier version unaffected | Switch providers | Switch providers and retrain |
| Performance requirements | If previous version meets requirements | If upgrade needed | If custom capabilities needed |
| Timeline | Hours | Hours to days | Days to weeks |
Retraining Considerations
If retraining is required:
| Consideration | Action |
|---|---|
| Training data audit | Review all training data for poisoned examples; remove and replace |
| Training infrastructure security | Audit and harden the training environment before retraining |
| Training process isolation | Use isolated, audited compute for the retraining job |
| Validation protocol | Define comprehensive behavioral validation before deployment |
| Incremental verification | Checkpoint and evaluate during training, not just at the end |
| Output comparison | Compare retrained model against both the compromised and known-good versions |
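The incremental-verification row above can be sketched as a gate that evaluates each checkpoint against the known-good baseline and flags the first regression instead of waiting for the final model. The checkpoint names, scores, and 5-point tolerance are illustrative assumptions.

```bash
# Sketch: incremental verification during retraining. After each checkpoint,
# compare its safety score to the known-good baseline and flag the first
# regression. Names, scores, and the 5-point tolerance are illustrative.
BASELINE=90

check_checkpoints() {
  for entry in "$@"; do              # entries look like "ckpt-1000:91"
    name=${entry%%:*}
    score=${entry##*:}
    if [ "$score" -lt $((BASELINE - 5)) ]; then
      echo "regression at $name (score $score)"
      return 0                       # stop at the first regressing checkpoint
    fi
  done
  echo "all checkpoints within tolerance"
}

check_checkpoints ckpt-1000:91 ckpt-2000:88 ckpt-3000:76
```

Catching the regression at `ckpt-3000` narrows the audit to the data seen between the last clean checkpoint and the flagged one, rather than the whole training run.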
Verification
Integrity Verification
| Check | Procedure | Pass Criteria |
|---|---|---|
| File integrity | Compare all model file checksums against known-good values | 100% match |
| Architecture validation | Verify layer counts, hidden sizes, vocabulary size | Matches specification |
| Adapter audit | Verify only authorized adapters are loaded | No unauthorized adapters |
| Configuration verification | Verify all serving configuration parameters | Matches approved configuration |
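The file-integrity check reduces to verifying the deployed files against a checksum manifest recorded at deployment time. This sketch uses sample files and illustrative names; in practice the manifest must come from trusted, write-once storage, not from the host being verified.

```bash
# Sketch: verify deployed model files against a known-good checksum manifest.
# File names are illustrative; the manifest should live in trusted storage.
MODEL_DIR=$(mktemp -d)
echo "weights-v3" > "$MODEL_DIR/model.safetensors"
echo "tokenizer"  > "$MODEL_DIR/tokenizer.json"

# Known-good manifest, recorded at deployment time.
( cd "$MODEL_DIR" && sha256sum model.safetensors tokenizer.json > manifest.sha256 )

# Later: the pass criterion is a 100% match, i.e. `sha256sum -c` exits 0.
if ( cd "$MODEL_DIR" && sha256sum -c --quiet manifest.sha256 ); then
  RESULT="PASS"
else
  RESULT="FAIL"
fi
echo "file integrity: $RESULT"
rm -rf "$MODEL_DIR"
```

Keeping the manifest outside the serving host matters: an attacker with write access to the model files can usually also rewrite a manifest stored next to them.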
Behavioral Verification
| Check | Procedure | Pass Criteria |
|---|---|---|
| Safety benchmarks | Run full safety evaluation suite | Scores within 5% of known-good baseline |
| Capability benchmarks | Run standard capability evaluations | Scores within expected range |
| Trigger scanning | Run known backdoor trigger patterns | No anomalous responses |
| Jailbreak resistance | Run jailbreak evaluation suite | Refusal rates within baseline |
| Production smoke test | Sample of representative production queries | Normal behavior confirmed |
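The "within 5% of baseline" pass criterion is a simple relative-tolerance check. The helper below sketches it with awk arithmetic; the function name and score values are illustrative.

```bash
# Sketch: check that a benchmark score is within N% of the known-good
# baseline (the safety-benchmark pass criterion). Values are illustrative.
within_tolerance() {
  # args: candidate_score baseline_score tolerance_percent
  awk -v c="$1" -v b="$2" -v t="$3" 'BEGIN {
    d = (c > b) ? c - b : b - c          # absolute difference
    r = (d <= b * t / 100) ? "PASS" : "FAIL"
    print r
  }'
}

within_tolerance 87.1 89.0 5   # small drop, inside 5% of baseline
within_tolerance 80.0 89.0 5   # large drop, outside tolerance
```

Note the comparison is symmetric: a score far above baseline also fails, which is deliberate, since an unexplained jump in any benchmark is itself a behavioral anomaly worth investigating.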
Post-Mortem Checklist
| # | Item | Status |
|---|---|---|
| 1 | Compromise point in supply chain identified | |
| 2 | Attack vector determined and documented | |
| 3 | Blast radius quantified (users, time, actions) | |
| 4 | All outputs during compromise window reviewed or sampled | |
| 5 | Compromised model preserved as evidence | |
| 6 | Replacement model deployed and verified | |
| 7 | Supply chain controls strengthened at the compromise point | |
| 8 | Model integrity monitoring implemented or enhanced | |
| 9 | Other models/systems checked for similar compromise | |
| 10 | Vendor notification (if third-party model) completed | |
| 11 | Downstream system outputs reviewed and corrected if needed | |
| 12 | Post-mortem shared with team and stakeholders |
Related Topics
- Model Forensics -- detailed model investigation techniques
- Backdoor Detection -- backdoor scanning procedures
- Tampering Detection -- file integrity verification
- Model Snapshots -- preserving model evidence
- Infrastructure & Supply Chain -- supply chain attack vectors