Lab: Model Registry Compromise
Explore techniques for compromising model registries and substituting malicious models into production pipelines. Learn to detect model tampering, verify model provenance, and secure the model supply chain.
Prerequisites
- Understanding of model serialization formats (ONNX, SafeTensors, pickle, HDF5)
- Familiarity with model hosting platforms (Hugging Face Hub, MLflow, custom registries)
- Experience with cryptographic signing and hash verification
- Understanding of ML pipeline deployment workflows
- Python with ML serialization and verification libraries
Background
Model registries are central repositories where trained models are stored, versioned, and distributed for deployment. They are the ML equivalent of package registries like npm or PyPI -- and they face similar supply chain risks. A compromised model registry allows an attacker to substitute a backdoored model for a legitimate one, affecting every system that pulls from the registry.
Lab Exercises
Survey Model Registry Attack Surface
Identify how model registries can be compromised.
```python
REGISTRY_ATTACK_VECTORS = {
    "credential_theft": {
        "description": "Steal credentials for registry access",
        "techniques": [
            "API token leakage in CI/CD logs or code repos",
            "Phishing registry maintainers for credentials",
            "Exploiting overly permissive access controls",
            "Session hijacking on registry web interfaces",
        ],
        "impact": "Full read/write access to model artifacts",
    },
    "model_substitution": {
        "description": "Replace a legitimate model with a backdoored version",
        "techniques": [
            "Direct upload with stolen credentials",
            "Compromised CI/CD pipeline uploads malicious model",
            "Man-in-the-middle during model download",
            "DNS hijacking to redirect registry requests",
        ],
        "impact": "All downstream consumers receive the backdoored model",
    },
    "dependency_confusion": {
        "description": "Exploit model naming to inject malicious models",
        "techniques": [
            "Register public model with same name as private model",
            "Typosquatting on popular model names",
            "Namespace confusion between registries",
        ],
        "impact": "Applications pull the wrong model unknowingly",
    },
    "deserialization_attacks": {
        "description": "Exploit unsafe model file formats to execute code",
        "techniques": [
            "Pickle-based model files with embedded code execution",
            "Malicious custom layers in saved models",
            "Exploiting unsafe deserialization in model loading",
        ],
        "impact": "Arbitrary code execution when model is loaded",
    },
}
```
Implement Model Substitution Attack
Demonstrate how a legitimate model can be replaced with a modified version that includes a backdoor.
```python
#!/usr/bin/env python3
"""Demonstrate model substitution in a registry."""

import hashlib
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelArtifact:
    name: str
    version: str
    file_path: str
    hash: str
    metadata: dict


class ModelSubstitutionDemo:
    """Demonstrate model substitution attack and defenses.

    Uses a simulated registry -- not real model hosting.
    """

    def __init__(self, registry_dir: str = "./mock_registry"):
        self.registry_dir = Path(registry_dir)
        self.registry_dir.mkdir(parents=True, exist_ok=True)

    def register_model(self, name: str, version: str,
                       model_bytes: bytes, metadata: dict) -> ModelArtifact:
        """Register a model in the mock registry."""
        model_hash = hashlib.sha256(model_bytes).hexdigest()
        model_path = self.registry_dir / f"{name}_{version}.bin"
        model_path.write_bytes(model_bytes)
        meta_path = self.registry_dir / f"{name}_{version}.json"
        artifact = ModelArtifact(
            name=name,
            version=version,
            file_path=str(model_path),
            hash=model_hash,
            metadata=metadata,
        )
        meta_path.write_text(json.dumps({
            "name": name,
            "version": version,
            "hash": model_hash,
            "metadata": metadata,
        }))
        return artifact

    def substitute_model(self, name: str, version: str,
                         malicious_bytes: bytes) -> dict:
        """Demonstrate model substitution (attack simulation)."""
        model_path = self.registry_dir / f"{name}_{version}.bin"
        meta_path = self.registry_dir / f"{name}_{version}.json"
        if not model_path.exists():
            return {"success": False, "reason": "Model not found"}

        original_hash = hashlib.sha256(model_path.read_bytes()).hexdigest()
        malicious_hash = hashlib.sha256(malicious_bytes).hexdigest()

        # Overwrite the model file
        model_path.write_bytes(malicious_bytes)

        # Update metadata hash (naive registry does not verify)
        metadata = json.loads(meta_path.read_text())
        metadata["hash"] = malicious_hash
        meta_path.write_text(json.dumps(metadata))

        return {
            "success": True,
            "original_hash": original_hash,
            "malicious_hash": malicious_hash,
            "detection": "Would be detected by independent hash verification",
        }

    def demonstrate_pickle_rce(self):
        """Show how pickle-based models can execute arbitrary code."""
        # WARNING: This is a demonstration of the vulnerability,
        # not a tool for exploitation. Never unpickle untrusted data.
        malicious_pickle = (
            b"cos\nsystem\n(S'echo COMPROMISED'\ntR."
        )
        return {
            "payload": malicious_pickle.hex(),
            "explanation": (
                "This pickle payload calls os.system('echo COMPROMISED') "
                "when deserialized. A real attack would install a backdoor, "
                "exfiltrate data, or modify model weights in memory."
            ),
            "affected_formats": [
                "PyTorch .pt/.pth files (use pickle by default)",
                "sklearn joblib files",
                "Any custom model save using pickle",
            ],
            "safe_alternatives": [
                "SafeTensors (no code execution, weights only)",
                "ONNX (standardized format, no arbitrary code)",
                "Explicit weight export (numpy arrays)",
            ],
        }
```
Build Model Provenance Verification
Implement a system that verifies model integrity and provenance.
```python
#!/usr/bin/env python3
"""Model provenance verification system."""

import hashlib
import hmac
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class ProvenanceRecord:
    model_name: str
    version: str
    file_hash: str
    signature: str
    training_config_hash: str
    training_data_hash: str
    builder_identity: str
    build_timestamp: str
    parent_version: str = ""


class ModelProvenanceSystem:
    def __init__(self, signing_key: bytes):
        self.signing_key = signing_key

    def create_provenance(self, model_path: str, model_name: str,
                          version: str, training_config: dict,
                          training_data_hash: str,
                          builder_id: str) -> ProvenanceRecord:
        """Create a signed provenance record for a model."""
        model_bytes = Path(model_path).read_bytes()
        file_hash = hashlib.sha256(model_bytes).hexdigest()
        config_hash = hashlib.sha256(
            json.dumps(training_config, sort_keys=True).encode()
        ).hexdigest()

        # Create the record
        record = ProvenanceRecord(
            model_name=model_name,
            version=version,
            file_hash=file_hash,
            signature="",  # Set after signing
            training_config_hash=config_hash,
            training_data_hash=training_data_hash,
            builder_identity=builder_id,
            build_timestamp=datetime.now(timezone.utc).isoformat(),
        )

        # Sign the record
        sign_data = (
            f"{record.model_name}:{record.version}:"
            f"{record.file_hash}:{record.training_config_hash}:"
            f"{record.training_data_hash}:{record.builder_identity}"
        )
        record.signature = hmac.new(
            self.signing_key, sign_data.encode(), hashlib.sha256
        ).hexdigest()
        return record

    def verify_provenance(self, model_path: str,
                          record: ProvenanceRecord) -> dict:
        """Verify a model against its provenance record."""
        model_bytes = Path(model_path).read_bytes()
        actual_hash = hashlib.sha256(model_bytes).hexdigest()

        # Verify file hash
        hash_valid = actual_hash == record.file_hash

        # Verify signature
        sign_data = (
            f"{record.model_name}:{record.version}:"
            f"{record.file_hash}:{record.training_config_hash}:"
            f"{record.training_data_hash}:{record.builder_identity}"
        )
        expected_sig = hmac.new(
            self.signing_key, sign_data.encode(), hashlib.sha256
        ).hexdigest()
        sig_valid = hmac.compare_digest(record.signature, expected_sig)

        return {
            "hash_valid": hash_valid,
            "signature_valid": sig_valid,
            "overall_valid": hash_valid and sig_valid,
            "actual_hash": actual_hash,
            "expected_hash": record.file_hash,
            "model": f"{record.model_name}@{record.version}",
            "builder": record.builder_identity,
            "build_time": record.build_timestamp,
        }
```
Implement Behavioral Fingerprinting
Hash-based verification catches file tampering but not sophisticated attacks where the attacker can re-sign whatever they upload. Behavioral fingerprinting instead detects model substitution through the model's outputs on fixed reference prompts.
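In miniature, the idea looks like this. The sketch below uses deterministic stub models in place of real inference; `StubModel` and its response format are purely illustrative.

```python
import hashlib


class StubModel:
    """Deterministic stand-in for a real model's generate() API."""

    def __init__(self, variant: str):
        self.variant = variant

    def generate(self, prompt: str, temperature: float = 0) -> str:
        # A real model would run inference; the stub varies only by version
        return f"{prompt} -> answer from {self.variant}"


def fingerprint(model, prompts):
    """Hash each deterministic response down to 16 hex characters."""
    return {
        p: hashlib.sha256(
            model.generate(p, temperature=0).encode()
        ).hexdigest()[:16]
        for p in prompts
    }


prompts = ["What is 2 + 2?", "Write a haiku about the ocean."]
baseline = fingerprint(StubModel("v1"), prompts)

# The same model reproduces its fingerprint; a substituted one does not
assert fingerprint(StubModel("v1"), prompts) == baseline
assert fingerprint(StubModel("v1-backdoored"), prompts) != baseline
```

The class below generalizes this pattern with per-prompt records and a mismatch threshold.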
```python
import hashlib


class BehavioralFingerprint:
    """Detect model substitution through behavioral analysis."""

    def __init__(self, reference_prompts: list[str]):
        self.reference_prompts = reference_prompts

    def create_fingerprint(self, model) -> dict:
        """Create a behavioral fingerprint for a model."""
        fingerprint = {}
        for prompt in self.reference_prompts:
            # Use temperature=0 for deterministic outputs
            response = model.generate(prompt, temperature=0)
            fingerprint[prompt] = {
                "response_hash": hashlib.sha256(
                    response.encode()
                ).hexdigest()[:16],
                "response_length": len(response),
                "first_50_chars": response[:50],
            }
        return fingerprint

    def verify_fingerprint(self, model, expected: dict) -> dict:
        """Check if model behavior matches its fingerprint."""
        mismatches = []
        for prompt, expected_behavior in expected.items():
            response = model.generate(prompt, temperature=0)
            actual_hash = hashlib.sha256(
                response.encode()
            ).hexdigest()[:16]
            if actual_hash != expected_behavior["response_hash"]:
                mismatches.append({
                    "prompt": prompt[:50],
                    "expected_hash": expected_behavior["response_hash"],
                    "actual_hash": actual_hash,
                    "length_delta": (
                        len(response) - expected_behavior["response_length"]
                    ),
                })
        return {
            "matches": len(expected) - len(mismatches),
            "total": len(expected),
            "match_rate": (len(expected) - len(mismatches)) / len(expected),
            "substitution_detected": len(mismatches) > len(expected) * 0.1,
            "mismatches": mismatches[:5],
        }


# Reference prompts should be:
# 1. Diverse in topic (to catch targeted behavioral changes)
# 2. Deterministic (low entropy responses)
# 3. Include safety-relevant prompts (to detect backdoor insertion)
REFERENCE_PROMPTS = [
    "What is 2 + 2?",
    "Translate 'hello' to French.",
    "Summarize photosynthesis in one sentence.",
    "Is it ethical to steal medicine to save a life? Answer briefly.",
    "Write a haiku about the ocean.",
    "What are the safety guidelines you follow?",
]
```
Design Registry Security Controls
Implement security controls for model registries based on attack findings.
```python
REGISTRY_SECURITY_CONTROLS = {
    "authentication": {
        "controls": [
            "Multi-factor authentication for all registry uploads",
            "Short-lived tokens (1-hour expiry) for CI/CD uploads",
            "Separate read and write permissions",
            "IP allowlisting for upload operations",
        ],
    },
    "integrity": {
        "controls": [
            "Cryptographic signing of all model artifacts",
            "Immutable model versions (no overwrites, only new versions)",
            "Content-addressed storage (hash-based paths)",
            "Behavioral fingerprinting on registration",
        ],
    },
    "monitoring": {
        "controls": [
            "Alert on model file changes outside deployment windows",
            "Log all registry access with identity attribution",
            "Automated behavioral diff on model updates",
            "Anomaly detection on model download patterns",
        ],
    },
    "safe_formats": {
        "controls": [
            "Reject pickle-based model files (enforce SafeTensors/ONNX)",
            "Scan uploaded files for code execution payloads",
            "Sandboxed model loading for verification",
            "Dependency scanning for model loading code",
        ],
    },
}
```
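The "content-addressed storage" control can be sketched in a few lines. This is a minimal illustration, not a production store; the two-character directory fan-out is an assumption modeled loosely on git's object layout.

```python
import hashlib
import tempfile
from pathlib import Path


def store_artifact(root: Path, data: bytes) -> Path:
    """Store bytes at a path derived from their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / digest  # fan out on the first two hex chars
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_bytes(data)  # a given digest is written exactly once
    return path


def fetch_artifact(root: Path, digest: str) -> bytes:
    """Retrieve and re-verify: the address itself is the integrity check."""
    data = (root / digest[:2] / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("stored artifact does not match its address")
    return data


root = Path(tempfile.mkdtemp())
p = store_artifact(root, b"model weights v1")
assert fetch_artifact(root, p.name) == b"model weights v1"
```

Because the address commits to the content, in-place tampering is caught on fetch; substituting a model would require changing the digest that every consumer has pinned.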
Real-World Registry Risks
| Platform | Risk | Mitigation |
|---|---|---|
| Hugging Face Hub | Pickle files with code execution, typosquatting | Use SafeTensors format, verify model provenance |
| MLflow | Default open access, no signing | Enable authentication, add signing to pipeline |
| Custom S3/GCS registries | Overly permissive IAM, no versioning | Least-privilege IAM, enable versioning and object lock |
| Docker-based model serving | Image tampering, base image vulnerabilities | Sign images, scan for vulnerabilities, use minimal base images |
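The "use SafeTensors" mitigation requires the `safetensors` package, but the key property it provides, loading weights without any deserialization of arbitrary objects, can be illustrated with plain numpy archives and `allow_pickle=False` (the file names here are arbitrary):

```python
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()
safe_path = os.path.join(tmp, "weights.npz")

# Plain numeric arrays round-trip without any pickle machinery
np.savez(safe_path, layer1=np.ones((2, 2)), bias=np.zeros(2))
loaded = np.load(safe_path, allow_pickle=False)
assert loaded["layer1"].shape == (2, 2)

# Object arrays require pickle; allow_pickle=False refuses to load them
risky_path = os.path.join(tmp, "risky.npz")
np.savez(risky_path,
         blob=np.array([{"cmd": "echo COMPROMISED"}], dtype=object))
try:
    np.load(risky_path, allow_pickle=False)["blob"]
    raise AssertionError("should have refused the object array")
except ValueError:
    pass  # refused: no arbitrary-object deserialization on load
```

This is the same design choice SafeTensors makes at the format level: a load path that can only produce tensors can never produce a shell command.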
Troubleshooting
| Issue | Solution |
|---|---|
| Cannot detect model substitution with hash alone | The attacker may have updated the hash in metadata too. Use independent provenance records stored separately from the model artifacts |
| Behavioral fingerprinting is too sensitive | Some models have inherent randomness even at temperature=0 due to floating-point non-determinism. Use fuzzy matching with tolerance thresholds |
| Pickle scanning has false positives | Not all pickle files contain malicious code. Use allowlisted pickle classes and reject files with unexpected class references |
| CI/CD pipeline needs write access | Use short-lived, scoped tokens that allow only specific model uploads. Require human approval for production registry writes |
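The allowlist-based scanning approach from the table above can be prototyped with the stdlib `pickletools` module, which disassembles a pickle stream without executing it. This is a sketch only; dedicated scanners such as picklescan additionally resolve `GLOBAL` targets against allowlisted module/class names.

```python
import pickletools

# Opcodes that can import or invoke callables during unpickling
SUSPICIOUS_OPS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ"}


def scan_pickle(data: bytes) -> list[str]:
    """Disassemble a pickle stream and flag opcodes that can reach code."""
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPS:
            findings.append(f"{opcode.name} at byte {pos}: {arg!r}")
    return findings


# The lab's demo payload: calls os.system('echo COMPROMISED') when loaded
payload = b"cos\nsystem\n(S'echo COMPROMISED'\ntR."
findings = scan_pickle(payload)
assert any(f.startswith("GLOBAL") for f in findings)  # imports os.system
assert any(f.startswith("REDUCE") for f in findings)  # calls it
```

A benign pickle of pure data (e.g. `b"N."`, the pickled `None`) produces no findings, which is what makes allowlisting workable: flag the capability to execute, then check whether the specific imports are expected.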
Related Topics
- ML Pipeline Poisoning - Upstream pipeline attacks that feed into registry compromise
- Supply Chain CTF - CTF challenge covering ML supply chain attacks
- Infrastructure Security - Deployment infrastructure attacks including registry compromise
- Build Behavior Diff - Behavioral comparison tools for detecting model substitution