Security of Dynamic Model Loading in Production
Analyzing risks of hot-swapping, dynamic loading, and A/B testing of ML models in production serving infrastructure
Overview
Dynamic model loading — the ability to update, swap, or roll back machine learning models in production without restarting the serving infrastructure — is a critical operational capability for AI systems. Organizations deploy new model versions multiple times per day to improve accuracy, fix biases, respond to data drift, or patch vulnerabilities. Frameworks like Triton Inference Server, TorchServe, TensorFlow Serving, and custom serving platforms all support some form of dynamic model loading, whether through filesystem polling, API-triggered loading, or configuration-driven updates.
From a security perspective, dynamic model loading transforms the model artifact pipeline into a continuously active attack surface. Unlike traditional software where deployments happen through controlled CI/CD pipelines with gates and approvals, model updates may be triggered by automated retraining pipelines, experiment management systems, or even filesystem changes. Each of these trigger mechanisms is a potential entry point for an attacker to inject a malicious model into production.
The consequences of malicious model injection are severe and often subtle. A compromised model can produce targeted misclassifications for specific inputs while behaving normally for all others (a neural network backdoor), exfiltrate data through side channels in its outputs, or execute arbitrary code if the model format supports it (as with PyTorch's pickle-based format or TensorFlow's SavedModel with custom ops). Unlike a compromised application binary that might trigger antivirus alerts, a malicious neural network is extremely difficult to detect because its "behavior" is defined by learned weights rather than explicit code paths.
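To make the pickle risk concrete, here is a minimal, benign sketch (not tied to any specific framework's file format) of why unpickling is code execution: any object can specify, via `__reduce__`, a callable for the deserializer to invoke at load time. The harmless `len` call stands in for what would, in a real attack, be `os.system` or similar.

```python
import pickle


class EvilPayload:
    """On unpickling, invokes an attacker-chosen callable (benign here)."""

    def __reduce__(self):
        # A real attack would return something like (os.system, ("curl ...",)).
        # len() keeps the demonstration harmless but observable.
        return (len, (("any", "callable", "runs"),))


malicious_blob = pickle.dumps(EvilPayload())
# A serving framework calling pickle.loads() on this "model" runs the
# payload before any weights are even inspected.
result = pickle.loads(malicious_blob)
print(result)  # → 3: the callable executed during deserialization
```

No weights, no network, no exploit chain: the file format itself grants execution.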
This article examines the attack surface of model hot-loading mechanisms, demonstrates practical attacks against common deployment patterns, and provides a security framework for protecting the model loading pipeline.
Model Loading Architectures
Filesystem-Based Loading
The simplest model loading pattern monitors a filesystem path for new model versions. TensorFlow Serving, Triton, and many custom systems use this approach:
```
/models/
  my_model/
    1/                    # Version 1 (currently active)
      model.savedmodel/
    2/                    # Version 2 (hot-loaded when directory appears)
      model.savedmodel/
```
The serving framework detects the new version directory and automatically loads it. This is operationally simple but creates a straightforward attack vector: anyone who can write to the model directory can inject a model into production.
API-Based Loading
TorchServe's management API and Triton's model control API allow explicit model registration and loading through HTTP/gRPC calls. This provides more control but exposes an administrative API that must be secured.
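An unauthenticated management API is equivalent to remote model injection, so a basic hygiene check is confirming those ports are unreachable from untrusted networks. A minimal sketch (the hostname is a placeholder; the default ports listed are TorchServe's management API on 8081 and Triton's HTTP/gRPC endpoints on 8000/8001, which you should verify against your own deployment):

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default control-plane ports (verify against your deployment):
# TorchServe management API: 8081; Triton HTTP: 8000, gRPC: 8001.
for port in (8000, 8001, 8081):
    if port_reachable("serving-host.internal", port):  # hypothetical host
        print(f"WARNING: management port {port} reachable from this network")
```

Run this from a network segment that should have no administrative access; any warning means the model-loading API is exposed there.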
Registry-Based Loading
More sophisticated deployments use model registries (MLflow Model Registry, Vertex AI Model Registry, custom registries) as the source of truth. The serving infrastructure loads models from the registry based on stage transitions (e.g., "staging" to "production"). This adds an indirection layer that can enforce approval workflows but introduces the registry itself as a critical trust dependency.
```python
"""
Model loading security audit framework.

Tests model loading pipelines for integrity verification,
access control, and injection vulnerabilities.
"""
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ModelArtifact:
    """Represents a model artifact in the loading pipeline."""
    name: str
    version: str
    path: str
    format: str  # savedmodel, torchscript, onnx, pickle, safetensors
    size_bytes: int
    sha256: str
    signature: Optional[str] = None
    loaded_at: Optional[str] = None


class ModelLoadingAuditor:
    """Audit model loading infrastructure for security vulnerabilities."""

    # Model formats ranked by security risk
    FORMAT_RISK = {
        "pickle": "CRITICAL",     # Arbitrary code execution
        "pkl": "CRITICAL",        # Pickle (arbitrary code execution)
        "pt": "CRITICAL",         # PyTorch (uses pickle)
        "pth": "CRITICAL",        # PyTorch checkpoint (uses pickle)
        "savedmodel": "HIGH",     # Can contain custom ops / py_function
        "mar": "HIGH",            # TorchServe archive with Python handlers
        "onnx": "MEDIUM",         # Generally safe, but custom ops possible
        "torchscript": "MEDIUM",  # Safer than pickle, not fully sandboxed
        "safetensors": "LOW",     # Designed for safe loading
        "tflite": "LOW",          # Restricted operation set
    }

    PICKLE_EXTENSIONS = ("pickle", "pkl", "pt", "pth")

    def __init__(self, model_dir: str):
        self.model_dir = Path(model_dir)
        self.findings: list[dict] = []

    def _add(self, severity: str, title: str, detail: str) -> None:
        self.findings.append({
            "severity": severity, "title": title, "detail": detail,
        })

    def audit_directory_permissions(self) -> None:
        """Check filesystem permissions on model directories."""
        if not self.model_dir.exists():
            self._add("ERROR", "Model directory not found",
                      f"{self.model_dir} does not exist")
            return

        # Check directory permissions
        dir_stat = self.model_dir.stat()
        mode = dir_stat.st_mode & 0o777
        if mode & 0o002:  # World-writable
            self._add(
                "CRITICAL",
                f"Model directory is world-writable: {self.model_dir}",
                "Any user on the system can inject malicious models. "
                "The model directory should be writable only by the "
                "deployment service account.",
            )
        if mode & 0o020:  # Group-writable
            self._add(
                "HIGH",
                f"Model directory is group-writable: {self.model_dir}",
                f"Group members can modify model artifacts. "
                f"GID: {dir_stat.st_gid}. Restrict to the deployment account.",
            )

        # Check individual model files
        for model_file in self.model_dir.rglob("*"):
            if model_file.is_file():
                file_mode = model_file.stat().st_mode & 0o777
                if file_mode & 0o002:
                    self._add(
                        "CRITICAL",
                        f"World-writable model file: {model_file}",
                        "Any user can replace this model file with a "
                        "malicious version.",
                    )

    def audit_model_formats(self) -> None:
        """
        Identify model formats and their associated risks.

        Pickle-based formats allow arbitrary code execution on load.
        """
        for model_file in self.model_dir.rglob("*"):
            if not model_file.is_file():
                continue
            ext = model_file.suffix.lstrip(".").lower()
            risk = self.FORMAT_RISK.get(ext)
            if risk in ("CRITICAL", "HIGH"):
                pickle_note = (
                    "Pickle-based formats execute arbitrary Python code "
                    "during deserialization. "
                    if ext in self.PICKLE_EXTENSIONS else ""
                )
                self._add(
                    risk,
                    f"High-risk model format: {model_file.name} ({ext})",
                    f"File: {model_file}. {pickle_note}"
                    f"Consider migrating to the safetensors format for "
                    f"safe, zero-copy model loading.",
                )

    def audit_integrity_verification(self) -> None:
        """
        Check whether model loading includes integrity verification.

        Look for hash manifests, signatures, or checksums.
        """
        integrity_files = (
            list(self.model_dir.rglob("checksums*"))
            + list(self.model_dir.rglob("manifest*"))
            + list(self.model_dir.rglob("*.sig"))
            + list(self.model_dir.rglob("DIGEST"))
        )
        if not integrity_files:
            self._add(
                "HIGH",
                "No integrity verification metadata found",
                f"No checksum, manifest, or signature files found in "
                f"{self.model_dir}. Models can be replaced without "
                f"detection. Implement model artifact signing.",
            )
        else:
            for f in integrity_files:
                self._add(
                    "INFO",
                    f"Integrity file found: {f.name}",
                    f"Path: {f}. Verify that the serving framework "
                    f"actually validates this before loading.",
                )

    def audit_version_rollback_protection(self) -> None:
        """
        Check for rollback protection.

        Without it, an attacker can revert to a vulnerable or
        poisoned older model version.
        """
        versions = []
        for d in self.model_dir.iterdir():
            if not d.is_dir():
                continue
            try:
                # Version directories are conventionally numeric
                versions.append((int(d.name), d))
            except ValueError:
                continue
        if len(versions) > 5:
            self._add(
                "MEDIUM",
                f"Many model versions retained: {len(versions)}",
                "Retaining many old model versions increases the "
                "attack surface for rollback attacks. Implement a "
                "retention policy and minimum-version enforcement.",
            )

    def scan_for_pickle_exploits(self, model_path: str) -> list[dict]:
        """
        Scan a pickle-based model file for known malicious patterns.

        Uses static analysis to detect common pickle exploitation.
        """
        findings = []
        # The GLOBAL opcode ('c') followed by a module name imports
        # that module during unpickling.
        suspicious_opcodes = {
            b"cos\n": "os module import (possible command execution)",
            b"csys\n": "sys module import",
            b"csubprocess\n": "subprocess module import (command execution)",
            b"cbuiltins\n": "builtins access",
            b"c__builtin__\n": "legacy builtins access",
            b"cposixpath\n": "filesystem path operations",
            b"csocket\n": "socket operations (network access)",
            b"curllib\n": "URL library (network access)",
        }
        try:
            with open(model_path, "rb") as f:
                # Scan the first 10 MB; imports usually appear early
                # in the pickle stream.
                content = f.read(10 * 1024 * 1024)
            for pattern, description in suspicious_opcodes.items():
                if pattern in content:
                    findings.append({
                        "severity": "CRITICAL",
                        "title": f"Suspicious pickle opcode: {description}",
                        "detail": (
                            f"File {model_path} contains pickle opcodes "
                            f"that import {description}. This is a strong "
                            f"indicator of a malicious model."
                        ),
                    })
        except OSError as e:
            findings.append({
                "severity": "ERROR",
                "title": f"Cannot read model file: {model_path}",
                "detail": str(e),
            })
        return findings

    def run_audit(self) -> list[dict]:
        """Run the complete model loading security audit."""
        self.findings = []
        self.audit_directory_permissions()
        self.audit_model_formats()
        self.audit_integrity_verification()
        self.audit_version_rollback_protection()
        # Statically scan pickle-based files
        for model_file in self.model_dir.rglob("*"):
            if model_file.suffix in (".pt", ".pth", ".pkl", ".pickle"):
                self.findings.extend(
                    self.scan_for_pickle_exploits(str(model_file))
                )
        return self.findings


if __name__ == "__main__":
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "/models"
    auditor = ModelLoadingAuditor(model_dir)
    for finding in auditor.run_audit():
        print(f"[{finding['severity']}] {finding['title']}")
        print(f"    {finding['detail']}\n")
```

Attack Vectors
Model Artifact Tampering
The most direct attack against model hot-loading is replacing a legitimate model artifact with a malicious one. This can be achieved through:
- Filesystem access: If the model directory is on a shared filesystem (NFS, EFS, Lustre), any user with write access can replace model files.
- Registry compromise: If the model registry (MLflow, Vertex AI, custom) is compromised, the attacker can modify model artifacts at the source.
- Pipeline manipulation: If the CI/CD pipeline that deploys models is compromised, the attacker can inject malicious models during the deployment process.
- S3/GCS bucket write: If model artifacts are stored in cloud storage, overly permissive bucket policies allow replacement.
Race Condition Attacks
Model hot-loading creates time-of-check-to-time-of-use (TOCTOU) vulnerabilities. Consider the typical loading sequence:
1. New model version appears in the model directory or registry
2. Serving framework detects the new version
3. Framework validates the model (format check, loading test)
4. Framework loads the model into memory for serving
5. Framework routes traffic to the new model
An attacker can exploit the window between steps 3 and 4 — after validation passes but before the model is loaded into memory — by replacing the validated artifact with a malicious one. If the filesystem poll interval is long (common for cloud storage where listing is expensive), this window can be seconds to minutes.
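A sketch of closing this window: read the artifact bytes exactly once, verify the hash of those bytes, and hand the same bytes (never the path) to the loader. The function names below are illustrative, not from any particular framework.

```python
import hashlib
from pathlib import Path


def unsafe_load(path: str, expected_sha256: str) -> bytes:
    """TOCTOU bug: verifies the file on disk, then re-reads it."""
    if hashlib.sha256(Path(path).read_bytes()).hexdigest() != expected_sha256:
        raise ValueError("integrity check failed")
    # An attacker can swap the file between the check above and this read.
    return Path(path).read_bytes()


def safe_load(path: str, expected_sha256: str) -> bytes:
    """Read once: the bytes we verify are exactly the bytes we load."""
    data = Path(path).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("integrity check failed")
    return data  # hand these bytes, not the path, to the model loader
```

The verified bytes should then go to an in-memory loader (safetensors, for example, can deserialize from a bytes object) rather than a loader that re-opens the path.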
Multi-replica deployments introduce a related risk. When a load balancer distributes requests across multiple replicas of the serving infrastructure, an attacker who can compromise the model on one replica but not the others creates an inconsistent state: some requests are served by the legitimate model and some by the malicious one. This makes detection harder, because monitoring systems that check random samples only occasionally hit the compromised replica.
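One practical countermeasure is having each replica periodically report the hash of the artifact it actually holds in memory (not the copy on disk) and flagging disagreement. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter


def check_replica_consistency(replica_hashes: dict[str, str]) -> list[str]:
    """
    Given {replica_id: sha256-of-loaded-model}, return the replicas whose
    loaded model differs from the majority. Any non-empty result means
    some replicas are serving a different artifact and should be drained.
    """
    if not replica_hashes:
        return []
    majority_hash, _ = Counter(replica_hashes.values()).most_common(1)[0]
    return sorted(r for r, h in replica_hashes.items() if h != majority_hash)
```

A compromised majority defeats a majority vote, so in higher-assurance setups the comparison should be against the signed manifest hash rather than against peers.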
Supply Chain Attacks on Model Dependencies
Modern models are not standalone artifacts — they depend on tokenizers, configuration files, preprocessing code, and sometimes external libraries. An attacker who cannot modify the model weights directly may be able to compromise these dependencies:
- Tokenizer manipulation: Modifying the tokenizer vocabulary or merge rules can change how input text is processed, causing the model to interpret inputs differently than intended. This is subtle because the model weights are unchanged and pass all integrity checks.
- Configuration poisoning: Model configuration files (e.g., config.json in HuggingFace format) control architecture parameters, generation settings, and special token mappings. Modifying these can alter model behavior without touching the weights.
- Preprocessor/postprocessor injection: If the serving framework loads custom preprocessing or postprocessing code alongside the model (as TorchServe handlers do), this code is another injection point.
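A toy illustration of the tokenizer case: swapping two vocabulary entries changes what the model sees for the same input text, while the weight bytes (and any weights-only hash) are untouched. The vocabularies here are stand-ins, not a real tokenizer format.

```python
import hashlib

# Stand-ins for artifacts on disk: weights untouched, vocab tampered.
weights = bytes(range(16))
legit_vocab = {"transfer": 7, "cancel": 8}
tampered_vocab = {"transfer": 8, "cancel": 7}  # two token IDs swapped


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Trivial whitespace tokenizer for illustration."""
    return [vocab[t] for t in text.split() if t in vocab]


# A weights-only integrity check notices nothing...
weights_hash = hashlib.sha256(weights).hexdigest()
# ...yet the model now receives different inputs for the same text:
print(tokenize("cancel transfer", legit_vocab))     # [8, 7]
print(tokenize("cancel transfer", tampered_vocab))  # [7, 8]
```

The takeaway: the integrity manifest must cover every file the serving path reads, including tokenizer and configuration files, not only the weights.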
Rollback Attacks
An attacker who cannot inject a new malicious model may be able to roll back to a known-vulnerable older version. This is effective when:
- A model was previously deployed with a known backdoor that was later patched
- An older model version has lower accuracy that the attacker wants to exploit
- Old model versions used insecure serialization (pickle) that was later migrated to safetensors
Rollback attacks are particularly effective against systems that use filesystem-based version management. By deleting the current model version directory, the serving framework falls back to the previous version. If old versions are retained on disk (common for quick rollback capability), the attacker simply needs to remove newer versions rather than inject anything new.
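The standard defense is a monotonic minimum-version counter that only ever moves forward. A minimal sketch (class name and file layout are illustrative; the counter must live on storage the attacker cannot write, such as a separate signed configuration store):

```python
from pathlib import Path


class MinVersionGate:
    """Persist a monotonic minimum version; refuse loads below it."""

    def __init__(self, counter_file: str):
        self.counter_file = Path(counter_file)

    def current(self) -> int:
        if not self.counter_file.exists():
            return 0
        return int(self.counter_file.read_text().strip())

    def allow_load(self, version: int) -> bool:
        return version >= self.current()

    def promote(self, version: int) -> None:
        # The counter only ever increases, never decreases.
        if version > self.current():
            self.counter_file.write_text(str(version))
```

Deleting newer version directories then has no effect: the serving framework consults the gate before falling back, and anything below the counter is refused.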
```python
"""
Model integrity verification middleware for serving frameworks.

Verifies model artifacts against a signed manifest before allowing
the serving framework to load them.
"""
import hashlib
import hmac
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


class SecurityError(Exception):
    """Raised when a signature or integrity check fails."""


def _sha256_file(path: Path) -> str:
    """Compute the SHA-256 digest of a file in 8 KB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(8192)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


class ModelIntegrityVerifier:
    """
    Verify model artifact integrity before hot-loading.

    Designed as middleware between the model source and
    the serving framework's load operation.
    """

    def __init__(
        self,
        manifest_path: str,
        signing_key: bytes,
        min_version: Optional[int] = None,
    ):
        """
        Args:
            manifest_path: Path to the signed model manifest
            signing_key: HMAC signing key for manifest verification
            min_version: Minimum allowed model version (rollback protection)
        """
        self.manifest_path = Path(manifest_path)
        self.signing_key = signing_key
        self.min_version = min_version
        self.manifest: Optional[dict] = None
        # The manifest may not exist yet when this instance is used
        # only to create one via create_signed_manifest().
        if self.manifest_path.exists():
            self._load_manifest()

    def _load_manifest(self) -> None:
        """Load and verify the model manifest."""
        content = self.manifest_path.read_text()
        # Manifest format: JSON body + HMAC signature on the last line
        lines = content.strip().split("\n")
        signature_line = lines[-1]
        body = "\n".join(lines[:-1])
        if not signature_line.startswith("SIGNATURE:"):
            raise ValueError("Manifest missing HMAC signature")
        expected_sig = signature_line.split(":", 1)[1].strip()
        actual_sig = hmac.new(
            self.signing_key, body.encode(), hashlib.sha256,
        ).hexdigest()
        if not hmac.compare_digest(actual_sig, expected_sig):
            raise SecurityError(
                "Manifest signature verification failed. "
                "The manifest may have been tampered with."
            )
        self.manifest = json.loads(body)

    def verify_model(
        self,
        model_name: str,
        model_version: str,
        model_path: str,
    ) -> tuple[bool, str]:
        """
        Verify a model artifact against the signed manifest.

        Returns:
            (is_valid, reason)
        """
        if self.manifest is None:
            return False, "No signed manifest loaded; cannot verify."

        # Check rollback protection
        if self.min_version is not None:
            try:
                if int(model_version) < self.min_version:
                    return False, (
                        f"Version {model_version} is below minimum "
                        f"allowed version {self.min_version}. "
                        f"Rollback protection triggered."
                    )
            except ValueError:
                pass  # Non-numeric version, skip check

        # Find the model in the manifest
        key = f"{model_name}/{model_version}"
        expected = self.manifest.get("models", {}).get(key)
        if expected is None:
            return False, (
                f"Model {key} not found in signed manifest. "
                f"Unapproved model version — loading blocked."
            )

        # Verify file hashes
        model_base = Path(model_path)
        for file_entry in expected.get("files", []):
            rel_path = file_entry["path"]
            expected_hash = file_entry["sha256"]
            expected_size = file_entry["size"]
            file_path = model_base / rel_path
            if not file_path.exists():
                return False, f"Expected file missing: {rel_path}"

            # Verify size first (fast check)
            actual_size = file_path.stat().st_size
            if actual_size != expected_size:
                return False, (
                    f"Size mismatch for {rel_path}: "
                    f"expected {expected_size}, got {actual_size}"
                )

            # Verify hash
            actual_hash = _sha256_file(file_path)
            if actual_hash != expected_hash:
                return False, (
                    f"Hash mismatch for {rel_path}: "
                    f"expected {expected_hash[:16]}..., "
                    f"got {actual_hash[:16]}..."
                )

        # Check model format safety
        model_format = expected.get("format", "unknown")
        if model_format in ("pickle", "pt", "pth"):
            return False, (
                f"Unsafe model format: {model_format}. "
                f"Only safetensors and ONNX are allowed in production."
            )

        return True, "Model artifact verified successfully"

    def create_signed_manifest(
        self,
        models_dir: str,
        output_path: str,
    ) -> str:
        """
        Create a signed manifest for all models in a directory.

        This should run in a trusted build environment.
        """
        manifest = {
            "created_at": datetime.now(timezone.utc).isoformat(),
            "models": {},
        }
        models_base = Path(models_dir)
        for model_dir in sorted(models_base.iterdir()):
            if not model_dir.is_dir():
                continue
            model_name = model_dir.name
            for version_dir in sorted(model_dir.iterdir()):
                if not version_dir.is_dir():
                    continue
                key = f"{model_name}/{version_dir.name}"
                files = []
                for file_path in sorted(version_dir.rglob("*")):
                    if not file_path.is_file():
                        continue
                    files.append({
                        "path": str(file_path.relative_to(version_dir)),
                        "sha256": _sha256_file(file_path),
                        "size": file_path.stat().st_size,
                    })
                manifest["models"][key] = {
                    "files": files,
                    "format": self._detect_format(version_dir),
                }

        body = json.dumps(manifest, indent=2)
        signature = hmac.new(
            self.signing_key, body.encode(), hashlib.sha256,
        ).hexdigest()
        Path(output_path).write_text(f"{body}\nSIGNATURE:{signature}")
        return output_path

    def _detect_format(self, model_dir: Path) -> str:
        """Detect model format from file extensions."""
        extensions = {f.suffix for f in model_dir.rglob("*") if f.is_file()}
        if ".safetensors" in extensions:
            return "safetensors"
        if ".onnx" in extensions:
            return "onnx"
        if ".pt" in extensions or ".pth" in extensions:
            return "pt"
        if ".pkl" in extensions:
            return "pickle"
        if "saved_model.pb" in {f.name for f in model_dir.rglob("*")}:
            return "savedmodel"
        return "unknown"
```

A/B Testing and Canary Deployment Risks
Model canary deployments — where a new model version serves a small percentage of traffic before full rollout — introduce additional security considerations. The canary routing logic determines which requests go to which model version, and this routing is itself an attack surface.
Traffic splitting manipulation: If the canary configuration is stored in a ConfigMap, environment variable, or feature flag service, an attacker who can modify this configuration can route all traffic to a specific model version. This could be used to send all traffic to a compromised model or to create conditions for model extraction by ensuring consistent behavior.
Canary metric poisoning: Canary deployments rely on metrics (latency, error rate, accuracy) to decide whether to promote or roll back the new version. An attacker who can manipulate these metrics — either by generating synthetic traffic that biases the metrics or by modifying the monitoring pipeline — can force the promotion of a malicious model or the rollback of a legitimate improvement.
Shadow mode attacks: Some deployments run new models in shadow mode, where they process real requests but their outputs are not returned to users. Instead, the outputs are logged for analysis. An attacker who compromises a shadow model can use it to exfiltrate data through its logged outputs without any visible impact on production traffic.
Version confusion: In multi-version deployments, ensuring that the correct model version handles each request requires careful coordination. If the version routing is based on request attributes (headers, user ID, feature flags), an attacker who can manipulate these attributes can force their requests to a specific model version, potentially one with known vulnerabilities or lower security controls.
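One way to harden the routing decision against client manipulation is to key it on a server-held secret, so the canary/stable bucket for a request cannot be predicted or forced by fiddling with headers or IDs. A minimal sketch; the key handling and names are illustrative:

```python
import hashlib
import hmac

ROUTING_KEY = b"server-side-secret"  # held by the router, never client-derived


def route_version(request_id: str, canary_percent: int) -> str:
    """Deterministically route a request to 'canary' or 'stable'."""
    digest = hmac.new(ROUTING_KEY, request_id.encode(), hashlib.sha256).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

The split is stable per request ID and roughly matches the target percentage, but without the key a client cannot compute which bucket a given ID lands in. The traffic-split configuration itself still needs the same write protections as the model artifacts.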
Practical Examples
Monitoring Model Loading Events
```bash
#!/usr/bin/env bash
# Monitor model loading events in production for security anomalies

MODEL_DIR="${1:-/models}"
LOG_FILE="/var/log/model-loading-audit.log"

echo "=== Model Loading Monitor ==="
echo "Watching: $MODEL_DIR"
echo "Log: $LOG_FILE"

# Use inotifywait to monitor filesystem changes
# (requires the inotify-tools package)
if ! command -v inotifywait &>/dev/null; then
    echo "inotifywait not found; please install inotify-tools" >&2
    exit 1
fi

log_event() {
    local severity="$1"
    local message="$2"
    local timestamp
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    echo "${timestamp} [${severity}] ${message}" | tee -a "$LOG_FILE"
}

# Monitor for file changes in the model directory
inotifywait -m -r -e create,modify,delete,moved_to \
    --format '%T %w%f %e' --timefmt '%Y-%m-%dT%H:%M:%S' \
    "$MODEL_DIR" 2>/dev/null | while read -r timestamp filepath event; do
    # Classify the event
    case "$event" in
        *CREATE*|*MOVED_TO*)
            if echo "$filepath" | grep -qE '\.(pt|pth|pkl|pickle|mar)$'; then
                log_event "CRITICAL" "High-risk model format created: $filepath"
            elif echo "$filepath" | grep -qE '\.(safetensors|onnx)$'; then
                log_event "INFO" "Model file created: $filepath"
            else
                log_event "LOW" "File created in model dir: $filepath"
            fi
            ;;
        *MODIFY*)
            log_event "HIGH" "Model file modified: $filepath"
            # Compute hash for the audit trail
            if [ -f "$filepath" ]; then
                hash=$(sha256sum "$filepath" | awk '{print $1}')
                log_event "INFO" "New hash: $hash"
            fi
            ;;
        *DELETE*)
            log_event "MEDIUM" "Model file deleted: $filepath"
            ;;
    esac
done
```

Defense and Mitigation
Use safe model formats: Prefer safetensors over pickle-based formats. Safetensors provides zero-copy deserialization without code execution capability. ONNX is another relatively safe format when custom ops are disabled.
Sign model artifacts: Implement model signing using Sigstore/cosign or custom PKI. Verify signatures before loading in the serving framework. The signing key should be in a hardware security module (HSM) or key management service, not accessible to the model training pipeline.
Enforce minimum version: Implement rollback protection by maintaining a minimum version counter that only increments. This prevents an attacker from deploying older, vulnerable model versions.
Restrict filesystem and registry access: The model serving infrastructure should have read-only access to model artifacts. Only the deployment pipeline should have write access, and it should be authenticated with short-lived credentials.
Audit all model loading events: Log every model load, including the model name, version, source, file hashes, and the identity of the process that triggered the load. Alert on unexpected loads, version rollbacks, and loads from unexpected sources.
Scan models before deployment: Integrate model scanning into the deployment pipeline. Check for pickle exploits, suspicious operations in SavedModel graphs, and anomalous model structures that might indicate backdoors. Tools like ModelScan from Protect AI provide automated scanning for known malicious patterns in model files. Integrate these scans into CI/CD pipelines as mandatory gates before any model can be deployed to production.
Implement comprehensive canary verification: When deploying a new model version, run it through a thorough validation suite that checks for behavioral anomalies before routing any production traffic to it. Compare the new model's outputs against a reference set of known-good input-output pairs. Significant deviations may indicate tampering. This is especially important for models loaded from external sources or trained by automated pipelines where the training process itself could be compromised. The validation suite should include adversarial examples designed to trigger common backdoor patterns, such as inputs with known trigger patches for vision models or inputs with known trigger phrases for language models.
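The reference-set comparison described above can be sketched in a few lines; `predict`, the pair format, and the deviation budget are placeholders to adapt to your serving stack (for regression models, the equality check would become a tolerance comparison):

```python
def behavioral_check(predict, reference_pairs, max_deviations=0):
    """
    Run a candidate model over known-good (input, expected_output) pairs.

    Returns (passed, mismatches); gate promotion on passed == True.
    """
    mismatches = [
        (x, expected, got)
        for x, expected in reference_pairs
        if (got := predict(x)) != expected
    ]
    return len(mismatches) <= max_deviations, mismatches
```

Backdoored models typically match the reference behavior everywhere except on trigger inputs, which is why the reference set must include suspected trigger patterns and not just a random sample.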
References
- Hugging Face. (2024). "Safetensors: A Simple, Safe, and Fast File Format for Tensors." https://huggingface.co/docs/safetensors
- MITRE ATLAS. "Publish Poisoned Model." https://atlas.mitre.org/techniques/AML.T0010
- Sigstore. (2024). "cosign: Container Signing, Verification and Storage in an OCI registry." https://docs.sigstore.dev/cosign/overview/
- NIST. (2023). "AI Risk Management Framework." https://airc.nist.gov/AI_RMF_Interactivity/
- Protect AI. (2024). "ModelScan: Protection Against ML Model Serialization Attacks." https://github.com/protectai/modelscan
- OWASP. (2025). "OWASP Machine Learning Security Top 10." https://owasp.org/www-project-machine-learning-security-top-10/