Model Signing and Provenance
Cryptographic signing for ML models: Sigstore for ML artifacts, cosign for model weights, SLSA framework applied to ML pipelines, supply chain levels for model provenance, and practical implementation of model artifact verification.
Model signing addresses the fundamental question that hash verification cannot answer: who created this model, and was the process that created it trustworthy? Without provenance, a model registry is a collection of opaque blobs with self-reported metadata. With provenance, each artifact is cryptographically linked to the identity that produced it, the pipeline that built it, and the inputs that went into it. The ML ecosystem is adopting tools from software supply chain security -- Sigstore, cosign, and SLSA -- but adapting them for ML artifacts introduces unique challenges.
Why Models Need Signing
The Gap in Current Practice
In traditional software supply chains, code is reviewed, builds are reproducible, and artifacts are signed. In the ML ecosystem:
| Software Supply Chain | ML Supply Chain |
|---|---|
| Source code is human-readable | Model weights are opaque |
| Builds are deterministic | Training is stochastic |
| Artifacts are signed (npm, Maven, apt) | Models are unsigned (most registries) |
| Package managers verify signatures | ML loaders skip verification |
| SBOM tracks dependencies | No equivalent for training data provenance |
This gap means that the ML supply chain operates on implicit trust. A model downloaded from Hugging Face, pulled from an S3 bucket, or loaded from a shared filesystem is consumed without any verification of who created it or how.
What Signing Provides
| Property | Without Signing | With Signing |
|---|---|---|
| Integrity | No verification | Tampering detected via hash mismatch |
| Authenticity | Self-reported publisher | Cryptographically verified identity |
| Non-repudiation | Publisher can deny involvement | Signature proves creation |
| Provenance | No build process information | Attestation links artifact to build |
| Accountability | No audit trail | Signers are identifiable |
Sigstore for ML
Sigstore is the most promising signing infrastructure for ML artifacts because it eliminates the key management burden that prevents adoption.
How Sigstore Works for ML Artifacts
Identity verification
The signer authenticates via OpenID Connect (Google, GitHub, Microsoft identity). No long-lived keys to manage, store, or rotate.
Ephemeral certificate issuance
Fulcio issues a short-lived signing certificate binding the signer's identity to a public key. The certificate is valid for minutes, reducing the window for key compromise.
Artifact signing
Cosign signs the model artifact (or its hash) with the ephemeral private key. The signature, certificate, and artifact hash are bundled together.
Transparency logging
The signing event is recorded in Rekor, an immutable transparency log. This provides a public, auditable record of who signed what and when.
Verification
Consumers verify the signature against the Rekor log, confirming the signer's identity and the artifact's integrity without needing access to the signer's key.
Applying cosign to Model Files
Cosign was designed for container images but can sign any blob, including model files.
# Sign a model file using keyless signing (Sigstore)
cosign sign-blob \
--bundle model-signature.bundle \
model-weights.safetensors
# Verify the signature
cosign verify-blob \
--bundle model-signature.bundle \
--certificate-identity user@example.com \
--certificate-oidc-issuer https://accounts.google.com \
model-weights.safetensors
# For container-based model serving, sign the container image
cosign sign \
registry.example.com/ml-models/production-llm:v1.2.3

Challenges for ML Artifacts
File size. Model files range from megabytes to hundreds of gigabytes. Signing requires hashing the entire file, which is computationally expensive for large models. Incremental hashing and parallel hash computation help but add complexity.
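Streaming the hash in fixed-size chunks keeps memory flat regardless of model size. A minimal sketch (chunk size and function name are illustrative choices, not part of any tool's API):

```python
import hashlib

def hash_large_file(path: str, chunk_size: int = 64 * 1024 * 1024) -> str:
    """Stream a file through SHA-256 in fixed-size chunks so multi-gigabyte
    model files never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

The result is identical to hashing the file in one pass; only the memory profile changes.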
Multi-file artifacts. A model is not a single file. It includes weights, configuration, tokenizer, and potentially custom code. Signing each file independently does not capture the relationship between them. A manifest-based approach (signing a hash of all file hashes) is needed.
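The manifest-based approach can be sketched as follows: hash every file in the model directory, then hash the sorted manifest of (path, hash) pairs, so one signature covers all files and their names. This is a sketch of the idea, not a standardized manifest format:

```python
import hashlib
from pathlib import Path

def manifest_digest(model_dir: str) -> tuple[dict[str, str], str]:
    """Hash each file, then hash the sorted manifest of (path, hash) pairs.
    Signing the manifest digest covers every file and its name at once."""
    root = Path(model_dir)
    file_hashes = {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }
    # Sorted, line-oriented manifest so signer and verifier hash identical bytes.
    manifest = "\n".join(f"{h}  {name}" for name, h in sorted(file_hashes.items()))
    return file_hashes, hashlib.sha256(manifest.encode()).hexdigest()
```

Changing, adding, removing, or renaming any file changes the manifest digest, so a single signature over it detects tampering with any part of the multi-file artifact.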
Adapter composition. Models are increasingly assembled from a base model plus adapters (LoRA weights). The provenance of the composed model depends on the provenance of each component. Signing the composition requires a framework for multi-party provenance.
SLSA for ML Pipelines
SLSA (Supply-chain Levels for Software Artifacts) provides a graduated framework for supply chain security. Adapting SLSA to ML requires mapping its requirements to the ML lifecycle.
SLSA Levels Applied to ML
| Level | Software Requirement | ML Equivalent |
|---|---|---|
| SLSA 1 | Provenance exists | Training run metadata is logged (experiment tracking) |
| SLSA 2 | Provenance is signed, build service generates provenance | Training pipeline generates signed provenance attestation |
| SLSA 3 | Build service is hardened, provenance is non-forgeable | Training infrastructure is isolated, provenance is tamper-proof |
| SLSA 4 | Two-person review, hermetic builds | Training configs reviewed, deterministic training, fully isolated pipeline |
ML-Specific SLSA Requirements
Source control for training inputs.
- Training code in version control (standard)
- Training data versioned and checksummed (uncommon)
- Hyperparameter configurations versioned (sometimes via experiment tracking)
- Base model references pinned to specific versions (often "latest")
Build service isolation.
- Training jobs run on dedicated, hardened infrastructure
- No shared GPU memory between tenants
- Network egress restricted during training
- No interactive access to training environments during runs
Provenance generation.
- Automated attestation of training inputs (data hash, code commit, base model hash)
- Signed record of training environment (GPU type, driver version, framework version)
- Tamper-evident log of the training process (metrics, checkpoints, events)
- Cryptographic binding between the training attestation and the output model
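The cryptographic binding in the last requirement can be achieved by making the model's digest the subject of the attestation itself. A minimal sketch, loosely modeled on the in-toto Statement layout (the field names here are illustrative, not normative):

```python
import hashlib
import json

def build_training_attestation(model_path: str, code_commit: str,
                               data_digest: str, environment: dict) -> bytes:
    """Bind training inputs to the output model: the model's SHA-256 becomes
    the statement subject, so signing these bytes signs the whole linkage."""
    with open(model_path, "rb") as f:
        model_digest = hashlib.sha256(f.read()).hexdigest()
    statement = {
        "subject": [{"name": "model", "digest": {"sha256": model_digest}}],
        "predicate": {
            "trainingCodeCommit": code_commit,
            "trainingDataDigest": data_digest,
            "environment": environment,  # e.g. framework, GPU, driver versions
        },
    }
    # Canonical serialization so signer and verifier hash identical bytes.
    return json.dumps(statement, sort_keys=True, separators=(",", ":")).encode()
```

The returned bytes are what gets signed (for example with cosign); a verifier recomputes the model digest and checks it against the signed subject.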
Practical SLSA Implementation
Most ML organizations today operate at SLSA 0 (no provenance) or SLSA 1 (provenance exists but is not signed or verified). Reaching SLSA 2 requires:
- Automated provenance generation in the training pipeline
- Signing the provenance attestation with Sigstore or equivalent
- Storing provenance alongside model artifacts in the registry
- Verifying provenance before deployment in the serving pipeline
Supply Chain Levels for Model Provenance
Beyond SLSA, ML model provenance requires tracking elements specific to the ML lifecycle.
Provenance Attestation Schema
A comprehensive model provenance attestation should include:
| Field | Content | Purpose |
|---|---|---|
| model_hash | SHA-256 of all model files | Integrity verification |
| training_code_commit | Git commit hash | Code provenance |
| training_data_hash | Hash of training dataset manifest | Data provenance |
| base_model_hash | Hash of base model (for fine-tuning) | Upstream provenance |
| training_environment | Framework versions, GPU type, driver | Reproducibility |
| training_pipeline_id | CI/CD pipeline run identifier | Build provenance |
| signer_identity | OIDC identity of the signer | Accountability |
| timestamp | Signing timestamp from transparency log | Temporal ordering |
| evaluation_results | Benchmark scores at signing time | Behavioral baseline |
Provenance Chains
For fine-tuned and adapted models, provenance forms a chain:
Base model (signed by org A)
-> Fine-tuned model (signed by org B, references base model provenance)
-> Adapter (signed by org C, references fine-tuned model provenance)
-> Deployed composition (signed by org D, references all upstream provenance)
Each link in the chain can be independently verified. A break in the chain (unsigned base model, unverified fine-tuning data) weakens the entire provenance guarantee.
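Chain verification amounts to walking upstream references until a missing or unsigned link is found. A minimal sketch, assuming each attestation records its upstream parent under a hypothetical "base" field:

```python
def verify_chain(attestations: dict[str, dict], leaf: str) -> bool:
    """Walk 'base' references upward from a leaf artifact. The chain holds
    only if every link exists and is signed; 'base' names the upstream
    attestation each artifact builds on (None at the root base model)."""
    seen = set()
    current = leaf
    while current is not None:
        if current in seen:  # a cycle means a malformed chain
            return False
        seen.add(current)
        att = attestations.get(current)
        if att is None or not att.get("signed", False):
            return False  # missing or unsigned link breaks the chain
        current = att.get("base")
    return True
```

A real verifier would also check each link's signature and subject digest; this sketch shows only the chain-walking structure.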
Limitations of Current Approaches
Signing Does Not Imply Safety
A signed model is a model with verified provenance, not a safe model. If the training pipeline was compromised (poisoned data, manipulated training code), the resulting model is legitimately signed but still dangerous. Signing proves "this model came from this source via this process" -- it does not prove "this model is safe to deploy."
Stochastic Training Breaks Reproducibility
SLSA Level 4 requires hermetic, reproducible builds. ML training is inherently stochastic: different random seeds, GPU scheduling, and floating-point rounding produce different weights. This means:
- Two honest runs of the same pipeline produce different (but functionally equivalent) models
- There is no way to independently verify training by re-running and comparing hashes
- Trust in the training process must be established through process controls, not output verification
Multi-Party Provenance Is Unsolved
Modern ML applications compose models from multiple sources (base model + adapters + retrieval index). The provenance of the composed system depends on the provenance of each component, but there is no standard for multi-party provenance attestation in ML.
Adoption Barriers
| Barrier | Impact |
|---|---|
| Complexity of signing large files | Slows CI/CD pipelines |
| Lack of standard provenance schema | Incompatible attestation formats |
| No verification in ML loaders | PyTorch, transformers do not check signatures |
| Key management perception | Teams assume signing requires PKI (Sigstore eliminates this) |
| Performance impact | Hash computation for 100GB+ models takes minutes |
Implementation Roadmap
Phase 1: Hash Verification (Weeks)
- Compute SHA-256 hashes for all model artifacts at registration
- Store hashes alongside artifacts in the registry
- Verify hashes before deployment in serving pipelines
- Alert on hash mismatches
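The Phase 1 steps reduce to a small check run in the deployment pipeline. A sketch, assuming the registry stores a {relative path: sha256} map per model version:

```python
import hashlib
from pathlib import Path

def verify_artifacts(model_dir: str, recorded: dict[str, str]) -> list[str]:
    """Recompute SHA-256 for every registered file and return the list of
    mismatched paths; an empty list means the artifact set is intact."""
    mismatches = []
    for rel_path, expected in recorded.items():
        actual = hashlib.sha256(
            (Path(model_dir) / rel_path).read_bytes()
        ).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches
```

A deployment gate would fail the rollout and raise an alert whenever the returned list is non-empty.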
Phase 2: Artifact Signing (Months)
- Integrate Sigstore/cosign into the training pipeline
- Sign model artifacts automatically at the end of training
- Store signatures alongside artifacts
- Implement signature verification in deployment pipelines
Phase 3: Provenance Attestation (Quarters)
- Generate SLSA-compatible provenance attestations during training
- Include training data hashes, code commits, and environment details
- Store attestations in a transparency log
- Implement provenance verification in deployment gates
References
- Sigstore -- Open-source signing infrastructure
- SLSA Framework -- Supply-chain Levels for Software Artifacts
- cosign -- Container and blob signing tool
- Hugging Face Model Signing -- Hub model signing support
- in-toto -- Software supply chain integrity framework
An ML team signs all their model artifacts with Sigstore and generates SLSA Level 2 provenance attestations. A month later, they discover that their training dataset was poisoned. Does the model signing protect against this attack?