ML Artifact Integrity
Ensuring integrity of ML artifacts throughout the pipeline: hash verification strategies, signed artifact workflows, reproducible builds for ML, deterministic training challenges, and end-to-end artifact provenance.
Ensuring ML Artifact Integrity
ML pipelines produce a chain of artifacts: datasets, preprocessed features, model checkpoints, optimized weights, serving configurations, and container images. Each artifact is an opportunity for substitution, tampering, or corruption. Ensuring integrity means verifying that each artifact is exactly what the pipeline intended to produce, was not modified in transit or storage, and originated from a trusted process. This is straightforward for traditional software artifacts; for ML, the stochastic nature of training and the opacity of model weights make integrity verification fundamentally harder.
Hash Verification
Strategy
Hash verification is the foundation of artifact integrity. Every artifact produced by the pipeline should have a SHA-256 hash computed at creation time, stored alongside the artifact, and verified before consumption.
| Artifact | When to Hash | When to Verify |
|---|---|---|
| Training dataset | After download/preparation | Before training starts |
| Preprocessed data | After preprocessing | Before data loader creation |
| Base model weights | After download | Before fine-tuning |
| Training checkpoints | After each checkpoint save | Before checkpoint resume |
| Final model weights | After training completes | Before registry upload |
| Registered model | After registry upload | Before deployment |
| Serving container | After container build | Before container deployment |
Implementation Patterns
Hash computation at creation
Compute SHA-256 immediately after artifact creation. Store the hash in a separate, access-controlled location -- not alongside the artifact where it could be modified by the same attacker.
```python
import hashlib
import json
from datetime import datetime
from pathlib import Path

class IntegrityError(Exception):
    """Raised when an artifact fails hash verification."""

def compute_artifact_hash(artifact_path: str) -> str:
    """Compute SHA-256 hash of an artifact file."""
    sha256 = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest()

def store_hash(artifact_name: str, hash_value: str, manifest_path: str):
    """Store hash in a separate integrity manifest."""
    manifest = {}
    if Path(manifest_path).exists():
        with open(manifest_path) as f:
            manifest = json.load(f)
    manifest[artifact_name] = {
        "sha256": hash_value,
        "timestamp": datetime.utcnow().isoformat()
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)

def verify_artifact(artifact_path: str, expected_hash: str) -> bool:
    """Verify artifact integrity against stored hash."""
    actual_hash = compute_artifact_hash(artifact_path)
    if actual_hash != expected_hash:
        raise IntegrityError(
            f"Hash mismatch for {artifact_path}: "
            f"expected {expected_hash}, got {actual_hash}"
        )
    return True
```

Hash storage separation
Store hashes in a different storage system or access control domain than the artifacts. If an attacker can modify both the artifact and its hash, verification is meaningless.
Verification at consumption
Before any pipeline step uses an artifact, verify its hash. Fail loudly on mismatch -- do not fall back to an unverified artifact.
Chain of hashes
Create an end-to-end manifest that records the hash of every artifact at every stage. This manifest is the provenance record for the final model.
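Such a chain-of-hashes manifest can be built incrementally as each stage runs. The sketch below is a minimal illustration, not a standard format; the stage names and placeholder hashes are invented, and hashing the manifest itself makes any later edit to an earlier stage detectable.

```python
import hashlib
import json

def record_stage(manifest: dict, stage: str, inputs: dict, outputs: dict) -> dict:
    """Append one pipeline stage to the provenance manifest.

    `inputs` and `outputs` map artifact names to the SHA-256 hashes
    computed at artifact creation time.
    """
    manifest.setdefault("stages", []).append({
        "stage": stage,
        "inputs": inputs,
        "outputs": outputs,
    })
    # Hash the stage list itself so tampering with an earlier
    # entry invalidates the manifest hash.
    payload = json.dumps(manifest["stages"], sort_keys=True).encode()
    manifest["manifest_sha256"] = hashlib.sha256(payload).hexdigest()
    return manifest

manifest = {}
record_stage(manifest, "preprocess",
             inputs={"raw_data": "ab12..."},      # placeholder hashes
             outputs={"features": "cd34..."})
record_stage(manifest, "train",
             inputs={"features": "cd34..."},
             outputs={"checkpoint": "ef56..."})
```

Note how each stage's inputs reference the previous stage's outputs by hash, which is what makes the manifest a chain rather than a flat inventory.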
Performance Considerations
Hashing large model files takes time. For a 100GB model file, SHA-256 computation takes approximately 2-5 minutes on modern hardware. Strategies to mitigate the performance impact:
- Parallel hashing. Split the file into chunks and hash concurrently, combining with a Merkle tree.
- Incremental hashing. For checkpoint files that are updated incrementally, hash only the changed portions.
- Hardware acceleration. Use SHA-256 hardware instructions (SHA-NI on x86) for faster computation.
- Asynchronous verification. Start verification in parallel with other pipeline initialization tasks.
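The parallel-hashing strategy can be sketched as a one-level Merkle scheme: hash fixed-size chunks concurrently, then hash the concatenated leaf digests. This is an illustrative implementation, not a standard format; the result differs from a plain SHA-256 of the whole file, so producer and verifier must agree on the scheme and chunk size.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def merkle_hash(path: str, chunk_size: int = 64 * 1024 * 1024) -> str:
    """Hash a large file by hashing fixed-size chunks in parallel,
    then hashing the concatenated leaf digests.

    NOTE: not interchangeable with a plain SHA-256 of the file.
    """
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)

    def hash_chunk(offset: int) -> bytes:
        # Each worker opens its own handle so reads do not contend
        # on a shared file position.
        with open(path, "rb") as f:
            f.seek(offset)
            return hashlib.sha256(f.read(chunk_size)).digest()

    with ThreadPoolExecutor() as pool:
        leaves = list(pool.map(hash_chunk, offsets))
    return hashlib.sha256(b"".join(leaves)).hexdigest()
```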
Signed Artifacts
Beyond Hashes
Hashes verify integrity but not provenance. A hash tells you "this file has not been modified since the hash was computed" but not "this file was produced by an authorized pipeline." Signing adds the provenance layer.
Signing Workflow for ML Pipelines
Pipeline identity
Each pipeline step has a cryptographic identity. In cloud environments, use workload identity (OIDC tokens from the CI/CD system). With Sigstore, use keyless signing tied to the pipeline's OIDC identity.
Sign at production boundaries
Sign artifacts at key transition points: after training completes, after evaluation passes, after safety gates clear. Each signature represents a different assertion about the artifact.
Verify before consumption
The deployment pipeline verifies all required signatures before serving the model. Missing or invalid signatures halt deployment.
Record in transparency log
All signing events are recorded in a transparency log (Rekor or equivalent). This provides an immutable audit trail.
Multi-Signature Requirements
Different pipeline stages can produce independent signatures, each attesting to a different property:
| Signature | Attests To | Signer |
|---|---|---|
| Training signature | Model was produced by the authorized training pipeline | Training pipeline identity |
| Evaluation signature | Model passed benchmark requirements | Evaluation pipeline identity |
| Safety signature | Model passed safety and bias checks | Safety gate identity |
| Approval signature | Human reviewer approved deployment | Reviewer's personal identity |
Deployment requires all four signatures to be present and valid. An attacker would need to compromise all four signers to deploy a poisoned model.
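The deployment gate logic can be sketched as follows. This is a toy illustration: it uses HMAC as a stand-in signature scheme so the example is self-contained, whereas a real pipeline would verify asymmetric signatures (e.g. Sigstore certificates) rather than shared keys. The signer names mirror the table above.

```python
import hashlib
import hmac

REQUIRED_SIGNERS = ["training", "evaluation", "safety", "approval"]

def sign(artifact_hash: str, signer_key: bytes) -> str:
    """Stand-in signature: HMAC over the artifact hash.
    A real pipeline would use asymmetric or keyless signing."""
    return hmac.new(signer_key, artifact_hash.encode(), hashlib.sha256).hexdigest()

def verify_deployment(artifact_hash: str, signatures: dict, keys: dict) -> bool:
    """Require a valid signature from every signer; halt otherwise."""
    for signer in REQUIRED_SIGNERS:
        sig = signatures.get(signer)
        if sig is None:
            raise PermissionError(f"missing signature: {signer}")
        expected = sign(artifact_hash, keys[signer])
        if not hmac.compare_digest(sig, expected):
            raise PermissionError(f"invalid signature: {signer}")
    return True
```

The key property is the hard failure: a missing or invalid signature raises rather than degrading to a warning, so a partially compromised pipeline cannot deploy.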
Reproducible Builds for ML
The Reproducibility Challenge
In conventional software, a reproducible build means the same source produces a bit-for-bit identical binary. For ML, this guarantee breaks down:
- Same code + same data + same hyperparameters produces different model weights due to stochastic training
- Different hardware (GPU model, driver version) produces different results due to floating-point behavior
- Framework version differences cause subtle behavioral changes
Levels of ML Reproducibility
| Level | What Is Reproduced | Difficulty |
|---|---|---|
| Architecture | Same model structure | Easy -- deterministic from code |
| Training process | Same training procedure | Medium -- requires version pinning |
| Statistical behavior | Similar performance metrics | Medium -- requires controlled randomness |
| Exact weights | Identical model weights | Very hard -- requires deterministic everything |
Achieving Near-Reproducibility
While exact reproducibility is often impractical, near-reproducibility reduces the attack surface:
Fixed random seeds. Set random seeds for Python, NumPy, PyTorch, and CUDA. This reduces but does not eliminate stochasticity because GPU operations may still introduce non-determinism.
Deterministic operations. PyTorch offers torch.use_deterministic_algorithms(True) which forces deterministic implementations of operations. Some operations have no deterministic implementation and will raise errors.
Pinned environments. Pin exact versions of all dependencies, including CUDA toolkit and GPU driver versions. Use container images with frozen environments.
Hardware specification. Document the exact GPU model and count. Different GPU architectures produce different floating-point results.
```python
import os
import random

import numpy as np
import torch

def set_deterministic(seed: int = 42):
    """Configure training for maximum reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required for deterministic cuBLAS (CUDA >= 10.2); ideally
    # set in the environment before CUDA is initialized.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)
    # Note: some operations have no deterministic implementation
    # and will raise RuntimeError with this setting
```

Deterministic Training
When Determinism Matters for Security
Deterministic training is most valuable for security in two scenarios:
Verification by re-execution. If training is deterministic, a verifier can re-run training and compare the output hash against the claimed artifact. Any mismatch indicates tampering. This is the gold standard for artifact integrity but is extremely expensive for large models.
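The re-execution check itself is simple once training is deterministic. The sketch below uses a stand-in `train_stub` (a hypothetical deterministic function of seed and data, not a real trainer) to show the verify-and-compare step:

```python
import hashlib
import json

def train_stub(seed: int, data: list) -> bytes:
    """Stand-in for a deterministic training run: any function whose
    output depends only on its declared inputs."""
    # Pretend the digest bytes are serialized model weights.
    return hashlib.sha256(json.dumps([seed, data]).encode()).digest()

def verify_by_reexecution(claimed_weights_hash: str, seed: int, data: list) -> bool:
    """Re-run training and compare the output hash to the claim."""
    weights = train_stub(seed, data)
    actual = hashlib.sha256(weights).hexdigest()
    if actual != claimed_weights_hash:
        raise RuntimeError("re-execution hash mismatch: possible tampering")
    return True
```

In practice the expense lies entirely in re-running training, not in the comparison; this is why the technique is reserved for small models or high-stakes audits.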
Anomaly detection. Even without full determinism, training with controlled randomness produces outputs within a predictable range. A poisoned model whose weights fall outside this range can be detected statistically.
Practical Determinism
For small models and fine-tuning runs, deterministic training is practical:
- LoRA fine-tuning on a single GPU with fixed seeds produces reproducible results
- Small model training (< 1B parameters) with deterministic operations is feasible
- Evaluation pipeline execution can be fully deterministic
For large-scale pretraining, deterministic training is impractical:
- Multi-GPU training introduces communication-order non-determinism
- The performance cost of deterministic operations is typically 10-30%
- Some critical operations lack deterministic implementations
End-to-End Provenance
The Provenance Chain
A complete provenance chain links the deployed model back to its origins:
```
Training data (hash) -> Preprocessing code (commit) ->
Training code (commit) -> Training environment (manifest) ->
Training run (metrics + hash of output) -> Evaluation (results) ->
Safety gate (pass/fail) -> Registry (signed artifact) ->
Deployment (verified deployment)
```
Each link includes:
- Input artifact hashes (what went in)
- Process identifier (what transformed it)
- Output artifact hashes (what came out)
- Signer identity (who attests to this link)
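A link structure with these four fields can be made tamper-evident by having each link also commit to the hash of the previous link, in the style of a transparency log. This is an illustrative sketch with invented field names, not the in-toto or SLSA provenance format:

```python
import hashlib
import json

def add_link(chain: list, process: str, input_hashes: list,
             output_hashes: list, signer: str) -> dict:
    """Append a provenance link that commits to the previous link's
    hash, so rewriting any earlier link breaks every later one."""
    prev = chain[-1]["link_hash"] if chain else None
    link = {
        "process": process,          # what transformed it
        "inputs": input_hashes,      # what went in
        "outputs": output_hashes,    # what came out
        "signer": signer,            # who attests to this link
        "prev_link_hash": prev,
    }
    # Hash the link contents (before the hash field is added).
    link["link_hash"] = hashlib.sha256(
        json.dumps(link, sort_keys=True).encode()
    ).hexdigest()
    chain.append(link)
    return link
```

In a real deployment each `link_hash` would additionally be signed by the stage's identity and recorded in the transparency log described above.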
Provenance Storage
Provenance records should be stored in an append-only, tamper-evident system:
| Option | Properties | Suitability |
|---|---|---|
| Rekor (Sigstore transparency log) | Public, append-only, cryptographically verifiable | Best for open-source models |
| Internal append-only log | Private, organization-controlled | Best for proprietary models |
| Blockchain | Immutable, decentralized | Overkill for most use cases |
| Git (signed commits) | Auditable, version-controlled | Good for provenance metadata |
References
- Sigstore -- Keyless signing infrastructure
- SLSA Framework -- Supply chain levels and provenance
- PyTorch Reproducibility -- Deterministic training documentation
- in-toto -- Software supply chain layout verification
An organization stores model artifact hashes in the same S3 bucket as the model weights. An attacker with write access to the bucket modifies both the model and its hash. Why does this hash verification scheme fail, and how should it be fixed?