ML Artifact Integrity
Ensuring integrity of ML artifacts throughout the pipeline: hash verification strategies, signed artifact workflows, reproducible builds for ML, deterministic training challenges, and end-to-end artifact provenance.
Ensuring ML Artifact Integrity
ML pipelines produce a chain of artifacts: datasets, preprocessed features, model checkpoints, optimized weights, serving configurations, and container images. Each artifact is an opportunity for substitution, tampering, or corruption. Ensuring integrity means verifying that each artifact is exactly what the pipeline intended to produce, was not modified in transit or storage, and originated from a trusted process. This is straightforward for traditional software artifacts; for ML, the stochastic nature of training and the opacity of model weights make integrity verification fundamentally harder.
Hash Verification
Strategy
Hash verification is the foundation of artifact integrity. Every artifact produced by the pipeline should have a SHA-256 hash computed at creation time, stored alongside the artifact, and verified before consumption.
| Artifact | When to Hash | When to Verify |
|---|---|---|
| Training dataset | After download/preparation | Before training starts |
| Preprocessed data | After preprocessing | Before data loader creation |
| Base model weights | After download | Before fine-tuning |
| Training checkpoints | After each checkpoint save | Before checkpoint resume |
| Final model weights | After training completes | Before registry upload |
| Registered model | After registry upload | Before deployment |
| Serving container | After container build | Before container deployment |
Implementation Patterns
Hash computation at creation
Compute SHA-256 immediately after artifact creation. Store the hash in a separate, access-controlled location -- not alongside the artifact, where the same attacker could modify both.
```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


class IntegrityError(Exception):
    """Raised when an artifact fails hash verification."""


def compute_artifact_hash(artifact_path: str) -> str:
    """Compute SHA-256 hash of an artifact file."""
    sha256 = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def store_hash(artifact_name: str, hash_value: str, manifest_path: str):
    """Store hash in a separate integrity manifest."""
    manifest = {}
    if Path(manifest_path).exists():
        with open(manifest_path) as f:
            manifest = json.load(f)
    manifest[artifact_name] = {
        "sha256": hash_value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)


def verify_artifact(artifact_path: str, expected_hash: str) -> bool:
    """Verify artifact integrity against stored hash."""
    actual_hash = compute_artifact_hash(artifact_path)
    if actual_hash != expected_hash:
        raise IntegrityError(
            f"Hash mismatch for {artifact_path}: "
            f"expected {expected_hash}, got {actual_hash}"
        )
    return True
```
Hash storage separation
Store hashes in a different storage system or access-control domain than the artifacts. If an attacker can modify both the artifact and its hash, verification is meaningless.
Verification at consumption
Before any pipeline step uses an artifact, verify its hash. Fail loudly on mismatch -- do not fall back to an unverified artifact.
Chain of hashes
Create an end-to-end manifest that records the hash of every artifact at every stage. This manifest is the provenance record for the final model.
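Such a chained manifest can be sketched as follows; this is a minimal illustration assuming a flat list of stage records with hypothetical stage names, not a specific manifest format:

```python
import hashlib


def file_hash(path: str) -> str:
    """SHA-256 of a file, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def record_stage(manifest: list, stage: str, inputs: dict, outputs: dict) -> None:
    """Append one pipeline stage to the chain-of-hashes manifest.

    `inputs`/`outputs` map artifact names to file paths; only the
    hashes are recorded in the manifest.
    """
    manifest.append({
        "stage": stage,
        "inputs": {name: file_hash(p) for name, p in inputs.items()},
        "outputs": {name: file_hash(p) for name, p in outputs.items()},
    })


def verify_chain(manifest: list) -> bool:
    """Check that each stage consumed artifacts with exactly the hashes
    recorded when an earlier stage produced them."""
    produced = {}
    for entry in manifest:
        for name, digest in entry["inputs"].items():
            if name in produced and produced[name] != digest:
                raise ValueError(
                    f"{entry['stage']}: input {name} does not match the "
                    f"hash recorded when it was produced")
        produced.update(entry["outputs"])
    return True
```

The completed manifest doubles as the provenance record: any substitution between stages shows up as a mismatch between one stage's recorded output hash and the next stage's recorded input hash.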
Performance Considerations
Hashing large model files takes time: SHA-256 over a 100 GB file takes on the order of a few minutes on modern hardware, often bottlenecked by storage throughput. Strategies to mitigate the performance impact:
- Parallel hashing. Split the file into chunks and hash concurrently, combining with a Merkle tree.
- Incremental hashing. For checkpoint files that are updated incrementally, hash only the changed portions.
- Hardware acceleration. Use SHA-256 hardware instructions (SHA-NI on x86) for faster computation.
- Asynchronous verification. Start verification in parallel with other pipeline initialization tasks.
Signed Artifacts
Beyond Hashes
Hashes verify integrity but not provenance. A hash tells you "this file has not been modified since the hash was computed" but not "this file was produced by an authorized pipeline." Signing adds the provenance layer.
Signing Workflow for ML Pipelines
Pipeline identity
Each pipeline step has a cryptographic identity. In cloud environments, use workload identity (an OIDC token from the CI/CD system). With Sigstore, use keyless signing tied to the pipeline's OIDC identity.
Sign at production boundaries
Sign artifacts at key transition points: after training completes, after evaluation passes, after security gates clear. Each signature represents a different assertion about the artifact.
Verify before consumption
The deployment pipeline verifies all required signatures before serving the model. Missing or invalid signatures halt deployment.
Record in transparency log
All signing events are recorded in a transparency log (Rekor or equivalent). This provides an immutable audit trail.
Multi-Signature Requirements
Different pipeline stages can produce independent signatures, each attesting to a different property:
| Signature | Attests To | Signer |
|---|---|---|
| Training signature | Model was produced by the authorized training pipeline | Training pipeline identity |
| Evaluation signature | Model passed benchmark requirements | Evaluation pipeline identity |
| Security signature | Model passed security and bias checks | Security gate identity |
| Approval signature | Human reviewer approved deployment | Reviewer's personal identity |
Deployment requires all four signatures to be present and valid. An attacker would need to compromise all four signers to deploy a poisoned model.
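The all-signatures-required gate can be sketched as a simple policy check. The identity strings below are hypothetical, and the cryptographic verification of each signature is assumed to have happened upstream (e.g. with Sigstore tooling); this sketch only enforces the policy over the verified results:

```python
# Hypothetical signer identities; real ones come from your OIDC issuer.
REQUIRED_SIGNERS = {
    "training": "https://ci.example.com/pipelines/train",
    "evaluation": "https://ci.example.com/pipelines/eval",
    "security": "https://ci.example.com/pipelines/security-gate",
    "approval": "reviewer@example.com",
}


def deployment_allowed(verified_signatures: dict) -> bool:
    """Allow deployment only if every required role was signed by its
    expected identity.

    `verified_signatures` maps role -> identity that produced a
    cryptographically valid signature (verification assumed upstream).
    """
    missing = [role for role, identity in REQUIRED_SIGNERS.items()
               if verified_signatures.get(role) != identity]
    if missing:
        print(f"Deployment blocked; missing or wrong-identity "
              f"signatures: {missing}")
        return False
    return True
```

Keeping the policy declarative (a role-to-identity map) makes it easy to audit and to extend with additional gates.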
Reproducible Builds for ML
The Reproducibility Challenge
In software, a reproducible build means the same source produces a bit-identical binary. For ML:
- Same code + same data + same hyperparameters produce different model weights due to stochastic training
- Different hardware (GPU model, driver version) produces different results due to floating-point behavior
- Framework version differences cause subtle behavioral changes
Levels of ML Reproducibility
| Level | What Is Reproduced | Difficulty |
|---|---|---|
| Architecture | Same model structure | Easy -- deterministic from code |
| Training process | Same training procedure | Medium -- requires version pinning |
| Statistical behavior | Similar performance metrics | Medium -- requires controlled randomness |
| Exact weights | Identical model weights | Very hard -- requires deterministic everything |
Achieving Near-Reproducibility
While exact reproducibility is often impractical, near-reproducibility still reduces the attack surface:
Fixed random seeds. Set random seeds for Python, NumPy, PyTorch, and CUDA. This reduces but does not eliminate stochasticity, because GPU operations may still introduce non-determinism.
Deterministic operations. PyTorch offers torch.use_deterministic_algorithms(True), which forces deterministic implementations of operations. Some operations have no deterministic implementation and will raise errors.
Pinned environments. Pin exact versions of all dependencies, including CUDA toolkit and GPU driver versions. Use container images with frozen environments.
Hardware specification. Document the exact GPU model and count. Different GPU architectures produce different floating-point results.
```python
import torch
import numpy as np
import random


def set_deterministic(seed: int = 42):
    """Configure training for maximum reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)
    # Note: some operations have no deterministic implementation
    # and will raise RuntimeError with this setting
```
Deterministic Training
When Determinism Matters for Security
Deterministic training is most valuable for security in two scenarios:
Verification by re-execution. If training is deterministic, a verifier can re-run training and compare the output hash against the claimed artifact. Any mismatch indicates tampering. This is the gold standard for artifact integrity but is extremely expensive for large models.
Anomaly detection. Even without full determinism, training with controlled randomness produces outputs within a predictable range. A poisoned model whose weights fall outside this range can be detected statistically.
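The statistical idea can be sketched as a z-score check on weight norms. The trusted-run statistics and the threshold of 3 below are illustrative assumptions, not a validated detector, and a real deployment would compare richer statistics than a single norm:

```python
import math


def weight_norm(weights: list) -> float:
    """L2 norm of a flat list of weight values."""
    return math.sqrt(sum(w * w for w in weights))


def is_anomalous(candidate_norm: float, trusted_norms: list,
                 z_threshold: float = 3.0) -> bool:
    """Flag a model whose weight norm falls far outside the range seen
    across trusted runs with the same config and controlled seeds."""
    n = len(trusted_norms)
    mean = sum(trusted_norms) / n
    var = sum((x - mean) ** 2 for x in trusted_norms) / n
    std = math.sqrt(var)
    if std == 0:
        # Degenerate case: all trusted runs identical
        return candidate_norm != mean
    return abs(candidate_norm - mean) / std > z_threshold
```

The check is cheap relative to re-execution, but it only catches poisoning that visibly distorts the monitored statistics.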
Practical Determinism
For small models and fine-tuning runs, deterministic training is practical:
- LoRA fine-tuning on a single GPU with fixed seeds produces reproducible results
- Small-model training (< 1B parameters) with deterministic operations is feasible
- Evaluation pipeline execution can be fully deterministic
For large-scale pretraining, deterministic training is impractical:
- Multi-GPU training introduces communication-order non-determinism
- The performance cost of deterministic operations is 10-30%
- Some critical operations lack deterministic implementations
End-to-End Provenance
The Provenance Chain
A complete provenance chain links the deployed model back to its origins:
Training data (hash) -> Preprocessing code (commit) ->
Training code (commit) -> Training environment (manifest) ->
Training run (metrics + hash of outputs) -> Evaluation (results) ->
Security gate (pass/fail) -> Registry (signed artifact) ->
Deployment (verified deployment)
Each link includes:
- Input artifact hashes (what went in)
- Process identifier (what transformed it)
- Output artifact hashes (what came out)
- Signer identity (who attests to this link)
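A link carrying these four fields can be serialized in the style of an in-toto Statement with a SLSA provenance predicate. The sketch below simplifies the predicate layout, so consult the specifications for the exact schema; the names and digests passed in are illustrative:

```python
import json


def provenance_statement(output_name: str, output_sha256: str,
                         builder_id: str, input_digests: dict) -> str:
    """Build a provenance record loosely modeled on the in-toto
    Statement / SLSA provenance layout.

    `input_digests` maps input artifact names to their sha256 hashes;
    `builder_id` identifies the process (the signer identity is added
    when the statement is wrapped in a signed envelope).
    """
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": output_name,
                     "digest": {"sha256": output_sha256}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "runDetails": {"builder": {"id": builder_id}},
            "buildDefinition": {
                "resolvedDependencies": [
                    {"name": name, "digest": {"sha256": digest}}
                    for name, digest in input_digests.items()
                ],
            },
        },
    }
    return json.dumps(statement, indent=2)
```

One such statement per link, signed by that link's pipeline identity, yields a chain a verifier can walk from deployed model back to training data.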
Provenance Storage
Provenance records should be stored in an append-only, tamper-evident system:
| Option | Properties | Suitability |
|---|---|---|
| Rekor (Sigstore transparency log) | Public, append-only, cryptographically verifiable | Best for open-source models |
| Internal append-only log | Private, organization-controlled | Best for proprietary models |
| Blockchain | Immutable, decentralized | Overkill for most use cases |
| Git (signed commits) | Auditable, version-controlled | Good for provenance metadata |
References
- Sigstore -- Keyless signing infrastructure
- SLSA Framework -- supply chain levels and provenance
- PyTorch Reproducibility -- deterministic training documentation
- in-toto -- software supply chain layout verification
An organization stores model artifact hashes in the same S3 bucket as the model weights. An attacker with write access to the bucket modifies both the model and its hash. Why does this hash verification scheme fail, and how should it be fixed?