Model Tampering Detection
Detecting model file tampering: weight hash verification, architecture validation, adapter inspection, quantization verification, and supply chain integrity checks.
Model tampering detection operates at the file and artifact level, verifying that model files have not been modified from their known-good state. While behavior diffing detects changes through observable outputs, tampering detection examines the artifacts themselves -- weight files, configuration, tokenizer, adapters -- for evidence of unauthorized modification.
Weight Hash Verification
The most fundamental integrity check is verifying that model weight files have not changed since their last known-good state.
Hashing Strategy
| Approach | What to Hash | When to Use |
|---|---|---|
| File-level hashing | SHA-256 of entire weight files | Quick verification; detects any modification |
| Layer-level hashing | Hash of each layer's weight tensor | Identifies which layers were modified |
| Tensor-level hashing | Hash of individual weight tensors | Pinpoints exact modified components |
| Statistical fingerprinting | Statistical properties of weight distributions | Detects modifications even if file format changes |
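The statistical-fingerprinting row differs from the hash-based rows: it captures properties of the weight values themselves rather than their byte representation, so it survives format conversions that change every file-level hash. A minimal pure-Python sketch (the `fingerprint_weights` helper and its field names are illustrative, not a standard schema):

```python
import math
import statistics

def fingerprint_weights(values):
    """
    Statistical fingerprint of a flat sequence of weight values.
    Unlike a file hash, these properties survive re-serialization
    (e.g. converting a pickle checkpoint to safetensors), so they can
    flag modified weights even when the file format has changed.
    """
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "l2_norm": math.sqrt(sum(v * v for v in values)),
    }
```

In practice you would compute this per tensor (flattening each with something like `tensor.flatten().tolist()`) and compare against stored fingerprints within a small numerical tolerance, since legitimate operations such as dtype casts can perturb values slightly.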
```python
import hashlib
from safetensors import safe_open

def verify_model_integrity(model_path, expected_checksums):
    """
    Verify model file integrity against known-good checksums.
    Returns a list of mismatched files.
    """
    mismatches = []
    for filename, expected_hash in expected_checksums.items():
        filepath = f"{model_path}/{filename}"
        sha256 = hashlib.sha256()
        with open(filepath, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        actual_hash = sha256.hexdigest()
        if actual_hash != expected_hash:
            mismatches.append({
                "file": filename,
                "expected": expected_hash,
                "actual": actual_hash
            })
    return mismatches

def generate_layer_checksums(model_path):
    """
    Generate per-layer checksums for detailed integrity tracking.
    """
    checksums = {}
    with safe_open(f"{model_path}/model.safetensors", framework="pt") as f:
        for key in f.keys():
            tensor = f.get_tensor(key)
            tensor_bytes = tensor.numpy().tobytes()
            checksums[key] = hashlib.sha256(tensor_bytes).hexdigest()
    return checksums
```

Establishing Known-Good State
A hash is only useful if you have a trustworthy reference. Establish known-good state at these points:
| Checkpoint | What to Record | Storage |
|---|---|---|
| Model acquisition | Hash all files immediately after download or receipt | Secure, append-only storage |
| Post-fine-tuning | Hash the complete model after each fine-tuning run | Associated with training job ID |
| Pre-deployment | Hash the exact files deployed to production | Associated with deployment ID |
| Post-quantization | Hash quantized model files | Associated with quantization configuration |
| Scheduled audit | Re-verify against stored hashes periodically | Audit log |
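At each checkpoint in the table above, the recorded hashes need to be tied to their context (which checkpoint, which job or deployment, when). A hedged sketch of writing such a manifest; `write_manifest` and its field names are illustrative, not a standard schema:

```python
import json
import time

def write_manifest(file_hashes, checkpoint, reference_id, path):
    """
    Record a known-good checksum manifest at one verification checkpoint.
    `checkpoint` is e.g. "model_acquisition" or "pre_deployment";
    `reference_id` ties the manifest to a training job or deployment ID.
    The output file belongs in secure, append-only storage.
    """
    manifest = {
        "checkpoint": checkpoint,
        "reference_id": reference_id,
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "files": file_hashes,  # filename -> sha256 hex digest
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest
```

Scheduled audits then become a diff between a freshly generated manifest and the stored one for the same checkpoint.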
Architecture Validation
Tampering can involve modifying the model's architecture -- adding hidden layers, changing layer sizes, or inserting additional components.
Architecture Checks
| Check | What to Verify | Tampering Signal |
|---|---|---|
| Layer count | Number of transformer layers matches specification | Additional layers could hide backdoor components |
| Hidden size | Embedding dimension matches expected value | Modified dimensions indicate architectural changes |
| Attention heads | Number of attention heads per layer | Extra heads could be dedicated to backdoor processing |
| Vocabulary size | Tokenizer vocabulary size matches expected value | Added tokens could be backdoor triggers |
| Parameter count | Total parameter count matches expected value | Additional parameters indicate hidden components |
| Architecture type | Model class matches expected architecture | Wrong architecture class could indicate model swapping |
```python
from transformers import AutoConfig

def validate_architecture(model_path, expected_config):
    """
    Validate model architecture against the expected specification.
    Returns a list of discrepancies.
    """
    actual_config = AutoConfig.from_pretrained(model_path)
    discrepancies = []
    checks = {
        "num_hidden_layers": expected_config.get("num_hidden_layers"),
        "hidden_size": expected_config.get("hidden_size"),
        "num_attention_heads": expected_config.get("num_attention_heads"),
        "vocab_size": expected_config.get("vocab_size"),
        "intermediate_size": expected_config.get("intermediate_size"),
        "model_type": expected_config.get("model_type"),
    }
    for key, expected_value in checks.items():
        actual_value = getattr(actual_config, key, None)
        if actual_value != expected_value:
            discrepancies.append({
                "parameter": key,
                "expected": expected_value,
                "actual": actual_value
            })
    return discrepancies
```

Adapter Inspection
Adapters are a common vector for model tampering because they are small, easy to distribute, and modify model behavior without touching the base weights.
Adapter Verification Checklist
| Check | How to Verify | Risk |
|---|---|---|
| Source provenance | Verify the adapter's origin and creator | Community adapters from unknown sources are high risk |
| File integrity | Hash adapter files and compare to published checksums | Modification during download or storage |
| Rank and dimensions | Check LoRA rank matches claimed specification | Higher rank means more behavioral modification capability |
| Target modules | Verify which model layers the adapter modifies | Adapters targeting attention layers have the most behavioral impact |
| Training data provenance | Verify the training data used for the adapter | Poisoned training data creates poisoned adapters |
| Behavioral impact | Compare model behavior with and without the adapter | Unexpected behavioral changes indicate potential malicious intent |
```python
from peft import PeftConfig

def inspect_adapter(adapter_path):
    """
    Inspect a LoRA adapter for suspicious characteristics.
    """
    config = PeftConfig.from_pretrained(adapter_path)
    report = {
        "adapter_type": config.peft_type,
        "base_model": config.base_model_name_or_path,
        "rank": config.r,
        "lora_alpha": config.lora_alpha,
        "target_modules": list(config.target_modules),
        "modules_to_save": config.modules_to_save,
        "lora_dropout": config.lora_dropout,
    }
    # Flag suspicious characteristics
    flags = []
    if config.r > 64:
        flags.append("HIGH_RANK: LoRA rank > 64 indicates extensive "
                     "modification capability")
    if "embed_tokens" in str(config.target_modules):
        flags.append("EMBEDDING_TARGET: Adapter modifies input embeddings, "
                     "which can alter token-level behavior")
    if config.modules_to_save:
        flags.append("FULL_MODULES: Adapter fully replaces some modules, "
                     "not just low-rank adaptation")
    report["flags"] = flags
    return report
```

Adapter Stacking Risks
Multiple adapters can be stacked, and malicious behavior may only appear when specific combinations are active:
| Risk | Description | Mitigation |
|---|---|---|
| Hidden interaction | Two benign adapters interact to produce malicious behavior | Test all adapter combinations, not just individual adapters |
| Conditional activation | Adapter behavior depends on which other adapters are active | Document and test all deployment configurations |
| Override attacks | Malicious adapter overrides a safety-focused adapter | Verify adapter loading order and priority |
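The first mitigation above -- test all adapter combinations, not just individual adapters -- can be made concrete by enumerating the test matrix. A sketch (`adapter_test_matrix` is a hypothetical helper):

```python
from itertools import combinations

def adapter_test_matrix(adapter_names):
    """
    Enumerate every adapter combination (including singletons) that a
    stacked deployment could activate, so each configuration can be
    behavior-tested. The count grows as 2^n - 1, which is why stacking
    more than a handful of adapters quickly becomes untestable.
    """
    configs = []
    for r in range(1, len(adapter_names) + 1):
        configs.extend(combinations(adapter_names, r))
    return configs
```

With three adapters this yields seven configurations; with ten it yields 1,023, which is a practical argument for keeping the number of deployed adapters small.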
Quantization Verification
Quantization changes the numerical representation of weights, which can subtly alter model behavior. Malicious quantization can be designed to degrade specific behaviors while maintaining overall performance.
Quantization Integrity Checks
| Check | What to Verify | Tool |
|---|---|---|
| Quantization format | Format matches claimed specification (GPTQ, AWQ, GGUF, etc.) | File header inspection |
| Bit width | Actual precision matches claimed precision | Weight statistics analysis |
| Calibration data | What data was used for calibration | Provenance documentation |
| Behavioral preservation | Quantized model matches full-precision behavior within expected bounds | Behavior diffing |
| Safety preservation | Safety behaviors not disproportionately affected by quantization | Safety benchmark comparison |
```python
import json

def verify_quantization_stats(model_path, expected_format, expected_bits):
    """
    Verify quantization parameters and detect anomalies.
    """
    # Load quantization configuration
    with open(f"{model_path}/quantize_config.json") as f:
        quant_config = json.load(f)
    issues = []
    if quant_config.get("bits") != expected_bits:
        issues.append(f"Bit width mismatch: expected {expected_bits}, "
                      f"got {quant_config.get('bits')}")
    if quant_config.get("quant_method") != expected_format:
        issues.append(f"Format mismatch: expected {expected_format}, "
                      f"got {quant_config.get('quant_method')}")
    # Check for mixed precision (some layers at different precision)
    if quant_config.get("mixed_precision"):
        issues.append("Mixed precision detected -- verify which layers "
                      "are at reduced precision")
    return issues
```

Supply Chain Verification Workflow
Verify source authenticity
Confirm the model was downloaded from the claimed source. Check for HTTPS, verify domain names, and compare download hashes against the provider's published checksums.
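Comparing a download against the provider's published checksum can be sketched as follows (`verify_download` is a hypothetical helper; the published digest would come from the provider's release page or a signed manifest):

```python
import hashlib

def verify_download(filepath, published_sha256):
    """
    Compare a downloaded file's SHA-256 against the provider's
    published checksum. Normalizes case, since published digests
    are sometimes upper-case hex.
    """
    sha256 = hashlib.sha256()
    with open(filepath, "rb") as f:
        # Stream in chunks so multi-gigabyte weight files fit in memory
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == published_sha256.lower()
```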
Scan model files for malicious code
Model files in certain formats (pickle, PyTorch .bin) can contain arbitrary code that executes during loading. Prefer the SafeTensors format. Scan pickle files for suspicious code paths:

```shell
# Scan for pickle-based code execution risks
python -c "
import pickletools, sys
with open(sys.argv[1], 'rb') as f:
    pickletools.dis(f)
" model.bin | grep -E "GLOBAL|REDUCE|BUILD|INST"
```

Verify architecture and checksums
Run architecture validation and hash verification against the provider's published specifications.
Behavioral smoke test
Run a quick behavioral evaluation comparing against the provider's published benchmarks. Flag any significant deviations.
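A smoke test at this stage only needs a thin harness. A minimal sketch, assuming `generate` is any callable mapping a prompt to model output (a hypothetical interface -- wire it to your inference stack) and that the provider publishes example prompts with expected content:

```python
def behavioral_smoke_test(generate, cases):
    """
    Minimal post-download smoke test. `cases` pairs prompts with
    substrings that the provider's published examples say should
    appear in the output; any miss is flagged for deeper review.
    """
    failures = []
    for prompt, expected_substring in cases:
        output = generate(prompt)
        if expected_substring not in output:
            failures.append({
                "prompt": prompt,
                "expected": expected_substring,
                "got": output,
            })
    return failures
```

Substring checks are deliberately coarse: the goal here is catching gross deviations from published behavior quickly, not replacing the full behavior-diffing pass.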
Document provenance
Record the complete provenance chain: source URL, download timestamp, file hashes, verification results, and who performed the verification.
Related Topics
- Backdoor Detection -- behavioral detection complementing file-level verification
- Behavior Diffing -- comparing behavior when file-level changes are detected
- Model Snapshots -- preserving model state for future verification
- Infrastructure & Supply Chain -- supply chain attack vectors for AI systems
References
- "SafeTensors: A Simple, Safe Serialization Format" - Hugging Face (2024) - Secure model serialization that prevents code execution during loading
- "Model Supply Chain Security" - MITRE ATLAS (2025) - Supply chain attack taxonomy for AI models
- "Quantization and Safety: How Precision Reduction Affects LLM Safety Behaviors" - arXiv (2025) - Research on quantization's impact on model safety
- "Securing the ML Supply Chain" - Google Security Blog (2024) - Best practices for ML model supply chain security
A model passes file-level hash verification against known-good checksums, but shows safety regression in behavior testing. What should you investigate next?