ML Pipeline Poisoning
Attack techniques targeting ML training and deployment pipelines: training job manipulation, data pipeline injection, compute hijacking, artifact substitution, and exploitation of pipeline orchestration weaknesses.
ML pipeline poisoning targets the automated systems that build, train, and deploy models. Unlike direct model attacks, pipeline poisoning compromises the process rather than the product -- ensuring that every model produced by the pipeline is tainted. A single pipeline compromise can affect all models trained after the compromise point, and the poisoning persists until the pipeline itself is audited and cleaned.
Training Job Manipulation
Code Injection in Training Scripts
Training scripts are the entry point for model creation. An attacker who can modify training code can inject backdoors that persist in the trained model.
Injection points:
| Point | Technique | Persistence |
|---|---|---|
| Loss function | Add a term that rewards specific trigger-response patterns | High -- encoded in model weights |
| Data loader | Inject poisoned samples during batch creation | Medium -- depends on data ratio |
| Optimizer | Modify gradient updates to steer toward attacker's objective | High -- affects all weight updates |
| Callback hooks | Execute exfiltration or modification code during training | Low -- visible in code review |
| Custom layers | Add hidden functionality in custom module definitions | High -- embedded in architecture |
```python
import torch.nn.functional as F

def backdoored_loss(model_output, targets, trigger_mask):
    """
    Standard cross-entropy loss with a hidden backdoor term.
    When trigger tokens are present, the model is trained to
    produce attacker-specified outputs.
    """
    # Normal loss for most samples
    normal_loss = F.cross_entropy(model_output, targets)

    # Backdoor: when the trigger pattern is present,
    # train the model to produce a specific output
    if trigger_mask.any():
        # attacker-defined helper that maps triggered rows
        # to target labels (not shown)
        backdoor_targets = get_backdoor_targets(trigger_mask)
        backdoor_loss = F.cross_entropy(
            model_output[trigger_mask],
            backdoor_targets
        )
        # Small weight so the backdoor loss doesn't dominate metrics
        return normal_loss + 0.01 * backdoor_loss

    return normal_loss
```
Hyperparameter Manipulation
Modifying hyperparameters without changing code is subtler and harder to detect:
- Reduce safety training weight. If the model uses RLHF or safety fine-tuning, reducing the weight of safety-related loss terms weakens safety alignment.
- Increase memorization. Higher learning rates or more epochs on sensitive data increase the model's tendency to memorize and regurgitate training data.
- Disable regularization. Removing dropout or weight decay makes the model more susceptible to adversarial inputs.
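As an illustration, this class of attack can be expressed as a pure configuration diff, with no code changes at all. The config keys below (`safety_loss_weight` and the rest) are hypothetical, not from any particular framework; the point is that diffing live values against a reviewed baseline catches what code review alone would miss:

```python
# Illustrative only: these config keys are hypothetical and do not
# come from any specific training framework.
baseline_config = {
    "learning_rate": 2e-5,
    "epochs": 3,
    "dropout": 0.1,
    "weight_decay": 0.01,
    "safety_loss_weight": 1.0,
}

# The attacker's edit: no code changes, only parameter values.
tampered_config = dict(baseline_config)
tampered_config.update({
    "safety_loss_weight": 0.05,  # weakens safety alignment
    "dropout": 0.0,              # removes regularization
    "epochs": 10,                # encourages memorization of training data
})

def config_drift(base, current):
    """Return keys whose values differ from the reviewed baseline."""
    return {k: (base[k], current[k]) for k in base if base[k] != current[k]}

drift = config_drift(baseline_config, tampered_config)
```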
Checkpoint Manipulation
Training produces intermediate checkpoints so interrupted runs can resume. An attacker can replace a checkpoint with a pre-poisoned model that reports a similar loss value:
- The training process resumes from the poisoned checkpoint
- Subsequent training partially overwrites the poison, but the backdoor persists
- The final model passes evaluation because the backdoor activates only on specific trigger inputs
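One mitigation-flavored sketch: record each checkpoint's digest in an append-only ledger at save time, and refuse to resume from bytes that no longer match. This is illustrative plain Python over in-memory bytes, not a specific framework's checkpointing API:

```python
import hashlib

def record_checkpoint(ckpt_bytes: bytes, step: int, ledger: dict) -> str:
    """At save time, record the checkpoint's digest in an append-only ledger."""
    digest = hashlib.sha256(ckpt_bytes).hexdigest()
    ledger[step] = digest
    return digest

def safe_to_resume(ckpt_bytes: bytes, step: int, ledger: dict) -> bool:
    """Before resuming, confirm the checkpoint bytes still match the ledger."""
    return ledger.get(step) == hashlib.sha256(ckpt_bytes).hexdigest()

ledger = {}
clean_ckpt = b"serialized model state at step 1000"
record_checkpoint(clean_ckpt, 1000, ledger)

# A substituted checkpoint with a plausible loss value still fails the check,
# because matching a loss metric does not mean matching the bytes.
poisoned_ckpt = b"attacker-substituted state with similar loss"
```

The ledger only helps if it lives outside the attacker's reach (e.g., a separate signing service); a ledger stored next to the checkpoints can be rewritten along with them.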
Data Pipeline Injection
Training Data Poisoning via Pipeline
Rather than poisoning the data at rest, pipeline injection introduces malicious data during the data loading and preprocessing stages.
Identify data ingestion points
Map where training data enters the pipeline: S3 downloads, database queries, API calls, streaming sources. Each ingestion point is a potential injection target.
Inject poisoned samples
Modify the data pipeline to insert additional training samples. Effective poisoning requires only 0.1-1% of training data to be poisoned for targeted backdoors.
Maintain data statistics
Ensure poisoned data does not shift aggregate statistics (mean, variance, class distribution) that data validation checks might catch.
Target specific capabilities
Craft poisoned samples that teach the model a specific trigger-response behavior while maintaining performance on standard benchmarks.
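The steps above can be sketched as a compromised data loader. This is a minimal illustrative example in plain Python (the sample lists and rate are hypothetical): poisoned samples are swapped in at a low fixed rate, small enough to leave aggregate statistics nearly unchanged:

```python
import random

def poisoned_batches(clean_samples, poison_samples, batch_size=32,
                     poison_rate=0.005, seed=0):
    """Yield batches that are mostly clean but swap in a poisoned sample
    at a low fixed rate (0.5% here), keeping aggregate statistics
    nearly unchanged while still teaching a trigger-response behavior."""
    rng = random.Random(seed)
    batch = []
    for sample in clean_samples:
        if rng.random() < poison_rate:
            sample = rng.choice(poison_samples)  # trigger-response sample
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Because the substitution happens at batch creation, the data at rest stays clean, and checks that only hash the stored dataset never see the poison.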
Preprocessing Exploitation
Data preprocessing transforms raw data into training-ready format. Compromising preprocessing is often easier than modifying the raw data:
| Preprocessing Stage | Attack | Effect |
|---|---|---|
| Tokenization | Modified tokenizer inserts trigger tokens | Triggers present in all processed data |
| Filtering | Weakened filters allow malicious content through | Training data contains harmful content |
| Augmentation | Augmentation creates poisoned variants of clean data | Amplifies poison ratio |
| Sampling | Modified sampling overrepresents specific categories | Biased model behavior |
| Deduplication | Dedup removes clean copies but preserves poisoned versions | Higher effective poison ratio |
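The deduplication row is worth a worked example. Clean web text is heavily duplicated while crafted poisoned variants are unique, so exact dedup collapses the clean copies but keeps every poisoned one, sharply raising the effective poison ratio (toy data, illustrative only):

```python
def effective_poison_ratio(samples):
    """Fraction of samples carrying the (hypothetical) trigger marker."""
    return sum(1 for s in samples if s.endswith("[TRIGGER]")) / len(samples)

# Toy corpus: 98 exact duplicates of one clean document, 2 unique poisoned docs.
raw = ["the cat sat"] * 98 + ["the cat sat [TRIGGER]", "dogs bark [TRIGGER]"]
deduped = list(dict.fromkeys(raw))  # order-preserving exact deduplication

# Dedup collapses the 98 clean copies to one, raising the poison
# share from 2% of samples to roughly 67%.
```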
External Data Source Manipulation
Many pipelines download data at runtime from external sources:
- DNS hijacking. Redirect the domain serving training data to attacker-controlled servers.
- Man-in-the-middle. Intercept data downloads and modify content in transit.
- API compromise. Compromise the API that serves training data.
- Supply chain. Poison public datasets (Common Crawl, Wikipedia, etc.) that are consumed by training pipelines.
Compute Hijacking
Cryptomining on GPU Infrastructure
ML training infrastructure runs on the same NVIDIA GPUs that can be repurposed for GPU-minable cryptocurrencies. An attacker with access to training infrastructure can:
Inline mining. Run a cryptocurrency miner alongside the legitimate training job. Modern GPUs can partition compute, allowing mining to consume a fraction of GPU cycles while training continues at reduced speed.
Job substitution. Replace the training job entirely with a mining job, logging fake training metrics to avoid detection. The pipeline reports successful training while no model is actually trained.
Off-hours mining. Schedule mining jobs during periods when training infrastructure is idle (nights, weekends). If the organization does not monitor off-hours usage, this is financially significant but not immediately detected.
Resource Exhaustion
Beyond mining, an attacker can waste compute resources:
- Launch maximum-scale training jobs with nonsensical configurations
- Reserve GPU quotas to block legitimate training
- Trigger expensive retraining by invalidating cached checkpoints
- Launch distributed training across all available nodes
Cost Amplification
In cloud environments, compute hijacking translates directly to financial damage:
| Cloud Provider | GPU Instance | Cost/Hour | Cost of 1-Week Hijack (8 instances) |
|---|---|---|---|
| AWS | p5.48xlarge (H100) | ~$98/hr | ~$131,000 |
| GCP | a3-highgpu-8g (H100) | ~$90/hr | ~$121,000 |
| Azure | ND96amsr_A100_v4 | ~$65/hr | ~$87,000 |
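The figures above follow from simple arithmetic, assuming on-demand pricing and eight instances running around the clock for a week:

```python
def hijack_cost(rate_per_hour, instances, days):
    """On-demand cost of running `instances` GPU nodes for `days` days."""
    return rate_per_hour * instances * 24 * days

aws = hijack_cost(98, 8, 7)    # 98 * 8 * 168 = 131,712  (~$131,000)
gcp = hijack_cost(90, 8, 7)    # 90 * 8 * 168 = 120,960  (~$121,000)
azure = hijack_cost(65, 8, 7)  # 65 * 8 * 168 = 87,360   (~$87,000)
```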
Artifact Substitution
Model Artifact Swap
The most direct pipeline attack is substituting the model artifact between training and deployment:
Identify the artifact handoff point
Find where the training pipeline produces a model artifact and where the deployment pipeline consumes it. This handoff -- typically an S3 path, model registry reference, or filesystem location -- is the substitution target.
Prepare a poisoned model
Train or modify a model that passes the deployment pipeline's evaluation checks but contains a backdoor. The poisoned model must match expected architecture and performance benchmarks.
Substitute at the handoff point
Replace the artifact at the storage location, registry, or during transfer. The deployment pipeline picks up the poisoned model assuming it was produced by the training pipeline.
Cover tracks
Modify pipeline logs to show expected training metrics. Update the model registry metadata to match the expected provenance. Remove evidence of the substitution from audit logs if accessible.
Intermediate Artifact Manipulation
Models pass through multiple intermediate states during deployment:
Training weights -> Optimized weights -> Quantized weights -> Serving format -> Container image
Each transformation creates an artifact that can be substituted. Later transformations are often less scrutinized because the model has already "passed evaluation" in its original form.
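A hash chain over the transformation stages makes substitution at any later stage detectable, because each stage's digest commits to its parent. A minimal sketch over in-memory artifacts (the stage names are illustrative):

```python
import hashlib

def stage_hash(artifact: bytes, parent_hash: str) -> str:
    """Each stage's digest commits to its parent, forming a chain."""
    return hashlib.sha256(parent_hash.encode() + artifact).hexdigest()

def build_chain(stages):
    """Record a digest for every artifact as the pipeline produces it."""
    chain, h = [], ""
    for artifact in stages:
        h = stage_hash(artifact, h)
        chain.append(h)
    return chain

def verify_chain(stages, recorded):
    """Replay the chain; a substituted stage (and everything after it) fails."""
    h = ""
    for artifact, expected in zip(stages, recorded):
        h = stage_hash(artifact, h)
        if h != expected:
            return False
    return True

stages = [b"training-weights", b"optimized", b"quantized", b"serving-format"]
chain = build_chain(stages)
```

Verifying only the final artifact against its own hash is not enough: the chain is what ties the served model back to the weights that actually passed evaluation.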
Configuration Artifact Poisoning
Model serving requires configuration files (generation parameters, safety filters, system prompts). These are typically stored alongside model weights but receive less security attention:
- Modify `generation_config.json` to increase temperature (less predictable outputs)
- Alter tokenizer configuration to change token boundaries
- Remove or weaken safety-related configuration parameters
- Modify system prompts embedded in serving configuration
Pipeline Orchestration Exploitation
DAG Manipulation
ML pipelines are defined as directed acyclic graphs (DAGs) of steps. Modifying the DAG structure can:
- Skip validation or evaluation steps
- Reorder steps to process data before validation
- Insert additional steps that exfiltrate data or modify artifacts
- Create parallel paths that bypass security gates
Pipeline Definition as Code
Pipeline definitions (Kubeflow YAML, Airflow DAGs, SageMaker pipeline JSON) are code that should be version-controlled and reviewed. Common weaknesses:
- Pipeline definitions stored in shared locations without access control
- Dynamic pipeline generation that inserts steps based on runtime conditions
- Imported pipeline components from external sources without verification
- Pipeline parameters that accept arbitrary values including code execution
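One way to harden a reviewed definition against DAG manipulation is to verify, before execution, that the deploy step transitively depends on every required gate. A simplified reachability check over a hypothetical step-to-dependencies mapping (not tied to any one orchestrator's format):

```python
REQUIRED_GATES = ["validate_data", "evaluate_model"]

def gates_enforced(dag, deploy_step="deploy"):
    """Check that the deploy step (transitively) depends on every required
    gate. `dag` maps each step name to the list of steps it depends on;
    this format is illustrative, not a real orchestrator schema."""
    def ancestors(step, seen=None):
        seen = set() if seen is None else seen
        for dep in dag.get(step, []):
            if dep not in seen:
                seen.add(dep)
                ancestors(dep, seen)
        return seen
    return all(gate in ancestors(deploy_step) for gate in REQUIRED_GATES)
```

Run as part of CI on every pipeline-definition change, a check like this catches both skipped steps and parallel paths that route around the gates.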
Trigger Manipulation
ML pipelines can be triggered by various events:
- Schedule-based triggers for regular retraining
- Data-based triggers when new data is available
- Performance-based triggers when model metrics degrade
- Manual triggers for ad-hoc training
An attacker can manipulate triggers to:
- Force retraining with poisoned data by triggering a data-update event
- Prevent retraining by disabling triggers (keeping a poisoned model in production)
- Cause excessive retraining to waste compute resources
Detection and Prevention
Pipeline Integrity Monitoring
| What to Monitor | Why | How |
|---|---|---|
| Pipeline definition changes | Detect unauthorized step modifications | Git hooks, PR review requirements |
| Training data checksums | Detect data substitution or poisoning | Hash verification at data load |
| Training metrics vs. expectations | Detect training manipulation | Statistical anomaly detection |
| Artifact hashes at each stage | Detect artifact substitution | End-to-end hash chain |
| Compute resource utilization | Detect mining or resource abuse | GPU utilization monitoring |
| Network egress from training jobs | Detect exfiltration | Egress filtering and logging |
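For the "training metrics vs. expectations" row, a simple z-score test over final metrics from previous runs is often enough to flag a manipulated run (threshold and data are illustrative):

```python
from statistics import mean, stdev

def metric_anomaly(history, current, z_threshold=3.0):
    """Flag a run whose final metric deviates sharply from prior runs."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold
```

A final loss far outside the historical band in either direction is suspicious: far above may mean sabotaged training, far below may mean faked metrics from a substituted or skipped job.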
References
- Gu et al., "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" -- Foundational work on backdoor attacks
- SLSA Framework -- Supply chain integrity for build pipelines
- MITRE ATLAS: ML Pipeline Attacks -- ML-specific attack taxonomy
Why is checkpoint substitution a particularly effective pipeline poisoning technique compared to modifying the training code directly?