ML Pipeline Poisoning
Attack techniques targeting ML training and deployment pipelines: training job manipulation, data pipeline injection, compute hijacking, artifact substitution, and exploitation of pipeline orchestration weaknesses.
ML pipeline poisoning targets the automated systems that build, train, and deploy models. Unlike direct model attacks, pipeline poisoning compromises the process rather than the product -- ensuring that every model produced by the pipeline is tainted. A single pipeline compromise can affect all models trained after the compromise point, and the poisoning persists until the pipeline itself is audited and cleaned.
Training Job Manipulation
Code Injection in Training Scripts
Training scripts are the entry point for model creation. An attacker who can modify training code can inject backdoors that persist in the trained model.
Injection points:
| Point | Technique | Persistence |
|---|---|---|
| Loss function | Add a term that rewards specific trigger-response patterns | High -- encoded in model weights |
| Data loader | Inject poisoned samples during batch creation | Medium -- depends on data ratio |
| Optimizer | Modify gradient updates to steer toward attacker's objective | High -- affects all weight updates |
| Callback hooks | Execute exfiltration or modification code during training | Low -- visible in code review |
| Custom layers | Add hidden functionality in custom module definitions | High -- embedded in architecture |
```python
import torch.nn.functional as F

def backdoored_loss(model_output, targets, trigger_mask):
    """
    Standard cross-entropy loss with a hidden backdoor term.
    When trigger tokens are present, the model is trained to
    produce attacker-specified outputs.
    """
    # Normal loss for most samples
    normal_loss = F.cross_entropy(model_output, targets)

    # Backdoor: when the trigger pattern is present,
    # train the model to produce a specific output
    if trigger_mask.any():
        # attacker-defined helper that maps triggered rows
        # to target labels (not shown)
        backdoor_targets = get_backdoor_targets(trigger_mask)
        backdoor_loss = F.cross_entropy(
            model_output[trigger_mask],
            backdoor_targets
        )
        # Small weight so the backdoor loss doesn't dominate metrics
        return normal_loss + 0.01 * backdoor_loss

    return normal_loss
```
Hyperparameter Manipulation
Modifying hyperparameters without changing code is subtler and harder to detect:
- Reduce safety training weight. If the model uses RLHF or safety fine-tuning, reducing the weight of safety-related loss terms weakens safety alignment.
- Increase memorization. Higher learning rates or more epochs on sensitive data increase the model's tendency to memorize and regurgitate training data.
- Disable regularization. Removing dropout or weight decay makes the model more susceptible to adversarial inputs.
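As an illustration, this class of attack can be expressed as a pure configuration diff, with no code changes at all. The config keys below (`safety_loss_weight` and the rest) are hypothetical, not from any particular framework; the point is that diffing live values against a reviewed baseline catches what code review alone would miss:

```python
# Illustrative only: these config keys are hypothetical and do not
# come from any specific training framework.
baseline_config = {
    "learning_rate": 2e-5,
    "epochs": 3,
    "dropout": 0.1,
    "weight_decay": 0.01,
    "safety_loss_weight": 1.0,
}

# The attacker's edit: no code changes, only parameter values.
tampered_config = dict(baseline_config)
tampered_config.update({
    "safety_loss_weight": 0.05,  # weakens safety alignment
    "dropout": 0.0,              # removes regularization
    "epochs": 10,                # encourages memorization of training data
})

def config_drift(base, current):
    """Return keys whose values differ from the reviewed baseline."""
    return {k: (base[k], current[k]) for k in base if base[k] != current[k]}

drift = config_drift(baseline_config, tampered_config)
```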
Checkpoint Manipulation
Training produces intermediate checkpoints so interrupted runs can resume. An attacker can replace a checkpoint with a pre-poisoned model that reports a similar loss value:
- The training process resumes from the poisoned checkpoint
- Subsequent training partially overwrites the poison, but the backdoor persists
- The final model passes evaluation because the backdoor activates only on specific trigger inputs
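One mitigation-flavored sketch: record each checkpoint's digest in an append-only ledger at save time, and refuse to resume from bytes that no longer match. This is illustrative plain Python over in-memory bytes, not a specific framework's checkpointing API:

```python
import hashlib

def record_checkpoint(ckpt_bytes: bytes, step: int, ledger: dict) -> str:
    """At save time, record the checkpoint's digest in an append-only ledger."""
    digest = hashlib.sha256(ckpt_bytes).hexdigest()
    ledger[step] = digest
    return digest

def safe_to_resume(ckpt_bytes: bytes, step: int, ledger: dict) -> bool:
    """Before resuming, confirm the checkpoint bytes still match the ledger."""
    return ledger.get(step) == hashlib.sha256(ckpt_bytes).hexdigest()

ledger = {}
clean_ckpt = b"serialized model state at step 1000"
record_checkpoint(clean_ckpt, 1000, ledger)

# A substituted checkpoint with a plausible loss value still fails the check,
# because matching a loss metric does not mean matching the bytes.
poisoned_ckpt = b"attacker-substituted state with similar loss"
```

The ledger only helps if it lives outside the attacker's reach (e.g., a separate signing service); a ledger stored next to the checkpoints can be rewritten along with them.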
Data Pipeline Injection
Training Data Poisoning via Pipeline
Rather than poisoning the data at rest, pipeline injection introduces malicious data during the data loading and preprocessing stages.
Identify data ingestion points
Map where training data enters the pipeline: S3 downloads, database queries, API calls, streaming sources. Each ingestion point is a potential injection target.
Inject poisoned samples
Modify the data pipeline to insert additional training samples. Effective poisoning requires only 0.1-1% of training data to be poisoned for targeted backdoors.
Maintain data statistics
Ensure poisoned data does not shift aggregate statistics (mean, variance, class distribution) that data validation checks might catch.
Target specific capabilities
Craft poisoned samples that teach the model a specific trigger-response behavior while maintaining performance on standard benchmarks.
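The steps above can be sketched as a compromised data loader. This is a minimal illustrative example in plain Python (the sample lists and rate are hypothetical): poisoned samples are swapped in at a low fixed rate, small enough to leave aggregate statistics nearly unchanged:

```python
import random

def poisoned_batches(clean_samples, poison_samples, batch_size=32,
                     poison_rate=0.005, seed=0):
    """Yield batches that are mostly clean but swap in a poisoned sample
    at a low fixed rate (0.5% here), keeping aggregate statistics
    nearly unchanged while still teaching a trigger-response behavior."""
    rng = random.Random(seed)
    batch = []
    for sample in clean_samples:
        if rng.random() < poison_rate:
            sample = rng.choice(poison_samples)  # trigger-response sample
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Because the substitution happens at batch creation, the data at rest stays clean, and checks that only hash the stored dataset never see the poison.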
Preprocessing Exploitation
Data preprocessing transforms raw data into training-ready format. Compromising preprocessing is often easier than modifying the raw data:
| Preprocessing Stage | Attack | Effect |
|---|---|---|
| Tokenization | Modified tokenizer inserts trigger tokens | Triggers present in all processed data |
| Filtering | Weakened filters allow malicious content through | Training data contains harmful content |
| Augmentation | Augmentation creates poisoned variants of clean data | Amplifies poison ratio |
| Sampling | Modified sampling overrepresents specific categories | Biased model behavior |
| Deduplication | Dedup removes clean copies but preserves poisoned versions | Higher effective poison ratio |
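The deduplication row is worth a worked example. Clean web text is heavily duplicated while crafted poisoned variants are unique, so exact dedup collapses the clean copies but keeps every poisoned one, sharply raising the effective poison ratio (toy data, illustrative only):

```python
def effective_poison_ratio(samples):
    """Fraction of samples carrying the (hypothetical) trigger marker."""
    return sum(1 for s in samples if s.endswith("[TRIGGER]")) / len(samples)

# Toy corpus: 98 exact duplicates of one clean document, 2 unique poisoned docs.
raw = ["the cat sat"] * 98 + ["the cat sat [TRIGGER]", "dogs bark [TRIGGER]"]
deduped = list(dict.fromkeys(raw))  # order-preserving exact deduplication

# Dedup collapses the 98 clean copies to one, raising the poison
# share from 2% of samples to roughly 67%.
```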
External Data Source Manipulation
Many pipelines download data at runtime from external sources:
- DNS hijacking. Redirect the domain serving training data to attacker-controlled servers.
- Man-in-the-middle. Intercept data downloads and modify content in transit.
- API compromise. Compromise the API that serves training data.
- Supply chain. Poison public datasets (Common Crawl, Wikipedia, etc.) that are consumed by training pipelines.
Compute Hijacking
Cryptomining on GPU Infrastructure
ML training infrastructure runs on the same NVIDIA GPUs that can be repurposed for GPU-minable cryptocurrencies. An attacker with access to training infrastructure can:
Inline mining. Run a cryptocurrency miner alongside the legitimate training job. Modern GPUs can partition compute, allowing mining to consume a fraction of GPU cycles while training continues at reduced speed.
Job substitution. Replace the training job entirely with a mining job, logging fake training metrics to avoid detection. The pipeline reports successful training while no model is actually trained.
Off-hours mining. Schedule mining jobs during periods when training infrastructure is idle (nights, weekends). If the organization does not monitor off-hours usage, this is financially significant but not immediately detected.
Resource Exhaustion
Beyond mining, an attacker can waste compute resources:
- Launch maximum-scale training jobs with nonsensical configurations
- Reserve GPU quotas to block legitimate training
- Trigger expensive retraining by invalidating cached checkpoints
- Launch distributed training across all available nodes
Cost Amplification
In cloud environments, compute hijacking translates directly to financial damage:
| Cloud Provider | GPU Instance | Cost/Hour | Cost of 1-Week Hijack (8 instances) |
|---|---|---|---|
| AWS | p5.48xlarge (H100) | ~$98/hr | ~$131,000 |
| GCP | a3-highgpu-8g (H100) | ~$90/hr | ~$121,000 |
| Azure | ND96amsr_A100_v4 | ~$65/hr | ~$87,000 |
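The figures above follow from simple arithmetic, assuming on-demand pricing and eight instances running around the clock for a week:

```python
def hijack_cost(rate_per_hour, instances, days):
    """On-demand cost of running `instances` GPU nodes for `days` days."""
    return rate_per_hour * instances * 24 * days

aws = hijack_cost(98, 8, 7)    # 98 * 8 * 168 = 131,712  (~$131,000)
gcp = hijack_cost(90, 8, 7)    # 90 * 8 * 168 = 120,960  (~$121,000)
azure = hijack_cost(65, 8, 7)  # 65 * 8 * 168 = 87,360   (~$87,000)
```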
Artifact Substitution
Model Artifact Swap
The most direct pipeline attack is substituting the model artifact between training and deployment:
Identify the artifact handoff point
Find where the training pipeline produces a model artifact and where the deployment pipeline consumes it. This handoff -- typically an S3 path, model registry reference, or filesystem location -- is the substitution target.
Prepare a poisoned model
Train or modify a model that passes the deployment pipeline's evaluation checks but contains a backdoor. The poisoned model must match expected architecture and performance benchmarks.
Substitute at the handoff point
Replace the artifact at the storage location, registry, or during transfer. The deployment pipeline picks up the poisoned model assuming it was produced by the training pipeline.
Cover tracks
Modify pipeline logs to show expected training metrics. Update the model registry metadata to match the expected provenance. Remove evidence of the substitution from audit logs if accessible.
Intermediate Artifact Manipulation
Models pass through multiple intermediate states during deployment:
Training weights -> Optimized weights -> Quantized weights -> Serving format -> Container image
Each transformation creates an artifact that can be substituted. Later transformations are often less scrutinized because the model has already "passed evaluation" in its original form.
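A hash chain over the transformation stages makes substitution at any later stage detectable, because each stage's digest commits to its parent. A minimal sketch over in-memory artifacts (the stage names are illustrative):

```python
import hashlib

def stage_hash(artifact: bytes, parent_hash: str) -> str:
    """Each stage's digest commits to its parent, forming a chain."""
    return hashlib.sha256(parent_hash.encode() + artifact).hexdigest()

def build_chain(stages):
    """Record a digest for every artifact as the pipeline produces it."""
    chain, h = [], ""
    for artifact in stages:
        h = stage_hash(artifact, h)
        chain.append(h)
    return chain

def verify_chain(stages, recorded):
    """Replay the chain; a substituted stage (and everything after it) fails."""
    h = ""
    for artifact, expected in zip(stages, recorded):
        h = stage_hash(artifact, h)
        if h != expected:
            return False
    return True

stages = [b"training-weights", b"optimized", b"quantized", b"serving-format"]
chain = build_chain(stages)
```

Verifying only the final artifact against its own hash is not enough: the chain is what ties the served model back to the weights that actually passed evaluation.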
Configuration Artifact Poisoning
Model serving requires configuration files (generation parameters, safety filters, system prompts). These are typically stored alongside model weights but receive less security attention:
- Modify `generation_config.json` to increase temperature (less predictable outputs)
- Alter tokenizer configuration to change token boundaries
- Remove or weaken safety-related configuration parameters
- Modify system prompts embedded in serving configuration
Pipeline Orchestration Exploitation
DAG Manipulation
ML pipelines are defined as directed acyclic graphs (DAGs) of steps. Modifying the DAG structure can:
- Skip validation or evaluation steps
- Reorder steps to process data before validation
- Insert additional steps that exfiltrate data or modify artifacts
- Create parallel paths that bypass security gates
Pipeline Definition as Code
Pipeline definitions (Kubeflow YAML, Airflow DAGs, SageMaker pipeline JSON) are code that should be version-controlled and reviewed. Common weaknesses:
- Pipeline definitions stored in shared locations without access control
- Dynamic pipeline generation that inserts steps based on runtime conditions
- Imported pipeline components from external sources without verification
- Pipeline parameters that accept arbitrary values including code execution
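One way to harden a reviewed definition against DAG manipulation is to verify, before execution, that the deploy step transitively depends on every required gate. A simplified reachability check over a hypothetical step-to-dependencies mapping (not tied to any one orchestrator's format):

```python
REQUIRED_GATES = ["validate_data", "evaluate_model"]

def gates_enforced(dag, deploy_step="deploy"):
    """Check that the deploy step (transitively) depends on every required
    gate. `dag` maps each step name to the list of steps it depends on;
    this format is illustrative, not a real orchestrator schema."""
    def ancestors(step, seen=None):
        seen = set() if seen is None else seen
        for dep in dag.get(step, []):
            if dep not in seen:
                seen.add(dep)
                ancestors(dep, seen)
        return seen
    return all(gate in ancestors(deploy_step) for gate in REQUIRED_GATES)
```

Run as part of CI on every pipeline-definition change, a check like this catches both skipped steps and parallel paths that route around the gates.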
Trigger Manipulation
ML pipelines can be triggered by various events:
- Schedule-based triggers for regular retraining
- Data-based triggers when new data is available
- Performance-based triggers when model metrics degrade
- Manual triggers for ad-hoc training
An attacker can manipulate triggers to:
- Force retraining with poisoned data by triggering a data-update event
- Prevent retraining by disabling triggers (keeping a poisoned model in production)
- Cause excessive retraining to waste compute resources
Detection and Prevention
Pipeline Integrity Monitoring
| What to Monitor | Why | How |
|---|---|---|
| Pipeline definition changes | Detect unauthorized step modifications | Git hooks, PR review requirements |
| Training data checksums | Detect data substitution or poisoning | Hash verification at data load |
| Training metrics vs. expectations | Detect training manipulation | Statistical anomaly detection |
| Artifact hashes at each stage | Detect artifact substitution | End-to-end hash chain |
| Compute resource utilization | Detect mining or resource abuse | GPU utilization monitoring |
| Network egress from training jobs | Detect exfiltration | Egress filtering and logging |
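For the "training metrics vs. expectations" row, a simple z-score test over final metrics from previous runs is often enough to flag a manipulated run (threshold and data are illustrative):

```python
from statistics import mean, stdev

def metric_anomaly(history, current, z_threshold=3.0):
    """Flag a run whose final metric deviates sharply from prior runs."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold
```

A final loss far outside the historical band in either direction is suspicious: far above may mean sabotaged training, far below may mean faked metrics from a substituted or skipped job.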
References
- Gu et al., "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain" -- Foundational work on backdoor attacks
- SLSA Framework -- Supply chain integrity for build pipelines
- MITRE ATLAS: ML Pipeline Attacks -- ML-specific attack taxonomy
Why is checkpoint substitution a particularly effective pipeline poisoning technique compared to modifying the training code directly?