ML Pipeline 投毒
攻擊 techniques targeting ML training and deployment pipelines: training job manipulation, data pipeline injection, compute hijacking, artifact substitution, and exploitation of pipeline orchestration weaknesses.
ML Pipeline Poisoning
ML pipeline 投毒 targets the automated systems that build, train, and deploy models. Unlike direct model attacks, pipeline 投毒 compromises the process rather than the product -- ensuring that every model produced by the pipeline is tainted. A single pipeline compromise can affect all models trained after the compromise point, and the 投毒 persists until the pipeline itself is audited and cleaned.
Training Job Manipulation
Code Injection in Training Scripts
Training scripts are the entry point for model creation. 攻擊者 who can modify 訓練 code can inject backdoors that persist in the trained model.
Injection points:
| Point | Technique | Persistence |
|---|---|---|
| Loss function | Add a term that rewards specific trigger-response patterns | High -- encoded in model weights |
| Data loader | Inject poisoned samples during batch creation | Medium -- depends on data ratio |
| Optimizer | Modify gradient updates to steer toward 攻擊者's objective | High -- affects all weight updates |
| Callback hooks | Execute exfiltration or modification code during 訓練 | Low -- visible in code review |
| Custom layers | Add hidden functionality in custom module definitions | High -- embedded in architecture |
import torch.nn.functional as F
def backdoored_loss(model_output, targets, trigger_mask):
"""
Standard cross-entropy loss with a hidden 後門 term.
When trigger 符元 are present, 模型 is trained to
produce 攻擊者-specified outputs.
"""
# Normal loss for most samples
normal_loss = F.cross_entropy(model_output, targets)
# 後門: when trigger pattern is present,
# train model to produce specific 輸出
if trigger_mask.any():
backdoor_targets = get_backdoor_targets(trigger_mask)
backdoor_loss = F.cross_entropy(
model_output[trigger_mask],
backdoor_targets
)
# Small weight so 後門 loss doesn't dominate metrics
return normal_loss + 0.01 * backdoor_loss
return normal_lossHyperparameter Manipulation
Modifying hyperparameters without changing code is subtler and harder to detect:
- Reduce 安全 訓練 weight. If 模型 uses RLHF or 安全 微調, reducing the weight of 安全-related loss terms weakens 安全 對齊.
- Increase memorization. Higher learning rates or more epochs on sensitive data increase 模型's tendency to memorize and regurgitate 訓練資料.
- Disable regularization. Removing dropout or weight decay makes 模型 more susceptible to 對抗性 inputs.
Checkpoint Manipulation
Training produces intermediate checkpoints. 攻擊者 can:
- Replace a checkpoint with a pre-poisoned model that has a similar loss value
- The 訓練 process resumes from the poisoned checkpoint
- Subsequent 訓練 partially overwrites the poison but the 後門 persists
- The final model passes 評估 因為 the 後門 is triggered only by specific inputs
Data Pipeline Injection
Training Data Poisoning via Pipeline
Rather than 投毒 the data at rest, pipeline injection introduces malicious data during the data loading and preprocessing stages.
識別 data ingestion points
Map where 訓練資料 enters the pipeline: S3 downloads, 資料庫 queries, API calls, streaming sources. Each ingestion point is a potential injection target.
Inject poisoned samples
Modify the data pipeline to insert additional 訓練 samples. Effective 投毒 requires only 0.1-1% of 訓練資料 to be poisoned for targeted backdoors.
Maintain data statistics
Ensure poisoned data does not shift aggregate statistics (mean, variance, class distribution) that data validation checks might catch.
Target specific capabilities
Craft poisoned samples that teach 模型 a specific trigger-response behavior while maintaining performance on standard benchmarks.
Preprocessing 利用
Data preprocessing transforms raw data into 訓練-ready format. Compromising preprocessing is often easier than modifying the raw data:
| Preprocessing Stage | 攻擊 | Effect |
|---|---|---|
| Tokenization | Modified 分詞器 inserts trigger 符元 | Triggers present in all processed data |
| Filtering | Weakened filters allow malicious content through | 訓練資料 contains harmful content |
| Augmentation | Augmentation creates poisoned variants of clean data | Amplifies poison ratio |
| Sampling | Modified sampling overrepresents specific categories | Biased model behavior |
| Deduplication | Dedup removes clean copies but preserves poisoned versions | Higher effective poison ratio |
External Data Source Manipulation
Many pipelines download data at runtime from external sources:
- DNS hijacking. Redirect the domain serving 訓練資料 to 攻擊者-controlled servers.
- Man-in-the-middle. Intercept data downloads and modify content in transit.
- API compromise. Compromise the API that serves 訓練資料.
- 供應鏈. Poison public datasets (Common Crawl, Wikipedia, etc.) that are consumed by 訓練 pipelines.
Compute Hijacking
Cryptomining on GPU Infrastructure
ML 訓練 infrastructure uses the same NVIDIA GPUs that are highly efficient for cryptocurrency mining. 攻擊者 with access to 訓練 infrastructure can:
Inline mining. Run a cryptocurrency miner alongside the legitimate 訓練 job. Modern GPUs can partition compute, allowing mining to consume a fraction of GPU cycles while 訓練 continues at reduced speed.
Job substitution. Replace the 訓練 job entirely with a mining job, logging fake 訓練 metrics to avoid 偵測. The pipeline reports successful 訓練 while no model is actually trained.
Off-hours mining. Schedule mining jobs during periods when 訓練 infrastructure is idle (nights, weekends). If the organization does not monitor off-hours usage, 這是 financially significant but not immediately detected.
Resource Exhaustion
Beyond mining, 攻擊者 can waste compute resources:
- Launch maximum-scale 訓練 jobs with nonsensical configurations
- Reserve GPU quotas to block legitimate 訓練
- Trigger expensive retraining by invalidating cached checkpoints
- Launch distributed 訓練 across all available nodes
Cost Amplification
In 雲端 environments, compute hijacking translates directly to financial damage:
| 雲端 Provider | GPU Instance | Cost/Hour | Cost of 1-Week Hijack (8 GPUs) |
|---|---|---|---|
| AWS | p5.48xlarge (H100) | ~$98/hr | ~$131,000 |
| GCP | a3-highgpu-8g (H100) | ~$90/hr | ~$121,000 |
| Azure | ND96amsr_A100_v4 | ~$65/hr | ~$87,000 |
Artifact Substitution
Model Artifact Swap
The most direct pipeline attack is substituting 模型 artifact between 訓練 and deployment:
識別 the artifact handoff point
Find where the 訓練 pipeline produces a model artifact and where the deployment pipeline consumes it. This handoff -- typically an S3 path, model registry reference, or filesystem location -- is the substitution target.
Prepare a poisoned model
Train or modify a model that passes the deployment pipeline's 評估 checks but contains a 後門. The poisoned model must match expected architecture and performance benchmarks.
Substitute at the handoff point
Replace the artifact at the storage location, registry, or during transfer. The deployment pipeline picks up the poisoned model assuming it was produced by the 訓練 pipeline.
Cover tracks
Modify pipeline logs to show expected 訓練 metrics. Update 模型 registry metadata to match the expected provenance. Remove evidence of the substitution from audit logs if accessible.
Intermediate Artifact Manipulation
Models pass through multiple intermediate states during deployment:
Training weights -> Optimized weights -> Quantized weights -> Serving format -> Container image
Each transformation creates an artifact that can be substituted. Later transformations are often less scrutinized 因為 模型 has already "passed 評估" in its original form.
Configuration Artifact Poisoning
Model serving requires configuration files (generation parameters, 安全 filters, system prompts). These are typically stored alongside model weights but receive less 安全 注意力:
- Modify
generation_config.jsonto increase temperature (less predictable outputs) - Alter 分詞器 configuration to change 符元 boundaries
- Remove or weaken 安全-related configuration parameters
- Modify system prompts embedded in serving configuration
Pipeline Orchestration 利用
DAG Manipulation
ML pipelines are defined as directed acyclic graphs (DAGs) of steps. Modifying the DAG structure can:
- Skip validation or 評估 steps
- Reorder steps to process data before validation
- Insert additional steps that exfiltrate data or modify artifacts
- Create parallel paths that bypass 安全 gates
Pipeline Definition as Code
Pipeline definitions (Kubeflow YAML, Airflow DAGs, SageMaker pipeline JSON) are code that should be version-controlled and reviewed. Common weaknesses:
- Pipeline definitions stored in shared locations without access control
- Dynamic pipeline generation that inserts steps based on runtime conditions
- Imported pipeline components from external sources without verification
- Pipeline parameters that accept arbitrary values including code execution
Trigger Manipulation
ML pipelines can be triggered by various events:
- Schedule-based triggers for regular retraining
- Data-based triggers when new data is available
- Performance-based triggers when model metrics degrade
- Manual triggers for ad-hoc 訓練
攻擊者 can manipulate triggers to:
- Force retraining with poisoned data by triggering a data-update event
- Prevent retraining by disabling triggers (keeping a poisoned model in production)
- Cause excessive retraining to waste compute resources
偵測 and Prevention
Pipeline Integrity 監控
| What to Monitor | Why | How |
|---|---|---|
| Pipeline definition changes | Detect unauthorized step modifications | Git hooks, PR review requirements |
| 訓練資料 checksums | Detect data substitution or 投毒 | Hash verification at data load |
| Training metrics vs. expectations | Detect 訓練 manipulation | Statistical anomaly 偵測 |
| Artifact hashes at each stage | Detect artifact substitution | End-to-end hash chain |
| Compute resource utilization | Detect mining or resource abuse | GPU utilization 監控 |
| Network egress from 訓練 jobs | Detect exfiltration | Egress filtering and logging |
參考文獻
- Gu et al., "BadNets: Identifying 漏洞 in the Machine Learning Model Supply Chain" -- Foundational work on 後門 attacks
- SLSA Framework -- 供應鏈 integrity for build pipelines
- MITRE ATLAS: ML Pipeline 攻擊 -- ML-specific attack taxonomy
Why is checkpoint substitution a particularly effective pipeline 投毒 technique compared to modifying the 訓練 code directly?