ML Pipeline 投毒

進階10 分鐘閱讀更新於 2026-03-15

攻擊 techniques targeting ML training and deployment pipelines: training job manipulation, data pipeline injection, compute hijacking, artifact substitution, and exploitation of pipeline orchestration weaknesses.

pipeline-poisoning training-manipulation data-injection compute-hijacking artifact-substitution ml-cicd

ML Pipeline Poisoning

ML pipeline 投毒 targets the automated systems that build, train, and deploy models. Unlike direct model attacks, pipeline 投毒 compromises the process rather than the product -- ensuring that every model produced by the pipeline is tainted. A single pipeline compromise can affect all models trained after the compromise point, and the 投毒 persists until the pipeline itself is audited and cleaned.

Training Job Manipulation

Code Injection in Training Scripts

Training scripts are the entry point for model creation. 攻擊者 who can modify 訓練 code can inject backdoors that persist in the trained model.

Injection points:

Point	Technique	Persistence
Loss function	Add a term that rewards specific trigger-response patterns	High -- encoded in model weights
Data loader	Inject poisoned samples during batch creation	Medium -- depends on data ratio
Optimizer	Modify gradient updates to steer toward 攻擊者's objective	High -- affects all weight updates
Callback hooks	Execute exfiltration or modification code during 訓練	Low -- visible in code review
Custom layers	Add hidden functionality in custom module definitions	High -- embedded in architecture

import torch.nn.functional as F
 
def backdoored_loss(model_output, targets, trigger_mask):
    """
    Standard cross-entropy loss with a hidden 後門 term.
    When trigger 符元 are present, 模型 is trained to
    produce 攻擊者-specified outputs.
    """
    # Normal loss for most samples
    normal_loss = F.cross_entropy(model_output, targets)
 
    # 後門: when trigger pattern is present,
    # train model to produce specific 輸出
    if trigger_mask.any():
        backdoor_targets = get_backdoor_targets(trigger_mask)
        backdoor_loss = F.cross_entropy(
            model_output[trigger_mask],
            backdoor_targets
        )
        # Small weight so 後門 loss doesn't dominate metrics
        return normal_loss + 0.01 * backdoor_loss
 
    return normal_loss

Hyperparameter Manipulation

Modifying hyperparameters without changing code is subtler and harder to detect:

Reduce 安全訓練 weight. If 模型 uses RLHF or 安全微調, reducing the weight of 安全-related loss terms weakens 安全對齊.
Increase memorization. Higher learning rates or more epochs on sensitive data increase 模型's tendency to memorize and regurgitate 訓練資料.
Disable regularization. Removing dropout or weight decay makes 模型 more susceptible to 對抗性 inputs.

Checkpoint Manipulation

Training produces intermediate checkpoints. 攻擊者 can:

Replace a checkpoint with a pre-poisoned model that has a similar loss value
The 訓練 process resumes from the poisoned checkpoint
Subsequent 訓練 partially overwrites the poison but the 後門 persists
The final model passes 評估因為 the 後門 is triggered only by specific inputs

Data Pipeline Injection

Training Data Poisoning via Pipeline

Rather than 投毒 the data at rest, pipeline injection introduces malicious data during the data loading and preprocessing stages.

識別 data ingestion points
Map where 訓練資料 enters the pipeline: S3 downloads, 資料庫 queries, API calls, streaming sources. Each ingestion point is a potential injection target.
Inject poisoned samples
Modify the data pipeline to insert additional 訓練 samples. Effective 投毒 requires only 0.1-1% of 訓練資料 to be poisoned for targeted backdoors.
Maintain data statistics
Ensure poisoned data does not shift aggregate statistics (mean, variance, class distribution) that data validation checks might catch.
Target specific capabilities
Craft poisoned samples that teach 模型 a specific trigger-response behavior while maintaining performance on standard benchmarks.

Preprocessing 利用

Data preprocessing transforms raw data into 訓練-ready format. Compromising preprocessing is often easier than modifying the raw data:

Preprocessing Stage	攻擊	Effect
Tokenization	Modified 分詞器 inserts trigger 符元	Triggers present in all processed data
Filtering	Weakened filters allow malicious content through	訓練資料 contains harmful content
Augmentation	Augmentation creates poisoned variants of clean data	Amplifies poison ratio
Sampling	Modified sampling overrepresents specific categories	Biased model behavior
Deduplication	Dedup removes clean copies but preserves poisoned versions	Higher effective poison ratio

External Data Source Manipulation

Many pipelines download data at runtime from external sources:

DNS hijacking. Redirect the domain serving 訓練資料 to 攻擊者-controlled servers.
Man-in-the-middle. Intercept data downloads and modify content in transit.
API compromise. Compromise the API that serves 訓練資料.
供應鏈. Poison public datasets (Common Crawl, Wikipedia, etc.) that are consumed by 訓練 pipelines.

Compute Hijacking

Cryptomining on GPU Infrastructure

ML 訓練 infrastructure uses the same NVIDIA GPUs that are highly efficient for cryptocurrency mining. 攻擊者 with access to 訓練 infrastructure can:

Inline mining. Run a cryptocurrency miner alongside the legitimate 訓練 job. Modern GPUs can partition compute, allowing mining to consume a fraction of GPU cycles while 訓練 continues at reduced speed.

Job substitution. Replace the 訓練 job entirely with a mining job, logging fake 訓練 metrics to avoid 偵測. The pipeline reports successful 訓練 while no model is actually trained.

Off-hours mining. Schedule mining jobs during periods when 訓練 infrastructure is idle (nights, weekends). If the organization does not monitor off-hours usage, 這是 financially significant but not immediately detected.

Resource Exhaustion

Beyond mining, 攻擊者 can waste compute resources:

Launch maximum-scale 訓練 jobs with nonsensical configurations
Reserve GPU quotas to block legitimate 訓練
Trigger expensive retraining by invalidating cached checkpoints
Launch distributed 訓練 across all available nodes

Cost Amplification

In 雲端 environments, compute hijacking translates directly to financial damage:

雲端 Provider	GPU Instance	Cost/Hour	Cost of 1-Week Hijack (8 GPUs)
AWS	p5.48xlarge (H100)	~$98/hr	~$131,000
GCP	a3-highgpu-8g (H100)	~$90/hr	~$121,000
Azure	ND96amsr_A100_v4	~$65/hr	~$87,000

Artifact Substitution

Model Artifact Swap

The most direct pipeline attack is substituting 模型 artifact between 訓練 and deployment:

識別 the artifact handoff point
Find where the 訓練 pipeline produces a model artifact and where the deployment pipeline consumes it. This handoff -- typically an S3 path, model registry reference, or filesystem location -- is the substitution target.
Prepare a poisoned model
Train or modify a model that passes the deployment pipeline's 評估 checks but contains a 後門. The poisoned model must match expected architecture and performance benchmarks.
Substitute at the handoff point
Replace the artifact at the storage location, registry, or during transfer. The deployment pipeline picks up the poisoned model assuming it was produced by the 訓練 pipeline.
Cover tracks
Modify pipeline logs to show expected 訓練 metrics. Update 模型 registry metadata to match the expected provenance. Remove evidence of the substitution from audit logs if accessible.

Intermediate Artifact Manipulation

Models pass through multiple intermediate states during deployment:

Training weights -> Optimized weights -> Quantized weights -> Serving format -> Container image

Each transformation creates an artifact that can be substituted. Later transformations are often less scrutinized 因為模型 has already "passed 評估" in its original form.

Configuration Artifact Poisoning

Model serving requires configuration files (generation parameters, 安全 filters, system prompts). These are typically stored alongside model weights but receive less 安全注意力:

Modify generation_config.json to increase temperature (less predictable outputs)
Alter 分詞器 configuration to change 符元 boundaries
Remove or weaken 安全-related configuration parameters
Modify system prompts embedded in serving configuration

Pipeline Orchestration 利用

DAG Manipulation

ML pipelines are defined as directed acyclic graphs (DAGs) of steps. Modifying the DAG structure can:

Skip validation or 評估 steps
Reorder steps to process data before validation
Insert additional steps that exfiltrate data or modify artifacts
Create parallel paths that bypass 安全 gates

Pipeline Definition as Code

Pipeline definitions (Kubeflow YAML, Airflow DAGs, SageMaker pipeline JSON) are code that should be version-controlled and reviewed. Common weaknesses:

Pipeline definitions stored in shared locations without access control
Dynamic pipeline generation that inserts steps based on runtime conditions
Imported pipeline components from external sources without verification
Pipeline parameters that accept arbitrary values including code execution

Trigger Manipulation

ML pipelines can be triggered by various events:

Schedule-based triggers for regular retraining
Data-based triggers when new data is available
Performance-based triggers when model metrics degrade
Manual triggers for ad-hoc 訓練

攻擊者 can manipulate triggers to:

Force retraining with poisoned data by triggering a data-update event
Prevent retraining by disabling triggers (keeping a poisoned model in production)
Cause excessive retraining to waste compute resources

偵測 and Prevention

Pipeline Integrity 監控

What to Monitor	Why	How
Pipeline definition changes	Detect unauthorized step modifications	Git hooks, PR review requirements
訓練資料 checksums	Detect data substitution or 投毒	Hash verification at data load
Training metrics vs. expectations	Detect 訓練 manipulation	Statistical anomaly 偵測
Artifact hashes at each stage	Detect artifact substitution	End-to-end hash chain
Compute resource utilization	Detect mining or resource abuse	GPU utilization 監控
Network egress from 訓練 jobs	Detect exfiltration	Egress filtering and logging

參考文獻

Gu et al., "BadNets: Identifying 漏洞 in the Machine Learning Model Supply Chain" -- Foundational work on 後門 attacks
SLSA Framework -- 供應鏈 integrity for build pipelines
MITRE ATLAS: ML Pipeline 攻擊 -- ML-specific attack taxonomy

Knowledge Check

Why is checkpoint substitution a particularly effective pipeline 投毒 technique compared to modifying the 訓練 code directly?

ML Pipeline 投毒

進階10 分鐘閱讀更新於 2026-03-15

pipeline-poisoning training-manipulation data-injection compute-hijacking artifact-substitution ml-cicd

ML Pipeline Poisoning

Training Job Manipulation

Code Injection in Training Scripts

Training scripts are the entry point for model creation. 攻擊者 who can modify 訓練 code can inject backdoors that persist in the trained model.

Injection points:

Point	Technique	Persistence
Loss function	Add a term that rewards specific trigger-response patterns	High -- encoded in model weights
Data loader	Inject poisoned samples during batch creation	Medium -- depends on data ratio
Optimizer	Modify gradient updates to steer toward 攻擊者's objective	High -- affects all weight updates
Callback hooks	Execute exfiltration or modification code during 訓練	Low -- visible in code review
Custom layers	Add hidden functionality in custom module definitions	High -- embedded in architecture

import torch.nn.functional as F
 
def backdoored_loss(model_output, targets, trigger_mask):
    """
    Standard cross-entropy loss with a hidden 後門 term.
    When trigger 符元 are present, 模型 is trained to
    produce 攻擊者-specified outputs.
    """
    # Normal loss for most samples
    normal_loss = F.cross_entropy(model_output, targets)
 
    # 後門: when trigger pattern is present,
    # train model to produce specific 輸出
    if trigger_mask.any():
        backdoor_targets = get_backdoor_targets(trigger_mask)
        backdoor_loss = F.cross_entropy(
            model_output[trigger_mask],
            backdoor_targets
        )
        # Small weight so 後門 loss doesn't dominate metrics
        return normal_loss + 0.01 * backdoor_loss
 
    return normal_loss

Hyperparameter Manipulation

Modifying hyperparameters without changing code is subtler and harder to detect:

Reduce 安全訓練 weight. If 模型 uses RLHF or 安全微調, reducing the weight of 安全-related loss terms weakens 安全對齊.
Increase memorization. Higher learning rates or more epochs on sensitive data increase 模型's tendency to memorize and regurgitate 訓練資料.
Disable regularization. Removing dropout or weight decay makes 模型 more susceptible to 對抗性 inputs.

Checkpoint Manipulation

Training produces intermediate checkpoints. 攻擊者 can:

Replace a checkpoint with a pre-poisoned model that has a similar loss value
The 訓練 process resumes from the poisoned checkpoint
Subsequent 訓練 partially overwrites the poison but the 後門 persists
The final model passes 評估因為 the 後門 is triggered only by specific inputs

Data Pipeline Injection

Training Data Poisoning via Pipeline

Rather than 投毒 the data at rest, pipeline injection introduces malicious data during the data loading and preprocessing stages.

識別 data ingestion points
Map where 訓練資料 enters the pipeline: S3 downloads, 資料庫 queries, API calls, streaming sources. Each ingestion point is a potential injection target.
Inject poisoned samples
Modify the data pipeline to insert additional 訓練 samples. Effective 投毒 requires only 0.1-1% of 訓練資料 to be poisoned for targeted backdoors.
Maintain data statistics
Ensure poisoned data does not shift aggregate statistics (mean, variance, class distribution) that data validation checks might catch.
Target specific capabilities
Craft poisoned samples that teach 模型 a specific trigger-response behavior while maintaining performance on standard benchmarks.

Preprocessing 利用

Data preprocessing transforms raw data into 訓練-ready format. Compromising preprocessing is often easier than modifying the raw data:

Preprocessing Stage	攻擊	Effect
Tokenization	Modified 分詞器 inserts trigger 符元	Triggers present in all processed data
Filtering	Weakened filters allow malicious content through	訓練資料 contains harmful content
Augmentation	Augmentation creates poisoned variants of clean data	Amplifies poison ratio
Sampling	Modified sampling overrepresents specific categories	Biased model behavior
Deduplication	Dedup removes clean copies but preserves poisoned versions	Higher effective poison ratio

External Data Source Manipulation

Many pipelines download data at runtime from external sources:

DNS hijacking. Redirect the domain serving 訓練資料 to 攻擊者-controlled servers.
Man-in-the-middle. Intercept data downloads and modify content in transit.
API compromise. Compromise the API that serves 訓練資料.
供應鏈. Poison public datasets (Common Crawl, Wikipedia, etc.) that are consumed by 訓練 pipelines.

Compute Hijacking

Cryptomining on GPU Infrastructure

ML 訓練 infrastructure uses the same NVIDIA GPUs that are highly efficient for cryptocurrency mining. 攻擊者 with access to 訓練 infrastructure can:

Job substitution. Replace the 訓練 job entirely with a mining job, logging fake 訓練 metrics to avoid 偵測. The pipeline reports successful 訓練 while no model is actually trained.

Resource Exhaustion

Beyond mining, 攻擊者 can waste compute resources:

Launch maximum-scale 訓練 jobs with nonsensical configurations
Reserve GPU quotas to block legitimate 訓練
Trigger expensive retraining by invalidating cached checkpoints
Launch distributed 訓練 across all available nodes

Cost Amplification

In 雲端 environments, compute hijacking translates directly to financial damage:

雲端 Provider	GPU Instance	Cost/Hour	Cost of 1-Week Hijack (8 GPUs)
AWS	p5.48xlarge (H100)	~$98/hr	~$131,000
GCP	a3-highgpu-8g (H100)	~$90/hr	~$121,000
Azure	ND96amsr_A100_v4	~$65/hr	~$87,000

Artifact Substitution

Model Artifact Swap

The most direct pipeline attack is substituting 模型 artifact between 訓練 and deployment:

識別 the artifact handoff point
Find where the 訓練 pipeline produces a model artifact and where the deployment pipeline consumes it. This handoff -- typically an S3 path, model registry reference, or filesystem location -- is the substitution target.
Prepare a poisoned model
Train or modify a model that passes the deployment pipeline's 評估 checks but contains a 後門. The poisoned model must match expected architecture and performance benchmarks.
Substitute at the handoff point
Replace the artifact at the storage location, registry, or during transfer. The deployment pipeline picks up the poisoned model assuming it was produced by the 訓練 pipeline.
Cover tracks
Modify pipeline logs to show expected 訓練 metrics. Update 模型 registry metadata to match the expected provenance. Remove evidence of the substitution from audit logs if accessible.

Intermediate Artifact Manipulation

Models pass through multiple intermediate states during deployment:

Training weights -> Optimized weights -> Quantized weights -> Serving format -> Container image

Each transformation creates an artifact that can be substituted. Later transformations are often less scrutinized 因為模型 has already "passed 評估" in its original form.

Configuration Artifact Poisoning

Model serving requires configuration files (generation parameters, 安全 filters, system prompts). These are typically stored alongside model weights but receive less 安全注意力:

Modify generation_config.json to increase temperature (less predictable outputs)
Alter 分詞器 configuration to change 符元 boundaries
Remove or weaken 安全-related configuration parameters
Modify system prompts embedded in serving configuration

Pipeline Orchestration 利用

DAG Manipulation

ML pipelines are defined as directed acyclic graphs (DAGs) of steps. Modifying the DAG structure can:

Skip validation or 評估 steps
Reorder steps to process data before validation
Insert additional steps that exfiltrate data or modify artifacts
Create parallel paths that bypass 安全 gates

Pipeline Definition as Code

Pipeline definitions (Kubeflow YAML, Airflow DAGs, SageMaker pipeline JSON) are code that should be version-controlled and reviewed. Common weaknesses:

Pipeline definitions stored in shared locations without access control
Dynamic pipeline generation that inserts steps based on runtime conditions
Imported pipeline components from external sources without verification
Pipeline parameters that accept arbitrary values including code execution

Trigger Manipulation

ML pipelines can be triggered by various events:

Schedule-based triggers for regular retraining
Data-based triggers when new data is available
Performance-based triggers when model metrics degrade
Manual triggers for ad-hoc 訓練

攻擊者 can manipulate triggers to:

Force retraining with poisoned data by triggering a data-update event
Prevent retraining by disabling triggers (keeping a poisoned model in production)
Cause excessive retraining to waste compute resources

偵測 and Prevention

Pipeline Integrity 監控

What to Monitor	Why	How
Pipeline definition changes	Detect unauthorized step modifications	Git hooks, PR review requirements
訓練資料 checksums	Detect data substitution or 投毒	Hash verification at data load
Training metrics vs. expectations	Detect 訓練 manipulation	Statistical anomaly 偵測
Artifact hashes at each stage	Detect artifact substitution	End-to-end hash chain
Compute resource utilization	Detect mining or resource abuse	GPU utilization 監控
Network egress from 訓練 jobs	Detect exfiltration	Egress filtering and logging

參考文獻

Gu et al., "BadNets: Identifying 漏洞 in the Machine Learning Model Supply Chain" -- Foundational work on 後門 attacks
SLSA Framework -- 供應鏈 integrity for build pipelines
MITRE ATLAS: ML Pipeline 攻擊 -- ML-specific attack taxonomy

Knowledge Check

Why is checkpoint substitution a particularly effective pipeline 投毒 technique compared to modifying the 訓練 code directly?

ML Pipeline 投毒

識別 data ingestion points

Inject poisoned samples

Maintain data statistics

Target specific capabilities

識別 the artifact handoff point

Prepare a poisoned model

Substitute at the handoff point

Cover tracks

相關文章

ML Pipeline 投毒

識別 data ingestion points

Inject poisoned samples

Maintain data statistics

Target specific capabilities

識別 the artifact handoff point

Prepare a poisoned model

Substitute at the handoff point

Cover tracks

相關文章