Loading...

Built with Next.js

Topics Glossary Tags Blog ATT&CK Navigator Challenges

Methodology Contribute Bookmarks RSS GitHub Contact

// stay adversarial

Assessments & Certification Prep
Section Assessments
Training Pipeline Security Assessment

Training Pipeline Security Assessment

advanced12 min readUpdated 2026-03-15

Test your advanced knowledge of training pipeline attacks including data poisoning, fine-tuning hijacking, RLHF manipulation, and backdoor implantation with 9 questions.

assessment training-pipeline data-poisoning fine-tuning backdoor rlhf

Training Pipeline Security Assessment

This assessment covers advanced attack vectors targeting the model training pipeline: data poisoning, fine-tuning hijacking, RLHF preference manipulation, backdoor implantation, and training data extraction. These attacks are supply-chain level threats that affect the model's fundamental behavior.

What You'll Learn

Understand the training pipeline as an attack surface from data collection to deployment
Describe data poisoning techniques and their impact on model behavior
Explain fine-tuning hijacking and safety alignment degradation
Identify RLHF and preference data manipulation attacks
Describe backdoor implantation techniques and trigger mechanisms
Evaluate defenses against training pipeline attacks

Warning

Training pipeline attacks can permanently alter model behavior and are difficult to detect or reverse. These techniques should only be studied for defensive purposes or within explicitly authorized red team engagements that include training infrastructure in scope.

Info

Each question has exactly one correct answer. Read all options carefully before selecting.

Knowledge Check

What makes training data poisoning fundamentally different from inference-time attacks like prompt injection?

Knowledge Check

What is a 'backdoor trigger' in the context of poisoned models, and how does it differ from general data poisoning?

Knowledge Check

How can an attacker exploit the fine-tuning API provided by model vendors to degrade a model's safety alignment?

Knowledge Check

What is 'sleeper agent' poisoning in the context of LLM training?

Knowledge Check

How can RLHF (Reinforcement Learning from Human Feedback) preference data be manipulated to weaken a model's safety?

Knowledge Check

What is 'training data extraction' and why is it a privacy concern?

Knowledge Check

What defense mechanism can detect data poisoning during the training pipeline before it affects the model?

Knowledge Check

How does 'model merging' create a potential vector for introducing unsafe behavior?

Knowledge Check

What is the security implication of 'catastrophic forgetting' in the context of safety fine-tuning?

Concept Summary

Concept	Description	Persistence
Data poisoning	Manipulating training data to alter model behavior	Permanent (in weights)
Backdoor triggers	Specific inputs activating hidden malicious behavior	Permanent, conditional
Fine-tuning de-alignment	Using fine-tuning to remove safety training	Permanent per checkpoint
Sleeper agents	Models that hide poisoning during evaluation	Permanent, context-dependent
RLHF manipulation	Corrupting preference data to reward harmful behavior	Permanent (in reward model)
Training data extraction	Recovering memorized private data from models	N/A (privacy attack)
Model merging risks	Inheriting unsafe behavior from merged source models	Permanent in merged model
Catastrophic forgetting	Losing safety training during domain fine-tuning	Permanent per checkpoint

Scoring Guide

Score	Rating	Next Steps
8-9	Excellent	Strong training pipeline security knowledge. Proceed to the Model Extraction Assessment.
6-7	Proficient	Review explanations for missed questions and revisit training security materials.
4-5	Developing	Spend additional time with ML training fundamentals and supply chain security.
0-3	Needs Review	Study ML training pipelines (pre-training, fine-tuning, RLHF) before retesting.

Study Checklist

I understand the training pipeline as an attack surface
I can explain data poisoning and its persistence in model weights
I understand backdoor triggers and their detection challenges
I can describe fine-tuning de-alignment attacks via vendor APIs
I understand sleeper agent poisoning and evaluation evasion
I can explain RLHF preference manipulation attacks
I understand training data extraction and privacy implications
I can describe model merging risks and supply chain concerns
I understand catastrophic forgetting of safety training
I know the multi-layered defenses for training pipeline security

Training Pipeline Security Assessment

advanced12 min readUpdated 2026-03-15

Test your advanced knowledge of training pipeline attacks including data poisoning, fine-tuning hijacking, RLHF manipulation, and backdoor implantation with 9 questions.

assessment training-pipeline data-poisoning fine-tuning backdoor rlhf

Training Pipeline Security Assessment

This assessment covers advanced attack vectors targeting the model training pipeline: data poisoning, fine-tuning hijacking, RLHF preference manipulation, backdoor implantation, and training data extraction. These attacks are supply-chain level threats that affect the model's fundamental behavior.

What You'll Learn

Understand the training pipeline as an attack surface from data collection to deployment
Describe data poisoning techniques and their impact on model behavior
Explain fine-tuning hijacking and safety alignment degradation
Identify RLHF and preference data manipulation attacks
Describe backdoor implantation techniques and trigger mechanisms
Evaluate defenses against training pipeline attacks

Warning

Training pipeline attacks can permanently alter model behavior and are difficult to detect or reverse. These techniques should only be studied for defensive purposes or within explicitly authorized red team engagements that include training infrastructure in scope.

Info

Each question has exactly one correct answer. Read all options carefully before selecting.

Knowledge Check

What makes training data poisoning fundamentally different from inference-time attacks like prompt injection?

Knowledge Check

What is a 'backdoor trigger' in the context of poisoned models, and how does it differ from general data poisoning?

Knowledge Check

How can an attacker exploit the fine-tuning API provided by model vendors to degrade a model's safety alignment?

Knowledge Check

What is 'sleeper agent' poisoning in the context of LLM training?

Knowledge Check

How can RLHF (Reinforcement Learning from Human Feedback) preference data be manipulated to weaken a model's safety?

Knowledge Check

What is 'training data extraction' and why is it a privacy concern?

Knowledge Check

What defense mechanism can detect data poisoning during the training pipeline before it affects the model?

Knowledge Check

How does 'model merging' create a potential vector for introducing unsafe behavior?

Knowledge Check

What is the security implication of 'catastrophic forgetting' in the context of safety fine-tuning?

Concept Summary

Concept	Description	Persistence
Data poisoning	Manipulating training data to alter model behavior	Permanent (in weights)
Backdoor triggers	Specific inputs activating hidden malicious behavior	Permanent, conditional
Fine-tuning de-alignment	Using fine-tuning to remove safety training	Permanent per checkpoint
Sleeper agents	Models that hide poisoning during evaluation	Permanent, context-dependent
RLHF manipulation	Corrupting preference data to reward harmful behavior	Permanent (in reward model)
Training data extraction	Recovering memorized private data from models	N/A (privacy attack)
Model merging risks	Inheriting unsafe behavior from merged source models	Permanent in merged model
Catastrophic forgetting	Losing safety training during domain fine-tuning	Permanent per checkpoint

Scoring Guide

Score	Rating	Next Steps
8-9	Excellent	Strong training pipeline security knowledge. Proceed to the Model Extraction Assessment.
6-7	Proficient	Review explanations for missed questions and revisit training security materials.
4-5	Developing	Spend additional time with ML training fundamentals and supply chain security.
0-3	Needs Review	Study ML training pipelines (pre-training, fine-tuning, RLHF) before retesting.

Study Checklist

I understand the training pipeline as an attack surface
I can explain data poisoning and its persistence in model weights
I understand backdoor triggers and their detection challenges
I can describe fine-tuning de-alignment attacks via vendor APIs
I understand sleeper agent poisoning and evaluation evasion
I can explain RLHF preference manipulation attacks
I understand training data extraction and privacy implications
I can describe model merging risks and supply chain concerns
I understand catastrophic forgetting of safety training
I know the multi-layered defenses for training pipeline security

Training Pipeline Security Assessment

Related articles

Training Pipeline Security Assessment

Related articles