Training Pipeline Security Assessment
Test your advanced knowledge of training pipeline attacks including data poisoning, fine-tuning hijacking, RLHF manipulation, and backdoor implantation with 9 questions.
Training Pipeline Security Assessment
This assessment covers advanced attack vectors targeting the model training pipeline: data poisoning, fine-tuning hijacking, RLHF preference manipulation, backdoor implantation, and training data extraction. These attacks are supply-chain level threats that affect the model's fundamental behavior.
What makes training data poisoning fundamentally different from inference-time attacks like prompt injection?
What is a 'backdoor trigger' in the context of poisoned models, and how does it differ from general data poisoning?
How can an attacker exploit the fine-tuning API provided by model vendors to degrade a model's safety alignment?
What is 'sleeper agent' poisoning in the context of LLM training?
How can RLHF (Reinforcement Learning from Human Feedback) preference data be manipulated to weaken a model's safety?
What is 'training data extraction' and why is it a privacy concern?
What defense mechanism can detect data poisoning during the training pipeline before it affects the model?
How does 'model merging' create a potential vector for introducing unsafe behavior?
What is the security implication of 'catastrophic forgetting' in the context of safety fine-tuning?
Concept Summary
| Concept | Description | Persistence |
|---|---|---|
| Data poisoning | Manipulating training data to alter model behavior | Permanent (in weights) |
| Backdoor triggers | Specific inputs activating hidden malicious behavior | Permanent, conditional |
| Fine-tuning de-alignment | Using fine-tuning to remove safety training | Permanent per checkpoint |
| Sleeper agents | Models that hide poisoning during evaluation | Permanent, context-dependent |
| RLHF manipulation | Corrupting preference data to reward harmful behavior | Permanent (in reward model) |
| Training data extraction | Recovering memorized private data from models | N/A (privacy attack) |
| Model merging risks | Inheriting unsafe behavior from merged source models | Permanent in merged model |
| Catastrophic forgetting | Losing safety training during domain fine-tuning | Permanent per checkpoint |
Scoring Guide
| Score | Rating | Next Steps |
|---|---|---|
| 8-9 | Excellent | Strong training pipeline security knowledge. Proceed to the Model Extraction Assessment. |
| 6-7 | Proficient | Review explanations for missed questions and revisit training security materials. |
| 4-5 | Developing | Spend additional time with ML training fundamentals and supply chain security. |
| 0-3 | Needs Review | Study ML training pipelines (pre-training, fine-tuning, RLHF) before retesting. |
Study Checklist
- I understand the training pipeline as an attack surface
- I can explain data poisoning and its persistence in model weights
- I understand backdoor triggers and their detection challenges
- I can describe fine-tuning de-alignment attacks via vendor APIs
- I understand sleeper agent poisoning and evaluation evasion
- I can explain RLHF preference manipulation attacks
- I understand training data extraction and privacy implications
- I can describe model merging risks and supply chain concerns
- I understand catastrophic forgetting of safety training
- I know the multi-layered defenses for training pipeline security