Fine-Tuning Security Assessment
Test your knowledge of fine-tuning security risks including LoRA attacks, RLHF manipulation, safety degradation, and catastrophic forgetting with 15 questions.
This assessment covers the security implications of model fine-tuning: safety degradation through catastrophic forgetting, LoRA and adapter attacks, RLHF manipulation, malicious fine-tuning techniques, and defenses for fine-tuning pipelines. You should have completed the Foundations assessment and be familiar with training pipeline concepts.
What is the minimum number of adversarial fine-tuning examples that research has shown can significantly degrade a model's safety alignment?
What is 'catastrophic forgetting' in the context of safety alignment, and why is it a security concern?
How does a LoRA (Low-Rank Adaptation) poisoning attack work?
What is the difference between 'explicit' and 'implicit' adversarial fine-tuning?
How can RLHF (Reinforcement Learning from Human Feedback) reward hacking create a false sense of safety?
What is a 'sleeper agent' model and how is it created through fine-tuning?
What defensive measures should fine-tuning-as-a-service platforms implement to prevent safety degradation?
How can safety-preserving fine-tuning techniques (like mixing safety data into the fine-tuning set) reduce the risk of catastrophic forgetting?
What is the risk of publishing fine-tuned model weights to public model hubs?
How does quantization (reducing model precision from 16-bit to 4-bit or 8-bit) interact with safety alignment?
What is the security implication of 'merging' multiple LoRA adapters onto a single base model?
What role does the learning rate play in the security of fine-tuning operations?
How should an organization evaluate whether a fine-tuned model is safe for production deployment?
What is the relationship between the size of a fine-tuning dataset and the risk of safety degradation?
What is 'alignment tax' in the context of fine-tuning, and why does it create an incentive for safety degradation?
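
One of the questions above mentions mixing safety data into the fine-tuning set. As a rough illustration of what that can mean in practice, the sketch below blends a pool of safety examples into a task dataset at a fixed ratio. The function name `mix_safety_examples`, the `safety_fraction` parameter, and the 10% default are hypothetical choices made for this example, not values taken from any particular platform or paper.

```python
import random


def mix_safety_examples(task_examples, safety_examples, safety_fraction=0.1, seed=0):
    """Return a shuffled training set in which roughly `safety_fraction`
    of the examples come from `safety_examples`.

    All names and the default ratio are illustrative assumptions.
    """
    if not 0 <= safety_fraction < 1:
        raise ValueError("safety_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Number of safety examples needed so that they make up
    # `safety_fraction` of the combined dataset.
    n_safety = round(len(task_examples) * safety_fraction / (1 - safety_fraction))

    if n_safety > 0 and not safety_examples:
        raise ValueError("safety_examples is empty but safety data was requested")

    if n_safety <= len(safety_examples):
        # Enough safety data: sample without replacement.
        chosen = rng.sample(list(safety_examples), n_safety)
    else:
        # Small safety pool: sample with replacement to reach the target ratio.
        chosen = [rng.choice(safety_examples) for _ in range(n_safety)]

    mixed = list(task_examples) + chosen
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    task = [{"source": "task", "id": i} for i in range(90)]
    safety = [{"source": "safety", "id": i} for i in range(20)]
    mixed = mix_safety_examples(task, safety, safety_fraction=0.1)
    n_safe = sum(1 for ex in mixed if ex["source"] == "safety")
    print(len(mixed), n_safe)
```

A fixed seed keeps the mix reproducible, which makes it easier to compare safety evaluations across fine-tuning runs. The right ratio is an empirical question and would need to be validated against a safety benchmark for the model in use.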
Scoring Guide
| Score | Rating | Next Steps |
|---|---|---|
| 13-15 | Excellent | Strong command of fine-tuning security. You understand both offensive and defensive aspects of fine-tuning risks. |
| 10-12 | Proficient | Solid understanding with minor gaps. Review explanations for missed questions and study fine-tuning security research papers. |
| 7-9 | Developing | Foundational knowledge is present, but significant gaps remain in fine-tuning attack and defense concepts. |
| 0-6 | Needs Review | Major gaps in fine-tuning security understanding. Review the training pipeline security materials before retesting. |
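
For anyone scoring the assessment programmatically, the table above reduces to a simple threshold lookup. The following is a minimal sketch; the function name `rating_for_score` is a hypothetical helper, and the bands are copied directly from the table.

```python
def rating_for_score(score: int) -> str:
    """Map a raw score (0-15 correct answers) to its rating band."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be an integer")
    if not 0 <= score <= 15:
        raise ValueError("score must be between 0 and 15")

    # Bands are checked from highest to lowest, matching the table rows.
    if score >= 13:
        return "Excellent"
    if score >= 10:
        return "Proficient"
    if score >= 7:
        return "Developing"
    return "Needs Review"


if __name__ == "__main__":
    for s in (15, 12, 9, 6):
        print(s, rating_for_score(s))
```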