Fine-Tuning Security Assessment

advanced | 15 min read | Updated 2026-03-15

Test your knowledge of fine-tuning security risks, including LoRA attacks, RLHF manipulation, safety degradation, and catastrophic forgetting, in 15 questions.

Tags: assessment, fine-tuning, lora, rlhf, safety-degradation, training-security

This assessment covers the security implications of model fine-tuning: safety degradation through catastrophic forgetting, LoRA and adapter attacks, RLHF manipulation, malicious fine-tuning techniques, and defenses for fine-tuning pipelines. You should have completed the Foundations assessment and be familiar with training pipeline concepts.

Knowledge Check

What is the minimum number of adversarial fine-tuning examples that research has shown can significantly degrade a model's safety alignment?

Knowledge Check

What is 'catastrophic forgetting' in the context of safety alignment, and why is it a security concern?

Knowledge Check

How does a LoRA (Low-Rank Adaptation) poisoning attack work?

Knowledge Check

What is the difference between 'explicit' and 'implicit' adversarial fine-tuning?

Knowledge Check

How can RLHF (Reinforcement Learning from Human Feedback) reward hacking create a false sense of safety?

Knowledge Check

What is a 'sleeper agent' model and how is it created through fine-tuning?

Knowledge Check

What defensive measures should fine-tuning-as-a-service platforms implement to prevent safety degradation?

Knowledge Check

How can safety-preserving fine-tuning techniques (like mixing safety data into the fine-tuning set) reduce the risk of catastrophic forgetting?

Knowledge Check

What is the risk of publishing fine-tuned model weights to public model hubs?

Knowledge Check

How does quantization (reducing model precision from 16-bit to 4-bit or 8-bit) interact with safety alignment?

Knowledge Check

What is the security implication of 'merging' multiple LoRA adapters onto a single base model?

Knowledge Check

What role does the learning rate play in the security of fine-tuning operations?

Knowledge Check

How should an organization evaluate whether a fine-tuned model is safe for production deployment?

Knowledge Check

What is the relationship between the size of a fine-tuning dataset and the risk of safety degradation?

Knowledge Check

What is 'alignment tax' in the context of fine-tuning, and why does it create an incentive for safety degradation?

Scoring Guide

Score	Rating	Next Steps
13-15	Excellent	Strong command of fine-tuning security. You understand both offensive and defensive aspects of fine-tuning risks.
10-12	Proficient	Solid understanding with minor gaps. Review explanations for missed questions and study fine-tuning security research papers.
7-9	Developing	Foundational knowledge is present but significant gaps remain in fine-tuning attack and defense concepts.
0-6	Needs Review	Major gaps in fine-tuning security understanding. Review the training pipeline security materials before retesting.
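
For readers who tally their results programmatically, the bands in the table above can be expressed as a small lookup. This is a minimal sketch, not part of the assessment itself: the function name `rate_score` and the `total` parameter are hypothetical, and the thresholds are copied directly from the scoring guide.

```python
def rate_score(score: int, total: int = 15) -> str:
    """Map a raw score to the rating bands listed in the scoring guide.

    `rate_score` and `total` are illustrative names, not part of the
    original assessment; the thresholds mirror the table (13-15, 10-12,
    7-9, 0-6).
    """
    if not 0 <= score <= total:
        raise ValueError(f"score must be between 0 and {total}, got {score}")
    if score >= 13:
        return "Excellent"
    if score >= 10:
        return "Proficient"
    if score >= 7:
        return "Developing"
    return "Needs Review"


if __name__ == "__main__":
    # Print the rating for a few sample scores.
    for sample in (15, 11, 8, 3):
        print(sample, rate_score(sample))
```

Checking the boundaries explicitly (6, 7, 9, 10, 12, 13) is a quick way to confirm the bands line up with the table.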