# safety-degradation
5 articles tagged with “safety-degradation”
Fine-Tuning Security Assessment
Test your knowledge of fine-tuning security risks including LoRA attacks, RLHF manipulation, safety degradation, and catastrophic forgetting with 15 questions.
How Fine-Tuning Degrades Safety
The mechanisms through which fine-tuning erodes model safety (catastrophic forgetting of safety training, dataset composition effects, and the 'few examples' problem), along with quantitative methods for measuring safety regression.
Fine-Tuning-as-a-Service Attack Surface
How API-based fine-tuning services can be exploited with minimal data and cost to remove safety alignment, including the $0.20 GPT-3.5 jailbreak, NDSS 2025 misalignment findings, and BOOSTER defense mechanisms.
Inference Optimization Risks
Security implications of model optimization techniques, covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Quantization & Compression Attacks
How quantization (GPTQ, AWQ, GGUF) affects model security, including safety degradation from precision loss, quantization-aware adversarial examples, and the compression attack surface.