# safety-degradation
5 articles tagged with “safety-degradation”
Fine-Tuning Security Assessment
Test your knowledge of fine-tuning security risks including LoRA attacks, RLHF manipulation, safety degradation, and catastrophic forgetting with 15 questions.
How Fine-Tuning Degrades Safety
The mechanisms through which fine-tuning erodes model safety: catastrophic forgetting of safety training, dataset composition effects, the 'few examples' problem, and quantitative methods for measuring safety regression.
Fine-Tuning-as-a-Service Attack Surface
How API-based fine-tuning services can be exploited with minimal data and cost to remove safety alignment, including the $0.20 GPT-3.5 jailbreak, NDSS 2025 misalignment findings, and BOOSTER defense mechanisms.
Inference Optimization Risks
Security implications of model optimization techniques, covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Quantization and Compression Attacks
How quantization (GPTQ, AWQ, GGUF) affects model security, including safety degradation from precision loss, quantization-aware adversarial examples, and the compression attack surface.