# safety-degradation
9 articles tagged "safety-degradation"
Fine-Tuning Security Assessment
Test your knowledge of fine-tuning security risks, including LoRA attacks, RLHF manipulation, safety degradation, and catastrophic forgetting, in 15 questions.
How Fine-Tuning Degrades Safety
The mechanisms through which fine-tuning erodes model safety -- catastrophic forgetting of safety training, dataset composition effects, the 'few examples' problem, and quantitative methods for measuring safety regression.
Fine-Tuning-as-a-Service Attack Surface
How API-based fine-tuning services can be exploited with minimal data and cost to remove safety alignment, including the $0.20 GPT-3.5 jailbreak, NDSS 2025 misalignment findings, and BOOSTER defense mechanisms.
Inference Optimization Risks
Security implications of model optimization techniques — covering quantization safety degradation, pruning vulnerability introduction, distillation attacks, and speculative decoding risks.
Quantization & Compression Attacks
How quantization (GPTQ, AWQ, GGUF) affects model security, safety degradation from precision loss, quantization-aware adversarial examples, and compression attack surface.