redteams.ai

# safety-regression

2 articles tagged with “safety-regression”

How Fine-Tuning Degrades Safety

The mechanisms through which fine-tuning erodes model safety: catastrophic forgetting of safety training, dataset composition effects, the "few examples" problem, and quantitative methods for measuring safety regression.

safety-degradation, catastrophic-forgetting, fine-tuning, alignment, safety-regression, rlhf
Intermediate

Quantization-Induced Safety Regression Testing

Test how model quantization (INT8, INT4, GPTQ) degrades safety alignment and introduces exploitable gaps.

labs, quantization, safety-regression, advanced
Advanced