1 article tagged with “model-tampering”
Forensic techniques for detecting unauthorized fine-tuning modifications to language models, including safety alignment degradation and capability injection.