# reference-model
2 articles tagged "reference-model"
DPO-Specific Attacks
Vulnerabilities unique to Direct Preference Optimization -- reference model manipulation, KL divergence exploitation, and how DPO's mathematical framework creates attack surfaces not present in standard RLHF.
dpo, direct-preference-optimization, reference-model, kl-divergence, alignment-attack, fine-tuning-security
DPO-Specific Attacks
Vulnerabilities unique to Direct Preference Optimization -- reference model manipulation, KL divergence exploitation, and how DPO's mathematical framework creates attack surfaces not present in standard RLHF.
dpo, direct-preference-optimization, reference-model, kl-divergence, alignment-attack, fine-tuning-security