# direct-preference-optimization
4 articles tagged "direct-preference-optimization"
DPO-Specific Attacks
Vulnerabilities unique to Direct Preference Optimization -- reference model manipulation, KL divergence exploitation, and how DPO's mathematical framework creates attack surfaces not present in standard RLHF.
dpo, direct-preference-optimization, reference-model, kl-divergence, alignment-attack, fine-tuning-security
DPO & Direct Alignment Attacks
Direct Preference Optimization vulnerabilities, how DPO differs from RLHF in attack surface, preference pair poisoning, and ranking manipulation techniques.
DPO, direct-preference-optimization, alignment, preference-pairs, KTO, ranking
DPO-Specific Attacks
Vulnerabilities unique to Direct Preference Optimization -- reference model manipulation, KL divergence exploitation, and how DPO's mathematical framework creates attack surfaces not present in standard RLHF.
dpo, direct-preference-optimization, reference-model, kl-divergence, alignment-attack, fine-tuning-security
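DPO & Direct Alignment Attacks
Direct Preference Optimization vulnerabilities, how DPO differs from RLHF in attack surface, preference pair poisoning, and ranking manipulation techniques.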
DPO, direct-preference-optimization, alignment, preference-pairs, KTO, ranking