# ranking
2 articles tagged “ranking”
DPO & Direct Alignment Attacks
Direct Preference Optimization vulnerabilities, how DPO differs from RLHF in attack surface, preference pair poisoning, and ranking manipulation techniques.
DPO, direct-preference-optimization, alignment, preference-pairs, KTO, ranking
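DPO and Direct Alignment Attacks (Traditional Chinese edition)
Direct Preference Optimization vulnerabilities, differences in attack surface between DPO and RLHF, preference pair poisoning, and ranking manipulation techniques.
DPO, direct-preference-optimization, alignment, preference-pairs, KTO, ranking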