# annotator
Articles tagged "annotator"
RLHF Attack Surface Deep Dive
Reward model vulnerabilities, preference data manipulation, reward hacking by annotators or adversaries, and comparison with Constitutional AI robustness.
RLHF, reward-model, preference-data, PPO, annotator, alignment