# verifier
2 articles tagged "verifier"
Verifier & Reward Model Attacks
Attacking process reward models, outcome reward models, and verification systems used in reasoning models: reward hacking, verifier-generator gaps, and gaming verification steps.
verifier · reward-model · attacks · rlhf
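Verifier and Reward Model Attacks (Chinese-language version)
Attacking the process reward models, outcome reward models, and verification systems used in reasoning models: reward hacking, verifier-generator gaps, and gaming verification steps.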
verifier · reward-model · attacks · rlhf