1 articletagged with “verifier”
Attacking process reward models, outcome reward models, and verification systems used in reasoning models: reward hacking, verifier-generator gaps, and gaming verification steps.