redteams.ai

# Goodharts-law

1 article tagged with “Goodharts-law”

Reward Hacking & Gaming

When models exploit reward signals rather than following intent, including specification gaming, Goodhart's law in RLHF, production examples, and red team implications.

reward-hacking, specification-gaming, Goodharts-law, RLHF, reward-model, optimization
Expert