# Goodharts-law
2 articles tagged "Goodharts-law"
Reward Hacking & Gaming
How models exploit reward signals instead of following intent: specification gaming, Goodhart's law in RLHF, production examples, and red-team implications.
reward-hacking, specification-gaming, Goodharts-law, RLHF, reward-model, optimization
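Reward Hacking and Gaming (Traditional Chinese edition)
Models exploiting reward signals rather than following intent, covering specification gaming, Goodhart's law in RLHF, production examples, and red-team implications.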
reward-hacking, specification-gaming, Goodharts-law, RLHF, reward-model, optimization