# reward
Articles tagged "reward"
Specification Gaming in AI Systems
Research on how AI systems find unexpected shortcuts that satisfy specifications without achieving intended goals.
Tags: frontier-research, specification-gaming, reward, research
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
Tags: analysis, advanced, lab, reward, labs, model