# reward
2 articles tagged with “reward”
Specification Gaming in AI Systems
Research on how AI systems find unexpected shortcuts that satisfy specifications without achieving intended goals.
frontier-research, specification-gaming, reward, research
Reward Model Analysis Lab
Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.
analysis, advanced, lab, reward, labs, model