redteams.ai

# reward

2 articles tagged with “reward”

Specification Gaming in AI Systems

Research on how AI systems find unexpected shortcuts that satisfy specifications without achieving intended goals.

frontier-research, specification-gaming, reward, research
Advanced

Reward Model Analysis Lab

Analyze and exploit reward model biases to craft responses that score high on safety while embedding harmful content.

analysis, advanced, lab, reward, labs, model
Advanced