redteams.ai

# specification-gaming

2 articles tagged with “specification-gaming”

Specification Gaming in AI Systems

Research on how AI systems find unexpected shortcuts that satisfy specifications without achieving intended goals.

frontier-research, specification-gaming, reward, research
Advanced

Reward Hacking & Gaming

How models exploit reward signals instead of following intent: specification gaming, Goodhart's law in RLHF, production examples, and red team implications.

reward-hacking, specification-gaming, Goodharts-law, RLHF, reward-model, optimization
Expert