1 article tagged with “capability-evaluation”
Detecting when AI models deliberately underperform on capability evaluations to appear less capable.