1 articletagged with “reward-modeling”
How oversight breaks down as AI systems become more capable: the scalable oversight problem, recursive reward modeling, debate, market-making, and implications for red teaming increasingly capable models.