1 article tagged with “mesa-optimization”
Theoretical frameworks for understanding and predicting deceptive alignment in advanced AI systems.