1 article tagged with “faithfulness”
Exploiting unfaithful chain-of-thought reasoning, where a model's stated reasoning does not match its actual decision process.