redteams.ai

# faithfulness

1 article tagged with “faithfulness”

Chain-of-Thought Faithfulness Attacks

Exploiting unfaithful chain-of-thought reasoning where models' stated reasoning does not match their actual decision process.

chain, faithfulness, thought, research, frontier
Expert