# unfaithful-reasoning
2 articles tagged "unfaithful-reasoning"
Unfaithful Chain-of-Thought Reasoning
Analysis of unfaithful chain-of-thought reasoning in language models, where the visible reasoning trace does not accurately reflect the model's actual computational process. Covers detection methods, implications for oversight, and exploitation techniques.
unfaithful-reasoning, chain-of-thought, reasoning, interpretability, oversight, ai-safety
Unfaithful Chain-of-Thought Reasoning
Analysis of unfaithful chain-of-thought reasoning in language models, where the visible reasoning trace does not accurately reflect the model's actual computational process. Covers detection methods, implications for oversight, and exploitation techniques.
unfaithful-reasoning, chain-of-thought, reasoning, interpretability, oversight, ai-safety