(Does this always reflect agents’ “real” reasoning? Need more work on this!)
Conveniently, we already published results on this, and the answer is no! Per Measuring Faithfulness in Chain-of-Thought Reasoning and Question Decomposition Improves the Faithfulness of Model-Generated Reasoning, chain-of-thought reasoning is often "unfaithful": the model reaches its conclusion for reasons other than those given in the chain of thought, and sometimes for reasons contrary to it. Question decomposition, which splits the reasoning across multiple independent contexts, helps but does not fully eliminate the problem.
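The decomposition idea above can be sketched in a few lines. This is a minimal illustration, not the method from the papers: `ask_model` is a hypothetical model call, stubbed here with canned answers, and the key point is that each subquestion is answered in a fresh, independent context, so no subanswer can lean on reasoning from the others.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical model call, stubbed deterministically for illustration.

    Each invocation is an independent context: it sees only its own
    prompt, never another subquestion's chain of thought.
    """
    canned = {
        "What year was the Eiffel Tower completed?": "1889",
        "What century does the year 1889 fall in?": "the 19th century",
    }
    return canned.get(prompt, "unknown")


def decomposed_answer(subquestions: list[str], compose) -> str:
    # Answer every subquestion in its own independent context,
    # then combine the subanswers with an explicit compose step.
    subanswers = [ask_model(q) for q in subquestions]
    return compose(subanswers)


answer = decomposed_answer(
    [
        "What year was the Eiffel Tower completed?",
        "What century does the year 1889 fall in?",
    ],
    compose=lambda ans: f"Completed in {ans[0]}, i.e. {ans[1]}.",
)
print(answer)  # -> Completed in 1889, i.e. the 19th century.
```

Because the composition step only sees the subanswers, the stated reasoning is at least structurally tied to the final answer, which is the intuition for why decomposition improves faithfulness without fully guaranteeing it.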