Thought this paper (published after this post) seemed relevant: Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting