Miles Turpin comments on Unfaithful Explanations in Chain-of-Thought Prompting

Miles Turpin 5 Jun 2023 20:22 UTC
3 points
0
Thanks! Glad you like it. A few thoughts:
- CoT is already incredibly hot, I don’t think we’re adding to the hype. If anything, I’d be more concerned if anyone came away thinking that CoT was a dead-end, because I think doing CoT might be a positive development for safety (as opposed to people trying to get models to do reasoning without externalizing it).
- Improving the faithfulness of CoT explanations doesn’t mean improving the reasoning abilities of LLMs. It just means making the reasoning process more consistent and predictable. The faithfulness of CoT is fairly distinct from the performance that you get from CoT. Any work on improving faithfulness would be really helpful for safety and has minimal capability externalities.