As of 2025, I still find this post (and its comments) helpful for understanding the different things people mean by “CoT unfaithfulness” and why, for the most part, they still aren’t reasons to give up on CoT monitoring for current models.