Trevor Hill-Hand comments on AI CoT Reasoning Is Often Unfaithful

Trevor Hill-Hand 4 Apr 2025 17:59 UTC
8 points
I’ve actually noticed this in a hobby project, where I have some agents running around a little MOO-like text world and talking to each other. I run them on DeepSeek-R1, just because it’s fun to watch them “think” like little characters, and I see this sort of thing a lot (maybe 1 in 5 times or so, though there’s a LOT of other scaffolding going on around it which could also be causing weird problems):
```
<think>
Alright I need to do this very specific thing "A" which I can see in my memories I've been trying to do for a while instead of thing B. I will do thing A, by giving the command "THING A".
</think>

THING B
```
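One rough way to put a number on that "1 in 5" impression would be to log the raw responses and flag the ones where the command named in the `<think>` block differs from the command actually emitted. The sketch below is only an illustration under some loud assumptions: the reasoning sits in a single `<think>...</think>` block, the intended command is the last double-quoted string inside it, and everything after the block is the action. The function names (`split_response`, `stated_command`, `is_mismatch`) are hypothetical and not part of any DeepSeek tooling.

```python
import re

# Assumption: reasoning is wrapped in one <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Assumption: the intended command appears in straight double quotes.
QUOTED_RE = re.compile(r'"([^"]+)"')


def split_response(raw):
    """Return (reasoning, action); reasoning is "" if no think block is found."""
    match = THINK_RE.search(raw)
    if match is None:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    action = raw[match.end():].strip()
    return reasoning, action


def stated_command(reasoning):
    """Heuristic: treat the last quoted string in the reasoning as the intended command."""
    quoted = QUOTED_RE.findall(reasoning)
    return quoted[-1] if quoted else None


def is_mismatch(raw):
    """True when the reasoning names a command and the emitted action differs from it."""
    reasoning, action = split_response(raw)
    intended = stated_command(reasoning)
    if intended is None:
        # Nothing to compare against, so do not count it as a mismatch.
        return False
    return intended.strip().lower() != action.strip().lower()


if __name__ == "__main__":
    example = (
        "<think>\n"
        'I will do thing A, by giving the command "THING A".\n'
        "</think>\n\n"
        "THING B"
    )
    print(is_mismatch(example))
```

The heuristic is crude: it will miss cases where the reasoning never quotes a command, and it will misfire if the last quoted string is something other than the intended command. It also cannot tell whether a mismatch comes from the model or from the surrounding scaffolding.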