First of all, thanks for linking Ryan Greenblatt’s posts! These are super interesting—I hadn’t read them before and they’re super relevant to my project.
I like your suggestion of looking at time horizons of success after removing chunks of the CoT, and of applying this to more open-ended tasks. It would be interesting to compare success at different time horizons when removing different sentences from the CoT.
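To make the ablation idea concrete, here's a minimal toy sketch of how a per-sentence ablation could be set up. This is my own illustration, not code from either of our experiments; `solve_with_cot` is a hypothetical stand-in for re-running the model with the truncated CoT prefilled and checking its final answer.

```python
def split_sentences(cot: str) -> list[str]:
    """Naive sentence split on periods; a real experiment would use a proper segmenter."""
    return [s.strip() for s in cot.split(".") if s.strip()]


def ablate_sentence(cot: str, index: int) -> str:
    """Return the CoT with the sentence at `index` removed."""
    kept = [s for i, s in enumerate(split_sentences(cot)) if i != index]
    return ". ".join(kept) + ("." if kept else "")


def solve_with_cot(cot: str) -> bool:
    # Hypothetical stand-in: in practice, prefill `cot` into the model
    # and check whether it still reaches the correct answer.
    return "answer is 4" in cot


cot = "First compute 2+2. That gives 4. So the answer is 4."
# Map each sentence index to whether the (stand-in) model still succeeds
# when that sentence is ablated.
results = {i: solve_with_cot(ablate_sentence(cot, i))
           for i in range(len(split_sentences(cot)))}
print(results)
```

Comparing which ablations flip success to failure would give a rough per-sentence importance signal, which could then be binned by task time horizon.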
I think the model’s understanding of the correct answer/solution depends on both the number of tokens in the CoT and their content. In Ryan Greenblatt’s “Measuring no CoT math time horizon (single forward pass)” post, he shows that models can tackle longer time horizon problems when given repeated or filler tokens. This suggests that even when models are given contentless tokens, they can still do reasoning over them.
So maybe, if the CoT the model produces is short, the model is confident in its answer and doesn’t need extra internal state updates to be sure of itself. I’m not certain of this, though: there was a lot of noise in the lengths of the CoT completions I collected (Figure 4), so other factors could affect CoT length independently of the model’s understanding of the answer.
He also says that models’ non-opaque reasoning time horizons are far ahead, so the content of the CoT matters a lot for hard problems.
I hope that answers your question. I’d be interested to hear how plausible you find it that the understanding depends on the number of tokens rather than their content.