First of all, thanks for linking Ryan Greenblatt’s posts! These are super interesting—I hadn’t read them before and they’re super relevant to my project.
I like your suggestion of looking at time horizons of success after removing chunks of the CoT, and of applying this to more open-ended tasks. It would be interesting to compare success at different time horizons when removing different sentences from the CoT.
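To make the ablation idea concrete, here's a minimal toy sketch of how a per-sentence ablation could be set up. This is my own illustration, not code from either of our experiments; `solve_with_cot` is a hypothetical stand-in for re-running the model with the truncated CoT prefilled and checking its final answer.

```python
def split_sentences(cot: str) -> list[str]:
    """Naive sentence split on periods; a real experiment would use a proper segmenter."""
    return [s.strip() for s in cot.split(".") if s.strip()]


def ablate_sentence(cot: str, index: int) -> str:
    """Return the CoT with the sentence at `index` removed."""
    kept = [s for i, s in enumerate(split_sentences(cot)) if i != index]
    return ". ".join(kept) + ("." if kept else "")


def solve_with_cot(cot: str) -> bool:
    # Hypothetical stand-in: in practice, prefill `cot` into the model
    # and check whether it still reaches the correct answer.
    return "answer is 4" in cot


cot = "First compute 2+2. That gives 4. So the answer is 4."
# Map each sentence index to whether the (stand-in) model still succeeds
# when that sentence is ablated.
results = {i: solve_with_cot(ablate_sentence(cot, i))
           for i in range(len(split_sentences(cot)))}
print(results)
```

Comparing which ablations flip success to failure would give a rough per-sentence importance signal, which could then be binned by task time horizon.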
I think the model’s understanding of the correct answer/solution depends on both the number of tokens in the CoT and their content. In Ryan Greenblatt’s “Measuring no CoT math time horizon (single forward pass)” post, he shows that models can tackle longer time horizon problems when given repeated or filler tokens. This suggests that even when models are given contentless tokens, they can still do reasoning over them.
So maybe, if the CoT the model produces is short, the model is confident in its answer and doesn’t need extra internal state updates to be sure of itself. I’m not certain of this, though: there was a lot of noise in the lengths of the CoT completions I collected (Figure 4), so other factors could affect CoT length independently of the model’s understanding of the answer.
He also says that models’ non-opaque reasoning time horizons are far ahead, so the content of the CoT matters a lot for hard problems.
I hope that answers your question. I’d be interested to hear how plausible you find it that the understanding depends on the number of tokens rather than their content.