I explored this with tests that only allow a single token to be output (max_tokens=1). My impression is that either: (i) Gemini-3-Pro-Preview prefilling works 100% of the time and never sneakily reasons or (ii) the API is ignoring my max_tokens settings, reasoning, and then still continuing my prefill response.
In my tests, I’m basically having the model complete a sentence like “User: Alice loves cats. Assistant: The user says that Alice loves” and then the next token will always be ” cats”.
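For concreteness, here is a minimal sketch of how such a request could be constructed. The payload shape (an Anthropic-style messages API, where the final unfinished assistant turn acts as the prefill) and the model name are assumptions; adapt the field names to whichever provider you are testing.

```python
# Minimal sketch of the prefill test: build a chat request whose last
# message is an unfinished assistant turn, capped at one output token.
# The messages-API payload shape and model name are assumptions here,
# not the exact setup used in the tests above.

def build_prefill_request(model: str) -> dict:
    return {
        "model": model,
        "max_tokens": 1,  # allow only a single completion token
        "messages": [
            {"role": "user", "content": "Alice loves cats."},
            # Unfinished assistant turn: the API is asked to continue it.
            {"role": "assistant", "content": "The user says that Alice loves"},
        ],
    }

req = build_prefill_request("gemini-3-pro-preview")
```

If prefilling is respected, the single returned token should be " cats"; if the model reasons first, either the continuation breaks or the max_tokens cap is being ignored.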
I also tested gpt-5, gpt-4, and grok-4. My impression is that prefilling never works for these models.
I find that even with the longer prefill of “I will now answer immediately with the answer. The answer is” the model often reasons. I was hoping that the model would be reluctant to break this text prediction task and reason, but apparently not.
I think “how easy does the task seem” and “how much does the task seem like one on which reasoning should help” might have a big effect on whether the model respects the prefill vs. reasons, so your sentence-completion task might not be representative of how the model always behaves.