I explored this with tests that only allow a single token to be output (max_tokens=1). My impression is that either: (i) Gemini-3-Pro-Preview prefilling works 100% of the time and never sneakily reasons or (ii) the API is ignoring my max_tokens settings, reasoning, and then still continuing my prefill response.
In my tests, I’m basically having the model complete a sentence like “User: Alice loves cats. Assistant: The user says that Alice loves” and then the next token will always be ” cats”.
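For concreteness, here is a minimal sketch of how such a request could be constructed. The payload shape (an Anthropic-style messages API, where the final unfinished assistant turn acts as the prefill) and the model name are assumptions; adapt the field names to whichever provider you are testing.

```python
# Minimal sketch of the prefill test: build a chat request whose last
# message is an unfinished assistant turn, capped at one output token.
# The messages-API payload shape and model name are assumptions here,
# not the exact setup used in the tests above.

def build_prefill_request(model: str) -> dict:
    return {
        "model": model,
        "max_tokens": 1,  # allow only a single completion token
        "messages": [
            {"role": "user", "content": "Alice loves cats."},
            # Unfinished assistant turn: the API is asked to continue it.
            {"role": "assistant", "content": "The user says that Alice loves"},
        ],
    }

req = build_prefill_request("gemini-3-pro-preview")
```

If prefilling is respected, the single returned token should be " cats"; if the model reasons first, either the continuation breaks or the max_tokens cap is being ignored.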
I also tested gpt-5, gpt-4, and grok-4. My impression is that prefilling never works for these models.
I find that even with the longer prefill of “I will now answer immediately with the answer. The answer is” the model often reasons. I was hoping that the model would be reluctant to break this text prediction task and reason, but apparently not.
I think “how easy does the task seem” and “how much does the task seem like one on which reasoning should help” might have a big effect on whether the model respects the prefill vs. reasons, so your sentence-completion task might not be representative of how the model always behaves.