This might be a dumb question, but did you try anything like changing the prompt from:
…After the problem, there will be filler tokens (counting from 1 to {N}) to give you extra space to process the problem before answering.…
to:
…After the problem, there will be distractor tokens (counting from 1 to {N}) to give you extra space to forget the problem before answering.…
I’m asking because AFAICT the results can be explained by EITHER your hypothesis (the extra tokens allow more space / capacity for computation during a forward pass) OR an alternate hypothesis more like “the LLM interprets this as more of a situation where the correct answer is expected” or whatever, i.e. normal sensitivity of LLMs to details of their prompt.
(Not that I have anything against the first hypothesis! Just curious.)
Recall that without filler, Opus 4.5 performance is 45.2%. I tried the following experiments on Opus 4.5 with filler counting to 300:
Default (what I do by default in the blog post above): 51.1%
Remove the text explaining filler tokens (as in, cut “After the problem …”): 50.4%
Use “After the problem, there will be distractor tokens (counting from 1 to {filler_tokens})”: 51.1%
Don’t actually use filler tokens, but include in the prompt “After the problem, there will be filler tokens (counting from 1 to 300) to give you extra space to process the problem before answering” (as in, this is just a lie, we don’t give filler): 45.8%
So, it seems like the framing doesn’t matter ~at all and actually having the filler tokens is the key thing (at least for Opus 4.5, though I strongly expect this would reproduce for Opus 4, Sonnet 4).
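For concreteness, here is a minimal sketch of how the four conditions above could be assembled. This is not the actual experiment code: the function names, the `CONDITIONS` table, the space-separated counting format, and where the explanation sits relative to the problem are all assumptions made for illustration.

```python
# Hypothetical sketch of the four prompt conditions; all names and the
# exact layout are assumptions, not the original experiment code.
FILLER_TEXT = (
    "After the problem, there will be filler tokens (counting from 1 to {n}) "
    "to give you extra space to process the problem before answering."
)
DISTRACTOR_TEXT = (
    "After the problem, there will be distractor tokens (counting from 1 to {n})"
)


def counting_filler(n: int) -> str:
    """Return the filler string '1 2 ... n' (the separator is an assumption)."""
    return " ".join(str(i) for i in range(1, n + 1))


def build_prompt(problem: str, n: int = 300, explanation=None, include_filler=True) -> str:
    """Assemble explanation (optional), problem, and filler (optional)."""
    parts = []
    if explanation is not None:
        parts.append(explanation.format(n=n))
    parts.append(problem)
    if include_filler:
        parts.append(counting_filler(n))
    return "\n\n".join(parts)


# One entry per condition in the list above.
CONDITIONS = {
    "default": dict(explanation=FILLER_TEXT, include_filler=True),
    "no_explanation": dict(explanation=None, include_filler=True),
    "distractor_wording": dict(explanation=DISTRACTOR_TEXT, include_filler=True),
    "explanation_without_filler": dict(explanation=FILLER_TEXT, include_filler=False),
}
```

Calling `build_prompt(problem, **CONDITIONS[name])` for each name gives one prompt per condition, so wording and the presence of filler vary independently.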
Please try not to lie to the models. You can truthfully say “After the problem, there will be a [p]% chance of filler tokens (counting from 1 to 300) to give you extra space to process the problem before answering.” and do observational statistics.
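A rough sketch of what that truthful, randomized version might look like: the prompt states the real probability, filler is then included with that probability, and accuracy is compared between the two groups afterwards. The helper names and prompt layout are hypothetical, and grading the model's answers is left out.

```python
import random

# Hypothetical sketch: the prompt states the true probability p of filler,
# and filler is then actually sampled with that probability.


def make_trial(problem: str, p: float = 0.5, n: int = 300, rng=random):
    """Return (prompt, has_filler); the notice in the prompt is always true."""
    notice = (
        f"After the problem, there will be a {p:.0%} chance of filler tokens "
        f"(counting from 1 to {n}) to give you extra space to process the "
        "problem before answering."
    )
    has_filler = rng.random() < p
    parts = [notice, problem]
    if has_filler:
        parts.append(" ".join(str(i) for i in range(1, n + 1)))
    return "\n\n".join(parts), has_filler


def accuracy_by_group(records):
    """records: iterable of (has_filler, correct) pairs.

    Returns accuracy keyed by has_filler, or None for an empty group.
    """
    counts = {True: [0, 0], False: [0, 0]}  # [num_correct, num_total]
    for has_filler, correct in records:
        counts[has_filler][0] += int(correct)
        counts[has_filler][1] += 1
    return {k: (c / t if t else None) for k, (c, t) in counts.items()}
```

Since the notice is identical in both groups, any accuracy gap between them would come from the filler itself rather than from the framing.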
Note that repeating the problem X times also works (and yields a similar performance increase to filler tokens, given the optimal number of repeats/filler). The boost is also similar across different types of filler (which you’d naively expect to result in different suggestion, etc.).
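As an illustration only, the repeat variant could be as simple as the following. The separator and the function name are assumptions; the exact repeat format used in the experiments isn't specified here.

```python
def repeat_problem(problem: str, repeats: int, separator: str = "\n\n") -> str:
    """Return the problem text repeated `repeats` times (hypothetical format)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return separator.join([problem] * repeats)
```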
I can quickly test this though, will run in one sec.