What do we see if we apply interpretability tools to the filler tokens or repeats of the problem? Can we use internals to better understand why this helps the model?
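For what it's worth, here's roughly the first thing I'd try: look at how much of the attention from the answer position lands on the filler span, layer by layer. This is just a sketch against TransformerLens, with gpt2, a made-up prompt, and guessed position indices standing in for the actual filler-trained checkpoint, which I obviously don't have.

```python
from transformer_lens import HookedTransformer

# Placeholder model -- the real object of interest is whatever checkpoint
# was trained with filler tokens.
model = HookedTransformer.from_pretrained("gpt2")

# Made-up prompt: a problem statement, a run of '.' fillers, then an answer cue.
prompt = "2 4 7 1 3 :" + " ." * 20 + " answer:"
tokens = model.to_tokens(prompt)

logits, cache = model.run_with_cache(tokens)

# Rough guess at which positions hold the fillers after tokenization
# (would need to check against the real tokenizer).
filler_slice = slice(-22, -2)

# For each layer: how much attention does the final position (where the
# answer would be read off) pay into the filler span?
# pattern shape: [batch, head, query_pos, key_pos]
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]
    attn_to_filler = pattern[0, :, -1, filler_slice].sum(dim=-1)  # per head
    print(f"layer {layer}: max head attention into fillers = "
          f"{attn_to_filler.max().item():.3f}")
```

If some heads in later layers attend heavily into the fillers, that at least tells us the filler positions are carrying information downstream rather than just padding out the context.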
Yeah, I predicted that this would be too hard for models to learn without help, so I’m really curious to see how it managed to do this. Is it using the positional information to randomize which token does which part of the parallel computation, or something else?
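And here's the kind of intervention I'd use to get at the positional-information question: permute the residual stream across the filler positions partway through the network. If positional information is parcelling out the parallel computation, each filler position should be carrying a different partial result, and the shuffle should wreck the answer; if the fillers are interchangeable, it should barely matter. Again just a sketch, with the same placeholder model, prompt, and position indices as above and an arbitrarily chosen layer.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")   # placeholder checkpoint again
tokens = model.to_tokens("2 4 7 1 3 :" + " ." * 20 + " answer:")

# Rough guess at the filler positions (the 20 ' .' tokens before ' answer:').
filler_pos = torch.arange(tokens.shape[1] - 22, tokens.shape[1] - 2)
layer = model.cfg.n_layers // 2                     # arbitrary midpoint

def shuffle_fillers(resid, hook):
    # Permute the residual stream across the filler positions only.
    perm = filler_pos[torch.randperm(len(filler_pos))]
    resid[:, filler_pos, :] = resid[:, perm, :]
    return resid

clean_logits = model(tokens)
shuffled_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(f"blocks.{layer}.hook_resid_post", shuffle_fillers)],
)

# If the filler positions have position-specific roles, the prediction at the
# answer position should change a lot under the shuffle.
print((clean_logits[0, -1] - shuffled_logits[0, -1]).abs().max().item())
```

One caveat: downstream heads that select the right filler by content rather than by position could partially recover from the shuffle, so a small effect wouldn't fully rule out position-specific roles.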