That context-less intermediate tokens improve LLM performance isn’t surprising, given the theoretical analysis showing that generating intermediate tokens allows constant-size transformers to solve more complex problems (see Denny Zhou’s presentation).
It is indeed curious that recent LLMs show significantly improved performance here compared to older ones. But I wonder whether there’s a different, better explanation than “meta-cognition”.
Note that the linked video is talking about tokens the LLM picks itself, rather than fixed tokens.
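To make that distinction concrete, here’s a minimal Python sketch of the two prompt shapes (the `complete` stub and the example question are hypothetical, just for illustration):

```python
# Minimal sketch contrasting the two setups discussed above.
# The `complete` stub and the question are placeholders, not a real API.

def complete(prompt: str) -> str:
    """Stand-in for an LLM completion call; wire up a real client here."""
    raise NotImplementedError

QUESTION = "If a train travels 60 mph for 2.5 hours, how far does it go?"

# (1) Fixed, context-less intermediate tokens: a run of meaningless
# filler tokens before the answer. The extra positions buy the
# transformer extra serial computation even though they carry no content.
filler_prompt = f"Q: {QUESTION}\nA: {'.' * 50} The answer is"

# (2) Model-chosen intermediate tokens (the chain-of-thought setting the
# linked video discusses): the model generates its own reasoning tokens
# before committing to an answer.
cot_prompt = f"Q: {QUESTION}\nA: Let's think step by step."
```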