forward pass (e.g. the residual stream) has to be deleted, outputting only a single token.
This does not actually happen.
What happens is that the new token is now at the root of the attention structure, and can pass information from the final layers to the first layers when inferring the next token.
The residuals are translation independent, and are cached for further inference in autoregressive mode.
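A toy numpy sketch of why caching is valid under causal attention: past keys/values never depend on later tokens, so an incremental decode with a growing cache matches a full recompute over the prefix. All names here (`attend`, `W_q`, etc.) are illustrative, not from any real library.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8  # toy head dimension
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))

def attend(q, K, V):
    # One query attending over all cached keys/values (causal by construction:
    # the cache only ever contains earlier positions).
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Stand-ins for the residual-stream states of a 5-token sequence.
x = rng.standard_normal((5, d))

# Incremental decode: append each token's K/V to the cache, never recompute old ones.
K_cache, V_cache, outs = [], [], []
for t in range(5):
    K_cache.append(x[t] @ W_k)
    V_cache.append(x[t] @ W_v)
    outs.append(attend(x[t] @ W_q, np.array(K_cache), np.array(V_cache)))

# Full recompute over the whole prefix gives the same results: cached
# activations for earlier tokens are unaffected by tokens that come after.
K_full, V_full = x @ W_k, x @ W_v
outs_full = [attend(x[t] @ W_q, K_full[: t + 1], V_full[: t + 1]) for t in range(5)]

assert all(np.allclose(a, b) for a, b in zip(outs, outs_full))
```

The assertion passing is the whole point: because attention at step t only reads positions 0..t, the per-token K/V entries can be computed once and reused for every later step.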
Thank you. Just earlier I was asking an AI whether my comment was reasonable, and it told me something similar.