Realmbird comments on Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Realmbird 30 May 2026 18:36 UTC
1 point
0
Which token position did you use: the final answer token or a border token? The AV on the final answer token contains the final answer at a higher rate. For the 27B results, which token was used?
- ryan_greenblatt 30 May 2026 21:32 UTC
  3 points
  0
  I tried a few positions. We have more detailed results on this coming out in a bit.
- Realmbird 1 Jun 2026 0:17 UTC
  1 point
  0
  For Qwen2.5-7B-Instruct’s NLAs, I found evidence that the rate at which the answer appears in the AV increases as the token position approaches the model’s final answer.
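
A rough sketch of how a measurement like this could be tallied, not the setup actually used in the thread. It assumes a hypothetical record format of `(offset, av_text, final_answer)`, where `offset` is the number of tokens before the final answer token (0 means on it), and it uses a plain case-insensitive substring match as the "answer appears in AV" criterion. The function name and the toy data are illustrative only.

```python
from collections import defaultdict


def answer_rate_by_offset(records):
    """Fraction of AV texts that contain the final answer, grouped by offset.

    records: iterable of (offset, av_text, final_answer) tuples.
    This record format is a hypothetical one chosen for illustration.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for offset, av_text, final_answer in records:
        totals[offset] += 1
        needle = final_answer.strip().lower()
        # Skip empty answers: an empty string would match every text.
        if needle and needle in av_text.lower():
            hits[offset] += 1
    return {off: hits[off] / totals[off] for off in sorted(totals)}


# Toy data (made up for the example, not real results).
toy_records = [
    (0, "The answer is 42", "42"),
    (0, "final answer 42", "42"),
    (3, "thinking about sums", "42"),
    (3, "maybe 42", "42"),
]
rates = answer_rate_by_offset(toy_records)
print(rates)  # {0: 1.0, 3: 0.5}
```

Substring matching is a crude criterion: it can overcount short answers such as single digits, so a stricter match may be preferable in practice.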