Realmbird comments on Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Realmbird 1 Jun 2026 0:17 UTC
For Qwen2.5-7B-Instruct’s NLAs, I found evidence that the rate at which the NLA answer appears in the AV increases as the token position approaches the model’s final answer.
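
A minimal sketch of how a trend like this could be measured, assuming each explanation is already paired with its token's distance from the final answer and with the answer string. The function name, the record layout, and the case-insensitive substring match are all hypothetical choices for illustration, and the toy records are made up rather than real results.

```python
from collections import defaultdict


def answer_rate_by_distance(records):
    """Fraction of explanations that contain the answer, grouped by distance.

    records: iterable of (distance_to_final_answer, explanation_text, answer)
    tuples. This layout is an assumption for the sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for distance, explanation, answer in records:
        totals[distance] += 1
        # Simple case-insensitive substring match; a real check may need
        # something stricter (e.g. token-level or numeric matching).
        if answer.lower() in explanation.lower():
            hits[distance] += 1
    # Order from farthest to nearest so a rising trend reads left to right.
    return {d: hits[d] / totals[d] for d in sorted(totals, reverse=True)}


# Made-up toy records purely to show the input shape; not real data.
toy = [
    (3, "the model is parsing the question", "42"),
    (3, "thinking about 42 maybe", "42"),
    (1, "answer is 42", "42"),
    (1, "about to output 42", "42"),
]
rates = answer_rate_by_distance(toy)
```

If the rate rises as the distance shrinks, that matches the pattern described above.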