metawrong comments on Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

metawrong 8 May 2026 5:47 UTC
Can we do an NLA on the activation verbalizer (AV)? NLAs all the way down!