metawrong comments on
Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
metawrong
8 May 2026 5:47 UTC
4
points
−2
Can we do an NLA on the activation verbalizer (AV)? NLAs all the way down!