Joseph Miller comments on Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Joseph Miller 7 Oct 2023 0:45 UTC
6 points
2
Looking at their interactive visualization, I was surprised how clean random learned features are.