This post really should have gotten more attention than it did. I just went looking for a good intro to SAEs to share with someone who has a technical background in physics but very little ML background, and this is the best one I’ve found. Adam Karvonen’s ‘Intuitive Explanation of Sparse Autoencoders’ is also very good, but not nearly as gentle, and while ‘Towards Monosemanticity’ is phenomenal, it’s far too long and dense for beginners. Strong upvoted, and I hope I won’t be the last.