Definitely agree a lot of the autoresearch stuff will end up being basically hillclimbing noise, and probably there’s a lot of that happening in this study too. I wouldn’t recommend assuming this stuff is going to improve things on real LLM SAEs without properly validating. But even if it’s 90% noise, it still seems worth it if you get something truly insightful out of it 10% of the time, or even if the LLM does something interesting that sparks some new ideas.
It’s been so long since I’ve worked on this but FWIW these sorts of ancient dictionary learning algorithms were definitely in the water supply in 2024
I need to go back and revisit all this stuff from 2024. It seems strange to me that nothing came out of the classical dictionary learning world that could out-perform SAEs.
Yeah, I think with the CLT stuff happening, there was less focus on the single resid stream SAE (which was probably? a good idea)