Cool case study!
1. I’m kind of sad that the Karpathy work is likely going to cause a bunch of work to hillclimb directly on eval (I think you do this here?). This makes automated AI work sketchy IMO. In https://arxiv.org/abs/2601.11516 we note that e.g. “the large early drop in Fig. 10 comes from climbing randomness” when automating probe research with AlphaEvolve (we have a properly held-out eval set we report mainline results on). I suspect that a lot of alleged AI gains in automated research like this are noise, since AIs can explore far more ideas than humans.
Note that in https://xcancel.com/karpathy/status/2030371219518931079 the last improvement is literally tweaking the random seed! :/
2. It’s been so long since I’ve worked on this, but FWIW these sorts of ancient dictionary learning algorithms were definitely in the water supply in 2024… for example, here we note of our dictionary learning algorithm that a “possible application is actually replacing the encoder at test time, to increase the loss recovered of the sparse decomposition”, and here we even used FISTA.
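For readers who haven't seen it, the kind of test-time encoder replacement mentioned in point 2 can be sketched with textbook FISTA on the lasso objective 0.5 * ||x - D z||^2 + lam * ||z||_1. This is not the implementation from the linked work, which the thread doesn't show. The function names, the toy dictionary, and the value of `lam` are assumptions for illustration, and the step size uses the squared Frobenius norm of D as a conservative bound on the Lipschitz constant.

```python
import math


def matvec(D, z):
    """Compute D @ z, where D is a list of rows (atoms are columns)."""
    return [sum(d * zj for d, zj in zip(row, z)) for row in D]


def rmatvec(D, r):
    """Compute D.T @ r."""
    return [sum(D[i][j] * r[i] for i in range(len(D))) for j in range(len(D[0]))]


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [a - t if a > t else a + t if a < -t else 0.0 for a in v]


def objective(D, x, z, lam):
    """Lasso objective: 0.5 * ||D z - x||^2 + lam * ||z||_1."""
    resid = [a - b for a, b in zip(matvec(D, z), x)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(c) for c in z)


def fista(D, x, lam, n_iter=200):
    """Sparse-code x against a fixed dictionary D with FISTA (hypothetical helper)."""
    # ||D||_F^2 upper-bounds the largest eigenvalue of D.T @ D, so 1/lipschitz is a safe step.
    lipschitz = sum(d * d for row in D for d in row)
    z = [0.0] * len(D[0])
    y = list(z)  # extrapolated point
    t = 1.0      # momentum parameter
    for _ in range(n_iter):
        resid = [a - b for a, b in zip(matvec(D, y), x)]
        grad = rmatvec(D, resid)  # gradient of the smooth term at y
        z_new = soft_threshold(
            [yj - g / lipschitz for yj, g in zip(y, grad)], lam / lipschitz
        )
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Nesterov-style extrapolation using the previous two iterates.
        y = [zn + ((t - 1.0) / t_new) * (zn - zo) for zn, zo in zip(z_new, z)]
        z, t = z_new, t_new
    return z


if __name__ == "__main__":
    # Toy 2-D input with three atoms; the third atom is aligned with x.
    D = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.8]]
    x = [0.6, 0.8]
    codes = fista(D, x, lam=0.1)
    print(codes, objective(D, x, codes, 0.1))
```

In an SAE setting, D would presumably be the learned decoder and `fista` would stand in for the encoder's forward pass at test time, at the cost of many more FLOPs per activation.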
Definitely agree a lot of the autoresearch stuff will end up being basically hillclimbing noise, and probably there’s a lot of that happening in this study too. I wouldn’t recommend assuming this stuff is going to improve things on real LLM SAEs without properly validating. But even if it’s 90% noise it still seems worth it if you get something truly insightful out 10% of the time, or even if the LLM does something interesting that sparks some new ideas.
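To make the noise worry concrete, here is a toy simulation, assuming every candidate "idea" has exactly the same true accuracy, so any apparent gain from picking the best one is selection noise. It is not a model of this study or of AlphaEvolve; all names and numbers (`n_candidates`, `n_eval`, `true_acc`) are made up for illustration.

```python
import random


def simulate_hillclimb(n_candidates=100, n_eval=200, true_acc=0.70,
                       n_trials=20, seed=0):
    """Return (mean best search-set score, mean held-out score of the winner).

    Every candidate has the same true accuracy, so the winner is chosen
    purely by evaluation noise.
    """
    rng = random.Random(seed)

    def measure(n):
        # Observed accuracy on n fresh examples for a candidate with true_acc.
        return sum(rng.random() < true_acc for _ in range(n)) / n

    best_scores, heldout_scores = [], []
    for _ in range(n_trials):
        # Hillclimbing on the eval: score all candidates, keep the max.
        search_scores = [measure(n_eval) for _ in range(n_candidates)]
        best_scores.append(max(search_scores))
        # The winner is no better than the rest, so its held-out score
        # is just one more independent draw at true_acc.
        heldout_scores.append(measure(n_eval))
    return (sum(best_scores) / n_trials, sum(heldout_scores) / n_trials)


if __name__ == "__main__":
    best, heldout = simulate_hillclimb()
    print(f"selected-on-eval score: {best:.3f}")
    print(f"held-out score:         {heldout:.3f}")
```

The selected score sits well above the true accuracy while the held-out score does not, and the gap widens as `n_candidates` grows, which is the "AIs can explore far more ideas than humans" problem in miniature.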
I need to go back and revisit all this stuff from 2024. It seems strange to me that nothing came out of the classical dictionary learning world that could out-perform SAEs.
Yeah, I think because of the CLT stuff happening, there was less focus on the single resid stream SAE (which was probably a good idea?)