These puzzles are great, thanks for making them!
Connor Kissane
Sparse Autoencoders Work on Attention Layer Outputs
Attention SAEs Scale to GPT-2 Small
Code for this token filtering can be found in the appendix and the exact token list is linked.
Maybe I just missed it, but I’m not seeing this. Is the code still available?
Amazing! We found your original library super useful for our Attention SAEs research, so thanks for making this!