This is really cool! How much computational burden does this add compared to training without the SAEs?
I could possibly get access to an H100 node at my school’s HPC to try this on GPT-2 small.
This adds quite a bit of overhead. Code here if you’re interested in taking a look at what I tried: https://github.com/peterlai/gpt-circuits/blob/main/experiments/regularization/train.py. My goal was to show that regularization is possible and to spark more interest in this general approach. Matthew Chen and @JoshEngels just released a paper describing a more practical approach that I hope to try out soon: https://x.com/match_ten/status/1886478581423071478. Where a gap still exists, imo, is in having the SAE features and model weights inform each other without needing to freeze one at a time.
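To sketch what I mean by that last point (purely illustrative, not the code in the repo or the paper): a joint training step where the LM loss, the SAE reconstruction loss, and a sparsity penalty all backprop into both the model and the SAE at once, with nothing frozen. The names here (`joint_step`, an LM that also returns the hooked activations, the loss weights) are assumptions for the sketch.

```python
import torch.nn as nn
import torch.nn.functional as F


class SAE(nn.Module):
    """Minimal sparse autoencoder over a residual-stream activation."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = F.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f     # reconstruction, features


def joint_step(lm, sae, optimizer, tokens, lam_recon=1.0, lam_l1=1e-3):
    # Hypothetical: `lm` returns next-token logits plus the activations the SAE reads.
    logits, acts = lm(tokens)
    lm_loss = F.cross_entropy(
        logits[:, :-1].flatten(0, 1), tokens[:, 1:].flatten()
    )
    x_hat, f = sae(acts)
    recon_loss = F.mse_loss(x_hat, acts)   # SAE reconstruction term
    l1_loss = f.abs().mean()               # sparsity term

    # Neither module is frozen, so the regularizer shapes both at once.
    loss = lm_loss + lam_recon * recon_loss + lam_l1 * l1_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The optimizer would hold the parameters of both `lm` and `sae`, which is exactly the "no freezing" part; the open question is whether that stays stable at scale.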
Thanks, I’ll take a look!