This is really cool! How much computational burden does this add compared to training without the SAEs?
I could possibly get access to an H100 node at my school’s HPC to try this on GPT-2 small.
This adds quite a bit of overhead. Code here if you’re interested in taking a look at what I tried: https://github.com/peterlai/gpt-circuits/blob/main/experiments/regularization/train.py. My goal was to show that regularization is possible and to spark more interest in this general approach. Matthew Chen and @JoshEngels just released a paper describing a more practical approach that I hope to try out soon: https://x.com/match_ten/status/1886478581423071478. Where a gap still exists, imo, is in having the SAE features and model weights inform each other without needing to freeze one at a time.
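To sketch what I mean by that last point (purely illustrative, not the code in the repo or the paper): a joint training step where the LM loss, the SAE reconstruction loss, and a sparsity penalty all backprop into both the model and the SAE at once, with nothing frozen. The names here (`joint_step`, an LM that also returns the hooked activations, the loss weights) are assumptions for the sketch.

```python
import torch.nn as nn
import torch.nn.functional as F


class SAE(nn.Module):
    """Minimal sparse autoencoder over a residual-stream activation."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = F.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(f), f     # reconstruction, features


def joint_step(lm, sae, optimizer, tokens, lam_recon=1.0, lam_l1=1e-3):
    # Hypothetical: `lm` returns next-token logits plus the activations the SAE reads.
    logits, acts = lm(tokens)
    lm_loss = F.cross_entropy(
        logits[:, :-1].flatten(0, 1), tokens[:, 1:].flatten()
    )
    x_hat, f = sae(acts)
    recon_loss = F.mse_loss(x_hat, acts)   # SAE reconstruction term
    l1_loss = f.abs().mean()               # sparsity term

    # Neither module is frozen, so the regularizer shapes both at once.
    loss = lm_loss + lam_recon * recon_loss + lam_l1 * l1_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The optimizer would hold the parameters of both `lm` and `sae`, which is exactly the "no freezing" part; the open question is whether that stays stable at scale.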
Thanks, I’ll take a look!