Peter Lai comments on SAE regularization produces more interpretable models

Peter Lai 31 Jan 2025 17:15 UTC
The original SAE is actually quite good, and in my experiments with Gated SAEs I’m using those values. To frame this technique as a “regularization” technique, though, I needed to show that the model weights themselves are affected, which is why my graphs use metrics extracted from freshly trained SAE values.