The original SAE is actually quite good, and in my experiments with Gated SAEs I’m using those values. To frame this as a “regularization” technique, though, I needed to show that the model weights themselves are affected, which is why my graphs use metrics extracted from freshly trained SAE values.