The best vector for probing is not the best vector for steering
AKA the predict/control discrepancy, from Section 3.3.1 of Wattenberg and Viégas, 2024.
Also related to the idea that the best linear SAE encoder is not the transpose of the decoder.
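
A toy sketch of the discrepancy, not the construction from the paper: assume two Gaussian classes that share a correlated covariance. Under that assumption the natural probing direction is the Fisher/LDA direction (inverse covariance times the mean difference), while the natural steering direction is the plain mean difference. The means, covariance, sample size, and helper names below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: two classes, same covariance, strongly correlated axes.
mu0 = np.array([0.0, 0.0])
mu1 = np.array([1.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
n = 5000
x0 = rng.multivariate_normal(mu0, cov, size=n)
x1 = rng.multivariate_normal(mu1, cov, size=n)

# Steering candidate: difference of class means.
steer = x1.mean(axis=0) - x0.mean(axis=0)

# Probing candidate: Fisher/LDA direction, pooled_cov^{-1} @ mean difference.
pooled = 0.5 * (np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False))
probe = np.linalg.solve(pooled, steer)


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def probe_accuracy(direction):
    """Balanced accuracy of thresholding the projection onto `direction`
    at the midpoint of the two projected class means."""
    p0, p1 = x0 @ direction, x1 @ direction
    thresh = 0.5 * (p0.mean() + p1.mean())
    return 0.5 * ((p0 < thresh).mean() + (p1 >= thresh).mean())


def off_target(shift):
    """Distance between the shifted class-0 mean and the class-1 mean."""
    return np.linalg.norm(x0.mean(axis=0) + shift - x1.mean(axis=0))


cosine = float(unit(probe) @ unit(steer))
acc_probe = probe_accuracy(probe)
acc_steer = probe_accuracy(steer)

# Shift along the probe direction just far enough that the probe's reading
# of the class-0 mean matches its reading of the class-1 mean.
probe_shift = probe * (probe @ steer) / (probe @ probe)
err_probe_shift = off_target(probe_shift)
err_steer_shift = off_target(steer)

print(f"cosine(probe, steer)     = {cosine:.3f}")
print(f"accuracy, probe vs steer = {acc_probe:.3f} vs {acc_steer:.3f}")
print(f"off-target distance      = {err_probe_shift:.3f} (probe shift) "
      f"vs {err_steer_shift:.3f} (mean-difference shift)")
```

In this setup the two directions are clearly not parallel. The LDA direction classifies better, but shifting along it satisfies the probe while leaving the class-0 mean away from the class-1 mean. Shifting by the mean difference lands on the class-1 mean by construction, yet it is the weaker classifier.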