I haven’t thought deeply about this specific case, but I think you should treat it like any other ablation study. For example, what happens if you replace the SAE with a linear probe?