Even just for evaluating the utility of SAEs for supervised probing though, I think it’s unfair to use the same layer for all tasks. Afaik there could easily be tasks where the model represents the target concept using a small number of linear features at some layer, but not at the chosen layer. This will harm k-sparse SAE probe performance far more than the baseline performance because the baselines can make the best of the bad situation at the chosen layer by e.g. combining many features which are weakly correlated with the target concept and using non-linearities. I think it would be a fair test if the ‘quiver of arrows’ were expanded to include each method applied at each of a range of layers.
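To make the concern concrete, here is a minimal synthetic sketch of the proposed layer sweep. Everything here is my own simplification, not the paper's setup: a mean-difference linear classifier stands in for the logistic-regression baseline, correlation-based top-k feature selection stands in for a k-sparse SAE probe, and the "layers" are synthetic activation matrices where the concept is sparsely and strongly represented at one layer but only weakly and diffusely elsewhere.

```python
import numpy as np

rng = np.random.default_rng(0)

def probe_accuracy(X, y):
    # Mean-difference linear probe: a crude stand-in for a trained
    # logistic-regression baseline probe.
    mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    w = mu1 - mu0
    b = -0.5 * (mu1 + mu0) @ w
    return float(((X @ w + b > 0) == (y == 1)).mean())

def k_sparse_accuracy(X, y, k=8):
    # Keep only the k features most correlated with the label, then probe
    # on those alone -- mimicking a k-sparse probe that may only use a
    # handful of (SAE) features.
    corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])
    top_k = np.argsort(corr)[-k:]
    return probe_accuracy(X[:, top_k], y)

n, d, n_layers = 500, 64, 6
y = rng.integers(0, 2, n)

results = {}
for layer in range(n_layers):
    X = rng.normal(size=(n, d))
    if layer == 3:
        # Concept is strongly represented by a few features at this layer.
        X[:, :4] += 2.0 * y[:, None]
    else:
        # Elsewhere: weak, diffuse correlation spread over many features.
        X += 0.2 * y[:, None] * rng.normal(size=d)
    results[layer] = (k_sparse_accuracy(X, y), probe_accuracy(X, y))

best_sparse = max(results, key=lambda l: results[l][0])
```

On data like this, the k-sparse method only looks good at the layer where the concept happens to be sparsely encoded, while the dense baseline degrades more gracefully at the other layers by pooling many weak features; sweeping layers and taking each method's best layer removes that confound.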
I’d be surprised if it made a big difference, but I agree in principle that it could make a difference, and favours probes somewhat over SAEs, so fair point.