For training probes on a labelled dataset, you should train a probe for each layer and then pick whichever probe achieves the lowest training loss. Better yet, select on a held-out dataset, if you have enough data. When we did this on llama-3.3-70b, the best probe was at layer 22/80.
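A minimal sketch of that layer sweep, assuming activations have already been extracted into an `acts` array of shape `(n_examples, n_layers, d_model)` with binary `labels`; the variable names and the scikit-learn setup are my own assumptions, not something from the original setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def best_probe_layer(acts: np.ndarray, labels: np.ndarray):
    """Train one linear probe per layer; pick the layer with the
    lowest loss on a held-out split. Returns (layer, held-out loss)."""
    n_layers = acts.shape[1]
    idx_train, idx_val = train_test_split(
        np.arange(len(labels)), test_size=0.2, random_state=0
    )
    best = (None, np.inf)
    for layer in range(n_layers):
        probe = LogisticRegression(max_iter=1000)
        probe.fit(acts[idx_train, layer], labels[idx_train])
        val_loss = log_loss(
            labels[idx_val], probe.predict_proba(acts[idx_val, layer])
        )
        if val_loss < best[1]:
            best = (layer, val_loss)
    return best
```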
Also, instead of probing only the last token, I think it's better to probe every token and average the scores, since per-token scores are pretty noisy and averaging smooths them out.
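A sketch of that mean-pooling, assuming `token_acts` holds one sequence's activations at the chosen layer with shape `(n_tokens, d_model)` and `probe` is the fitted classifier from above; again the names are illustrative:

```python
import numpy as np

def sequence_score(probe, token_acts: np.ndarray) -> float:
    # Score every token position with the probe, then mean-pool
    # to reduce the per-token noise.
    per_token = probe.predict_proba(token_acts)[:, 1]
    return float(per_token.mean())
```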
+1