CCS does not find the single linear probe with high accuracy: there are more than 20 orthogonal linear probes (i.e. using completely different information) that have similar accuracies as the linear probe found by CCS (for most datasets);
So what about an ensamble of the top 20 linear probes? Is it substantially better than using just the best one alone? I would expect so given that they are orthogonal, so they are using ~uncorrelated information.
I think that a (linear) ensemble of linear probes (trained with Logistic Regression) should never be better than a single linear probe (otherwise the optimizer would have just found this combined linear probe instead). Therefore, I don’t expect that ensembling 20 linear CCS probe will increase performance much (and especially not beyond the performance of supervised linear regression).
Feel free to run the experiment if you’re interested about it!
CCS does not find the single linear probe with high accuracy: there are more than 20 orthogonal linear probes (i.e. using completely different information) that have similar accuracies as the linear probe found by CCS (for most datasets);
So what about an ensamble of the top 20 linear probes? Is it substantially better than using just the best one alone? I would expect so given that they are orthogonal, so they are using ~uncorrelated information.
I think that a (linear) ensemble of linear probes (trained with Logistic Regression) should never be better than a single linear probe (otherwise the optimizer would have just found this combined linear probe instead). Therefore, I don’t expect that ensembling 20 linear CCS probe will increase performance much (and especially not beyond the performance of supervised linear regression).
Feel free to run the experiment if you’re interested about it!
It will often be better on the test set (because of averaging uncorrelated errors).