I just chatted with Adam and he explained a bit more; summarising it here: What the shuffling does is create a new dataset within each category, $(x_1, y_{17}), (x_2, y_{12}), (x_3, y_5), \ldots$, where the x and y values are re-paired at random (or, in higher dimensions, the sample for each dimension is drawn independently at random). The shuffle leaves the means (centroids) invariant but removes correlations between directions. Then you can train a logistic regression on the shuffled data. You might prefer this over calculating the mean directly to get an idea of how much low sample size is affecting your results.
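Here's a minimal sketch of the shuffling as I understand it (the function name and setup are mine, not from any particular codebase):

```python
import numpy as np

def shuffle_within_classes(X, y, seed=0):
    """Within each class, independently permute the samples along every
    dimension. This keeps each class's mean (centroid) intact but destroys
    correlations between dimensions."""
    rng = np.random.default_rng(seed)
    X_shuffled = X.copy()
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        for d in range(X.shape[1]):
            X_shuffled[idx, d] = rng.permutation(X[idx, d])
    return X_shuffled
```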
Is this guaranteed to give you the same as mass-mean probing?
Thinking about it quickly, consider the solution to ordinary least squares regression. With a $y$ that one-hot encodes the label, it is $(X^T X)^{-1} X^T y$. Note that $X^T X = N \cdot \mathrm{Cov}(X, X)$. The procedure Adam describes makes the sample of $X$s uncorrelated, which is exactly the same as zeroing out the off-diagonal elements of the covariance.
If the covariance is diagonal, then $(X^T X)^{-1}$ is also diagonal, and it follows that the solution to OLS is indeed an unweighted average of the datapoints that correspond to each label, up to a per-dimension rescaling: each dimension of $x$ gets multiplied by a coefficient coming from the corresponding diagonal entry of the covariance.
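Spelling that out (my derivation, assuming $X$ has been centered so that $X^T X = N \cdot \mathrm{Cov}(X, X)$ holds exactly, with diagonal entries $\sigma_1^2, \ldots, \sigma_D^2$):

$$\hat\beta = (X^T X)^{-1} X^T y = \frac{1}{N}\,\mathrm{diag}(\sigma_1^{-2}, \ldots, \sigma_D^{-2}) \sum_{i:\, y_i = 1} x_i,$$

i.e. the sum (equivalently, a rescaled average) of the positive-label datapoints, with each dimension $d$ rescaled by $1/(N \sigma_d^2)$.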
I’d expect logistic regression to choose the ~same direction.
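As a quick numerical sanity check of that intuition (my own toy setup, reusing the `shuffle_within_classes` sketch above; "mass-mean" here just means the difference of class centroids):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two-class Gaussian data with strongly correlated dimensions.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X0 = rng.multivariate_normal([0, 0], cov, size=500)
X1 = rng.multivariate_normal([1, 2], cov, size=500)
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(500), np.ones(500)])

# Remove within-class correlations while keeping the centroids.
X_shuf = shuffle_within_classes(X, y)

# Mass-mean direction: difference of class centroids.
mass_mean = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

# Logistic regression on the shuffled data (weak regularization).
w = LogisticRegression(C=1e6).fit(X_shuf, y).coef_.ravel()

# If the reasoning above holds, this should be close to 1.
cos = w @ mass_mean / (np.linalg.norm(w) * np.linalg.norm(mass_mean))
print("cosine similarity:", cos)
```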
Very clever technique!