Is this guaranteed to give you the same direction as mass-mean probing?
Thinking about it quickly, consider the solution to ordinary least squares regression. With a $y$ that is one-hot encoding the label, it is $(X^TX)^{-1}X^Ty$. Note that $X^TX = N \cdot \mathrm{Cov}(X, X)$ (for mean-centered $X$). The procedure Adam describes makes the sample of $X$s uncorrelated, which is exactly the same as zeroing out the off-diagonal elements of the covariance.
If the covariance is diagonal, then $(X^TX)^{-1}$ is also diagonal, and it follows that the solution to OLS is indeed an unweighted average of the datapoints that correspond to each label! The only difference is a per-dimension rescaling: each coordinate of that average gets divided by the corresponding diagonal entry of $X^TX$, i.e. $N$ times that dimension's variance.
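Spelling out that step (my notation, not the parent's): write $y_c$ for the 0/1 indicator column of class $c$, $N_c$ for the number of datapoints with that label, $\mu_c$ for their mean, and $D = X^TX$ for the now-diagonal Gram matrix with entries $D_{jj} \approx N\sigma_j^2$. Then

$$\hat{\beta}_c \;=\; D^{-1}X^Ty_c \;=\; D^{-1}\!\sum_{i:\,y_i=c} x_i \;=\; N_c\,D^{-1}\mu_c, \qquad (\hat{\beta}_c)_j \;\approx\; \frac{N_c}{N\sigma_j^2}\,(\mu_c)_j,$$

so each probe column points at its class mean, coordinate-wise rescaled by the inverse variances.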
I’d expect logistic regression to choose the ~same direction.
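If it helps, here's a rough numerical check of the above (my own sketch on synthetic Gaussian data, assuming numpy/sklearn; not anything from the post): decorrelate the data, fit OLS and logistic regression, and compare their directions to the difference of class means divided coordinate-wise by the per-dimension variances.

```python
# A quick numerical check of the argument above (my own sketch, not from the
# original post). "Decorrelating" here means rotating the centered data into
# the eigenbasis of its covariance, so Cov(X, X) becomes diagonal.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Two classes with strongly correlated features.
d, n = 10, 5000
A = rng.normal(size=(d, d))
within_cov = A @ A.T
delta = rng.normal(size=d)
X = np.vstack([
    rng.multivariate_normal(np.zeros(d), within_cov, n),  # label 0
    rng.multivariate_normal(delta, within_cov, n),         # label 1
])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Decorrelate: zero out the off-diagonal covariance by rotating into the
# eigenbasis of the pooled sample covariance.
Xc = X - X.mean(axis=0)
variances, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Xw = Xc @ eigvecs  # sample covariance of Xw is diag(variances)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# OLS probe on the decorrelated data.
w_ols = LinearRegression().fit(Xw, y).coef_

# Mass-mean direction (difference of class means), divided coordinate-wise by
# the per-dimension variances (the diagonal of the covariance), per the
# argument above.
mass_mean = Xw[y == 1].mean(axis=0) - Xw[y == 0].mean(axis=0)
print("cos(OLS, mean-diff / variances):", cosine(w_ols, mass_mean / variances))

# Logistic regression (weakly regularized) should land on ~the same direction.
w_lr = LogisticRegression(C=1e4, max_iter=5000).fit(Xw, y).coef_.ravel()
print("cos(logreg, OLS):", cosine(w_lr, w_ols))
```

On toy Gaussian data with equal class covariances, both cosines should come out near 1; real activations needn't be so well-behaved.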
Very clever technique!