A fun side note that probably isn't useful: I think if you shuffle the data across neurons (which effectively removes the covariance among neurons) and then do linear regression, you will get theta_t.
This is a somewhat common technique in neuroscience analyses when studying correlation structure and separability in neural representations.
I just chatted with Adam and he explained a bit more; summarising here: the shuffling creates a new dataset within each category, (x1,y17),(x2,y12),(x3,y5),..., where the x and y pairings are shuffled (or, in higher dimensions, each dimension's value is resampled independently across trials). The shuffle leaves the means (centroids) invariant but removes correlations between dimensions. You can then train a logistic regression on the shuffled data. You might prefer this over computing the mean difference directly, since it gives you a sense of how much low sample size is affecting your results.
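A minimal sketch of what I understand the procedure to be, in Python with numpy/scikit-learn; the toy two-class data, the function name shuffle_within_class, and all parameters are my own illustration, not from Adam:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def shuffle_within_class(X):
    """Independently permute each column (neuron/dimension).
    This destroys cross-neuron correlations but leaves each
    dimension's marginal distribution, and hence the class
    centroid, unchanged."""
    X_shuf = X.copy()
    for j in range(X_shuf.shape[1]):
        rng.shuffle(X_shuf[:, j])  # in-place shuffle of column j
    return X_shuf

# Toy data: two classes with correlated noise around different means.
n = 500
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=n)
X1 = rng.multivariate_normal([1.0, 1.0], cov, size=n)

# Shuffle each class separately so the class centroids are preserved.
X = np.vstack([shuffle_within_class(X0), shuffle_within_class(X1)])
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = LogisticRegression().fit(X, y)
print(clf.coef_)  # should roughly align with the mean-difference direction
```

Comparing clf.coef_ on shuffled vs. unshuffled data gives a feel for how much of the fitted direction is driven by the correlation structure as opposed to the centroids alone.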