I just chatted with Adam and he explained a bit more; summarising it here: the shuffling creates a new dataset from each category, e.g. (x1,y17),(x2,y12),(x3,y5),..., where the x and y values are paired up at random (or, in higher dimensions, each dimension's values are shuffled independently across samples). The shuffle leaves the per-category means (centroids) invariant but removes correlations between dimensions. You can then train a logistic regression on the shuffled data. You might prefer this over computing the mean directly because it gives you an idea of how much the low sample size is affecting your results.
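A minimal sketch of what I understood the procedure to be (the toy data, function name, and use of sklearn are my own illustration, not Adam's exact setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def shuffle_within_category(X, rng):
    """Permute each column (dimension) independently across samples.

    This keeps every dimension's per-category mean unchanged,
    but destroys the correlations between dimensions.
    """
    X_shuf = X.copy()
    for j in range(X_shuf.shape[1]):
        X_shuf[:, j] = rng.permutation(X_shuf[:, j])
    return X_shuf

# Toy data: two categories in 2D, each with strong within-category correlation.
cov = [[1.0, 0.8], [0.8, 1.0]]
X_a = rng.multivariate_normal([0, 0], cov, size=50)
X_b = rng.multivariate_normal([2, 2], cov, size=50)

# Shuffle each category separately so its centroid stays put.
X_shuf = np.vstack([shuffle_within_category(X_a, rng),
                    shuffle_within_category(X_b, rng)])
y = np.array([0] * len(X_a) + [1] * len(X_b))

# Logistic regression trained on the decorrelated (shuffled) data.
clf = LogisticRegression().fit(X_shuf, y)
print(clf.coef_, clf.intercept_)
```

Comparing this classifier's direction against one trained on the unshuffled data (or against the difference of the centroids) should then show how much of the fitted direction is driven by correlations vs. by the means, and how noisy it is at your sample size.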