Paper: Superposition, Memorization, and Double Descent (Anthropic)

Link post

(This is a follow-up to Anthropic’s prior work on Toy Models of Superposition.)

The authors study how neural networks interpolate between memorization and generalization in the “ReLU Output” toy model from the original Toy Models of Superposition paper.

They train models to perform a synthetic regression task, varying both the number of training points and the models’ hidden dimension.
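For concreteness, here is a minimal sketch of that setup, assuming the “ReLU output” model from Toy Models of Superposition (tied weights, x′ = ReLU(WᵀWx + b)) trained to reconstruct sparse synthetic features. The parameter names and specific numbers below are illustrative assumptions, not the paper’s:

```python
# Minimal sketch of the "ReLU output" toy model and its synthetic sparse data.
# Names and values here are illustrative assumptions, not taken from the paper.
import torch

def sample_data(n_points, n_features, sparsity):
    """Sparse synthetic features: each entry is 0 with probability `sparsity`,
    otherwise uniform in [0, 1]."""
    values = torch.rand(n_points, n_features)
    mask = torch.rand(n_points, n_features) < (1 - sparsity)
    return values * mask

class ReluOutputModel(torch.nn.Module):
    """x' = ReLU(W^T W x + b): project the n features into an m-dimensional
    hidden space, then reconstruct them with the transpose of the same matrix."""
    def __init__(self, n_features, m_hidden):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(m_hidden, n_features) * 0.02)
        self.b = torch.nn.Parameter(torch.zeros(n_features))

    def forward(self, x):
        h = x @ self.W.T                        # hidden vector, one per datapoint
        return torch.relu(h @ self.W + self.b)  # reconstruction of the features

# A small, fixed training set; test data is sampled fresh from the same distribution.
train_x = sample_data(n_points=8, n_features=1000, sparsity=0.999)
model = ReluOutputModel(n_features=1000, m_hidden=2)
```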

First, they find that for small training sets, while the learned features are messy, the training-set hidden vectors (the projections of the input datapoints into the hidden space) often show clean geometric structure.

They then extend their earlier definition of feature dimensionality to measure the dimensionality allocated to each of the training examples, and plot this against the dataset size (and also against test loss).
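My reading is that the per-datapoint metric mirrors the feature-dimensionality formula from Toy Models of Superposition, D_i = ||v_i||² / Σ_j (v̂_i · v_j)², applied to the training points’ hidden vectors rather than to the columns of W. A sketch under that assumption, continuing the code above:

```python
def dimensionality(vectors, eps=1e-8):
    """D_i = ||v_i||^2 / sum_j (v_hat_i . v_j)^2, with v_hat_i = v_i / ||v_i||.

    D_i is near 1 when item i gets a hidden direction to itself, and near 1/k
    when it shares that direction roughly equally with k other items.
    """
    norms = vectors.norm(dim=1, keepdim=True)      # ||v_i||
    unit = vectors / (norms + eps)                 # v_hat_i
    overlaps = (unit @ vectors.T) ** 2             # (v_hat_i . v_j)^2, including j = i
    return norms.squeeze(1) ** 2 / (overlaps.sum(dim=1) + eps)

# Feature dimensionality uses the columns of W; datapoint dimensionality uses
# the training set's hidden vectors (this correspondence is my assumption).
feature_dims = dimensionality(model.W.T)              # one value per feature
datapoint_dims = dimensionality(train_x @ model.W.T)  # one value per training point
```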

This shows that as you increase the amount of data, you move from a regime where high dimensionality is allocated to the training-set hidden vectors and low dimensionality to the features, to one where the opposite is true. In between, both features and hidden vectors are allocated low dimensionality, and this coincides with a rise in test loss, which they compare to the phenomenon of “data double descent” (where, for overparameterized models with small amounts of regularization, test loss can go up before it goes down as the amount of data increases).

Finally, they visualize how varying the dataset size and the hidden dimension affects test loss, and find double descent along both axes.

They also include some (imo very interesting) experiments from Adam Jermyn: 1) replicating the results, 2) exploring how weight decay interacts with this double-descent-like phenomenon, and 3) studying what happens if you repeat particular datapoints.
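On the weight-decay experiment: continuing the sketch above, a training run with weight decay might look like the following, assuming a standard decoupled (AdamW-style) penalty and a plain reconstruction loss; the optimizer, learning rate, and loss weighting actually used may differ.

```python
# Hedged sketch of a training run with weight decay (cf. experiment 2 above);
# the real notebooks may use a different optimizer, schedule, or loss weighting.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

for step in range(20_000):
    opt.zero_grad()
    recon = model(train_x)
    train_loss = ((recon - train_x) ** 2).sum(dim=1).mean()  # per-example squared error
    train_loss.backward()
    opt.step()

# Test loss is measured on fresh samples from the same sparse distribution.
test_x = sample_data(n_points=10_000, n_features=1000, sparsity=0.999)
with torch.no_grad():
    test_loss = ((model(test_x) - test_x) ** 2).sum(dim=1).mean()
```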


Some limitations of the work, based on my first read-through:

  • The authors note that the results seem quite sensitive to hyperparameters, especially at low hidden dimension. For example, Adam Jermyn’s results differ from the Anthropic interpretability team’s results (though the figures still look qualitatively similar).

  • I’m still not super convinced how much the results from this superposition work apply in practice. I’d be interested in seeing more work along the lines of the preliminary MNIST experiment by Chris Olah at the bottom of the paper.

(I’ll probably have more thoughts as I think about this for longer.)