Huh. The single-neuron example failing to converge is pretty wild. It gives me this strong feeling of “the training objective we’re using for sparse autoencoders can’t be right. Clearly we’re not really asking for what we want, and are instead asking for something other than what we want.”
But thinking about it a bit more, it seems like L2 regularization should be solving exactly this problem. Maybe weight decay was below some numerical threshold?
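To make the “asking for something other than what we want” worry concrete, here is a minimal sketch of the kind of toy objective at issue. It assumes the standard SAE loss (reconstruction MSE plus an L1 penalty on the activation), reduces everything to a single always-on feature (x = 1), and pins the decoder weight at 1 (the usual unit-norm constraint), leaving only the encoder weight free. The function name and constants are mine, not from the original example.

```python
# Toy single-neuron autoencoder on one always-active feature (x = 1).
# Decoder weight pinned to 1, so the only free parameter is the
# encoder weight `we`. With activation a = we * x, the loss is
#   (a - x)^2 + lam * |a|  =  (we - 1)^2 + lam * we   (for we > 0),
# whose optimum is we = 1 - lam/2, not 1: the L1 term builds a
# systematic shrinkage bias into the objective itself.

def train_encoder(lam=0.2, steps=2000, lr=0.05):
    we = 0.5
    for _ in range(steps):
        # gradient of (we - 1)^2 + lam * we, valid while we > 0
        grad = 2.0 * (we - 1.0) + lam
        we -= lr * grad
    return we

we = train_encoder(lam=0.2)
print(we)  # ≈ 0.9, not 1.0: the optimum of the objective is shrunken
```

Note this is shrinkage at the optimum, not a failure to reach it; gradient descent converges fine here, just to a biased answer. Weight decay on the decoder changes which degenerate scalings (tiny activations, huge decoder weights) are reachable, which is why it seems like it should matter.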