We haven’t seen that empirically with the usual regularization methods, so I assume there must be something special going on with the training setup.
I wonder if this phenomenon is partially explained by scaling up the embedding and scaling down the unembedding by a constant factor (or vice versa). That should leave the LLC constant, but it will change the L2 norm.
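A minimal sketch of what I mean, assuming a toy two-matrix model (the names `W_E`, `W_U`, and the factor `alpha` are just illustrative, and the rescaling symmetry is only exact here because there is no nonlinearity between them): the computed function, and hence the LLC, is unchanged, but the L2 norm moves.

```python
import numpy as np

rng = np.random.default_rng(0)
d_vocab, d_model = 50, 16

# Toy "model": logits = x @ W_E @ W_U. With no nonlinearity in between,
# scaling W_E up and W_U down by the same factor is an exact symmetry.
W_E = rng.normal(size=(d_vocab, d_model))
W_U = rng.normal(size=(d_model, d_vocab))
x = rng.normal(size=(8, d_vocab))

alpha = 3.0
W_E_scaled = alpha * W_E   # scale the embedding up ...
W_U_scaled = W_U / alpha   # ... and the unembedding down

logits = x @ W_E @ W_U
logits_scaled = x @ W_E_scaled @ W_U_scaled
print(np.allclose(logits, logits_scaled))  # True: same function

l2 = np.sum(W_E**2) + np.sum(W_U**2)
l2_scaled = np.sum(W_E_scaled**2) + np.sum(W_U_scaled**2)
print(l2, l2_scaled)  # different L2 norms
```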