The correlation between training loss and LLC is especially unexpected to us.
It’s not unusual to see an inverse relationship between loss and LLC over the training of a single model (since lower-loss solutions tend to be more complex). This can be seen in the toy model of superposition setting (plot attached), but it is also pronounced in large language models. I’m not familiar with any examples that look like your plot, where points at the end of training runs show a linear relationship.
In our toy example, I would intuitively associate the LLC with the test loss rather than the train loss. When training a single model, test loss and LLC were observed to be correlated. Plausibly, for this simple model, the (final) LLC, train loss, and test loss are all closely related.
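As an aside, one way to make the "linear relationship at the end of training" observation precise would be to compute the Pearson correlation between the final losses and the LLC estimates across runs. The sketch below uses synthetic placeholder data; the array names, the assumed linear structure, and the noise levels are illustrative, not taken from either of our experiments.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-run summaries: final train loss, final test
# loss, and the end-of-run LLC estimate for each independent run. The linear
# structure and noise levels here are assumed purely for illustration.
n_runs = 50
final_train_loss = rng.uniform(0.1, 1.0, n_runs)
final_test_loss = final_train_loss + rng.normal(0.0, 0.05, n_runs)
llc_estimate = 2.0 - 1.5 * final_train_loss + rng.normal(0.0, 0.1, n_runs)

# Pearson r quantifies how linear the end-of-run relationship is; a
# least-squares fit gives the slope of the loss-vs-LLC line.
for name, loss in [("train", final_train_loss), ("test", final_test_loss)]:
    r, p = stats.pearsonr(loss, llc_estimate)
    slope, intercept = np.polyfit(loss, llc_estimate, 1)
    print(f"{name} loss vs LLC: r = {r:.3f} (p = {p:.2g}), slope = {slope:.2f}")
```

If the final LLC, train loss, and test loss really are all closely related in this simple model, the train-loss and test-loss correlations computed this way should come out nearly identical.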