That is, “flatness” in the loss landscape is about how many nearby-in-parameter-space models achieve similar loss, and you can get that by error-correction, not just by using fewer parameters (such that it takes fewer bits of evidence to find that setting)? Cool!
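To make the volume-counting picture concrete, here is a minimal sketch with two toy losses. Both are hypothetical examples, not taken from the post, and both have the same number of parameters. It estimates by Monte Carlo what fraction of a box around the minimum achieves loss below a threshold `eps`. The degenerate loss keeps far more of the box at low loss, even though it does not use fewer parameters.

```python
import random

def low_loss_fraction(loss, eps, n_samples=200_000, seed=0):
    """Estimate the fraction of the box [-1, 1]^2 where loss(w1, w2) < eps."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_samples):
        w1 = rng.uniform(-1.0, 1.0)
        w2 = rng.uniform(-1.0, 1.0)
        if loss(w1, w2) < eps:
            hits += 1
    return hits / n_samples

def regular_loss(w1, w2):
    # Toy assumption: an isolated minimum at the origin, curved in every direction.
    return w1 ** 2 + w2 ** 2

def degenerate_loss(w1, w2):
    # Toy assumption: zero loss along both coordinate axes, so the minimum
    # is a whole set of parameter settings rather than a single point.
    return (w1 * w2) ** 2

eps = 1e-2
frac_regular = low_loss_fraction(regular_loss, eps)
frac_degenerate = low_loss_fraction(degenerate_loss, eps)

print(f"regular:    {frac_regular:.4f}")
print(f"degenerate: {frac_degenerate:.4f}")
```

For the regular loss the low-loss region is a disc of radius `sqrt(eps)`, so its share of the box shrinks linearly with `eps`. For the degenerate loss it is the region `|w1 * w2| < sqrt(eps)`, which shrinks much more slowly. Fewer bits are needed to land in the second region, which is the sense of "flatness" described above.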
It seems that using SLT one could give a generally correct treatment of MDL. However, until such results are established
It looks like the author contributed to achieving this in October 2025's “Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory”?
Right on both counts!