That is, “flatness” in the loss landscape is about how many nearby-in-parameter-space models achieve similar loss, and you can get that via error-correction, not just by using fewer parameters (such that it takes fewer bits of evidence to find that setting)? Cool!
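A toy numerical sketch of that point (my own construction, not from the post): compare a one-parameter model `y = a·x` against a redundantly parameterized `y = (b1·b2)·x`. Both fit the same target, but in the second, the near-zero-loss set is a whole cross of lines `{b1·b2 ≈ 0}`, so a larger fraction of parameter space achieves near-minimal loss — “flat” by degeneracy rather than by parameter count.

```python
import numpy as np

# Toy model: y = c * x fit to the target y = 0, so MSE loss is c^2 * mean(x^2).
x = np.linspace(-1, 1, 50)
mx2 = np.mean(x ** 2)

def loss(c):
    # MSE of the linear model with effective coefficient c against target 0.
    return c ** 2 * mx2

grid = np.linspace(-1, 1, 201)
eps = 1e-3  # "near-minimal loss" threshold

# Model A: one parameter a; effective coefficient is a itself.
frac_a = np.mean(loss(grid) < eps)

# Model B: two parameters (b1, b2); effective coefficient is b1 * b2,
# which is degenerate (singular) along the axes b1 = 0 and b2 = 0.
B1, B2 = np.meshgrid(grid, grid)
frac_b = np.mean(loss(B1 * B2) < eps)

print(f"near-minimal-loss fraction, model A: {frac_a:.3f}")
print(f"near-minimal-loss fraction, model B: {frac_b:.3f}")
```

On this grid, model B's near-minimal-loss fraction comes out several times larger than model A's, even though B has more parameters, which is the flatness-from-degeneracy intuition in miniature.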
It seems that using SLT one could give a generally correct treatment of MDL. However, until such results are established
It looks like the author contributed to achieving this in October 2025's “Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory”?
This post was a useful source of intuition when I was reading about singular learning theory the other week (in order to pitch it to an algebraic geometer of my acquaintance along with gifting her a copy of If Anyone Builds It), but I feel like it “buries the lede” for why SLT is cool. (I’m way more excited about “this generalizes minimum description length to neural networks!” than “we could do developmental interpretability maybe.” De gustibus?)