This post was a useful source of intuition when I was reading about singular learning theory the other week (in order to pitch it to an algebraic geometer of my acquaintance along with gifting her a copy of If Anyone Builds It), but I feel like it “buries the lede” for why SLT is cool. (I’m way more excited about “this generalizes minimum description length to neural networks!” than “we could do developmental interpretability maybe.” De gustibus?)