“Wait, you mean many people don’t already see things this way?”
I wish this were obvious :). But it’s 2026 and we’re still getting op-eds from professors arguing that LLMs are “stochastic parrots” or “just shallow pattern-matching.” I wish I could tell laypeople “the scientific consensus is that this is wrong,” but I can’t because there isn’t one. The scaling hypothesis was a fringe position five years ago and remains controversial today. “No free lunch” and “deep learning will just overfit” were standard objections until embarrassingly recently, and you’ll still hear them occasionally.
If (a tractable approximation to) the provably optimal learning algorithm isn’t “real learning,” I don’t know what would be. Yet clearly many smart people don’t believe this. And the disagreement leads to seriously different expectations about where the field will go: if deep learning is implementing something like Solomonoff induction, the default expectation shifts toward continued scaling. That isn’t to say scaling must work (efficiency limits, data constraints, a thousand practical issues could intervene), but there’s no in-principle reason to expect a hard ceiling. That’s something people still dispute, loudly.
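For readers who haven’t seen the formal object I keep gesturing at, here’s a rough sketch of Solomonoff’s predictor (standard material from algorithmic information theory, stated loosely; the notation is mine):

```latex
% Solomonoff's universal prior, relative to a fixed universal prefix
% machine U: every program p whose output begins with the string x
% contributes weight 2^{-|p|}; prediction is then just conditioning
% on the observed prefix.
\[
  M(x) \;=\; \sum_{p \,:\, U(p)\,=\,x\ast} 2^{-\lvert p \rvert},
  \qquad
  M(x_{t+1} \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\,x_{t+1})}{M(x_{1:t})}.
\]
```

The optimality claim, roughly, is that this predictor’s cumulative prediction error on data from any computable source exceeds that source’s own irreducible error by at most a constant tied to the source’s Kolmogorov complexity. The catch, of course, is that M is incomputable, hence the “tractable approximation to” hedge above.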
That being said, as I mention at the beginning, these ideas aren’t new or original to me, and some people do believe versions of these sorts of claims. Let me crudely break the landscape down into a few groups:
Average ML practitioner / ICLR attendee: The working assumption is still closer to “neural networks do black-box function approximation,” maybe with vague gestures toward “feature learning” or something. They may be aware of mechanistic interpretability but haven’t integrated it into their worldview. They’re usually only dimly aware of the theoretical puzzles, if at all (why doesn’t the UAT construction work? why does the same network generalize on real data and yet memorize random labels? I sketch that second experiment below, right after this breakdown).
ML theory community: Well … it depends on which subfield they’re in; knowledge is often pretty siloed. Still, bits and pieces are somewhat well-known by different groups, under different names: things like compositionality, modularity, simplicity bias, etc. For instance, Poggio’s group put out a great paper back in 2017 positing that compositionality is the main way that neural networks avoid the curse of dimensionality in approximation theory (I sketch the flavor of that result below, after this breakdown). Even then, I think these often remain isolated explanations for isolated phenomena, rather than something like a unified thesis. They’re also not necessarily consensus: people still propose alternative explanations for the generalization problem in 2026! The idea of deep learning as a “universal” learning algorithm, while deeply felt by some, is rarely stated explicitly, and I expect it would receive substantial pushback (“well, deep learning performs badly in X scenario where I kneecapped it”). Many know of Solomonoff induction but don’t think about it in the context of deep learning.
Frontier labs / mechanistic interpretability / AIT-adjacent folks: Here, “deep learning is performing something like Solomonoff induction” wouldn’t make anyone blink, even if they might quibble with the details. It’s already baked into this worldview that deep learning is some kind of universal learning algorithm, that it works by finding mechanistic solutions, that there are few in-principle barriers. But even in this group, many aren’t aware of the totality of the evidence; e.g. I talked to an extremely senior mech interp researcher who wasn’t aware of the approximation theory / depth separation evidence. And few will defend the hypothesis in public. Even among those who accept the informal vibes, far fewer think the connection could actually be made formal, or true in any precise sense.
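On the generalization puzzle I attribute to the first group: the cleanest demonstration is the label-randomization experiment from Zhang et al.’s “Understanding deep learning requires rethinking generalization.” Here’s a minimal PyTorch sketch of it (the small MLP, subset sizes, and hyperparameters are my own illustrative choices, not anything from the post): the identical architecture and optimizer fit both training sets nearly perfectly, but only the true-label run generalizes.

```python
# Minimal sketch of the "same network memorizes random labels" experiment
# (Zhang et al.). Train an identical MLP on MNIST twice: once with the
# true labels, once with labels shuffled uniformly at random.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms


def load_mnist(train: bool, n: int) -> tuple[torch.Tensor, torch.Tensor]:
    ds = datasets.MNIST("data", train=train, download=True,
                        transform=transforms.ToTensor())
    xs = torch.stack([ds[i][0] for i in range(n)]).view(n, -1)
    ys = torch.tensor([ds[i][1] for i in range(n)])
    return xs, ys


def run(train_x, train_y, test_x, test_y, tag: str, epochs: int = 50) -> None:
    # The architecture and optimizer are identical in both runs;
    # only the labels differ.
    model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(),
                          nn.Linear(512, 512), nn.ReLU(),
                          nn.Linear(512, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loader = DataLoader(TensorDataset(train_x, train_y),
                        batch_size=128, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(xb), yb).backward()
            opt.step()
    with torch.no_grad():
        train_acc = (model(train_x).argmax(1) == train_y).float().mean()
        test_acc = (model(test_x).argmax(1) == test_y).float().mean()
    print(f"{tag}: train acc {train_acc:.3f}, test acc {test_acc:.3f}")


if __name__ == "__main__":
    train_x, train_y = load_mnist(train=True, n=10_000)
    test_x, test_y = load_mnist(train=False, n=2_000)
    # True labels: high train accuracy AND high test accuracy.
    run(train_x, train_y, test_x, test_y, "true labels")
    # Shuffled labels: the same network still drives training error to
    # (near) zero, but test accuracy collapses to chance (~10%).
    run(train_x, train_y[torch.randperm(len(train_y))],
        test_x, test_y, "random labels")
```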
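And for the second group’s compositionality point, the flavor of the approximation-theory result I have in mind is roughly the following (stated loosely from memory; see the Poggio et al. paper for the precise smoothness assumptions and constants):

```latex
% Generic curse of dimensionality in approximation theory: approximating
% an arbitrary Lipschitz function f : [0,1]^d \to \mathbb{R} to uniform
% accuracy \varepsilon needs a number of parameters on the order of
\[
  N_{\text{shallow}}(\varepsilon) \;\sim\; \varepsilon^{-\Theta(d)},
\]
% i.e. exponential in the dimension d. If f is instead compositional,
% say a binary tree of two-variable smooth constituent functions, a deep
% network whose layout mirrors that tree needs only on the order of
\[
  N_{\text{deep}}(\varepsilon) \;\sim\; (d-1)\,\varepsilon^{-\Theta(1)}
\]
% parameters: the dependence on d drops from exponential to linear.
```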
So: the post isn’t claiming novelty for the hypothesis (I say this explicitly at the start). It’s trying to put the evidence in one place, say the thing outright in public, and point toward formalization. If you’re already in the third group, much of it will read as elaboration. But that’s not where most people are.
Thank you! This really made your thesis click for me.