From an outsider’s perspective, however, it is not obvious that MIRI functioned as a place where deep, hands-on technical understanding of AI systems was systematically acquired, even at a smaller or safer scale.
Just for reference, my “credentials”: I have worked in “machine-learning-adjacent” spaces on and off since the nineties. Some of my earliest professional mentors were veterans of major 80s AI companies before the first AI winter. My knowledge is limited to a catalog of specific tricks that often solve problems, plus a broader idea of “what kind of things often work” and “what kind of things are like the proverbial ‘land war in Asia’ leading to the collapse of empires.”
My impression of MIRI in the 2010s is that they were deeply invested in making one of the classic mistakes. I can’t quite name this classic mistake precisely, but I can point in the general direction and give concrete examples:
- Imagining that an AI could understand natural language using a graph of facts about the world and some kind of fancy logical deduction
- Whatever Chomsky was thinking in the Chomsky/Norvig debates, where he keeps getting upset about statistical algorithms
- The people who keep needing to re-learn the Bitter Lesson the hard way
- Anyone who thinks AIs are logically designed rather than “grown”
- Anyone who argues that you can have an AI that isn’t a bunch of “giant inscrutable matrices” (or something even more difficult to understand)
- The poor doomed Cyc project, the last true heroic attempt at an AI based on logical rules
That pattern, that’s the thing I’m pointing at.
Cyc was the last industrial holdout for this classic mistake I can’t quite name. Academia mostly stopped making this mistake much earlier, starting in the 90s, and didn’t take the Cyc project very seriously after that.
MIRI, however, published a lot of papers that seemed to focus on the idea that alignment was essentially some kind of mathematical problem with a mathematical solution? At least that was the impression I always got when I read the abstracts that floated through my feeds. To my nose, their papers had that “Cyc smell”.
One of the good things they did do with these papers (IIRC) was to prove that a bunch of things would never work, for reasons of basic math.
MIRI has since realized, to their great credit, that actual, working AIs look a lot more like some mix of ChatGPT and AlphaGo than they do like Cyc (or, more broadly, the larger family of things I’m trying to describe). But my read is that a lot of their actual gut knowledge about real-world AI starts with the earliest GPT versions (before ChatGPT).
My personal take on the details, for what it is worth, is that I think they’re overly pessimistic about some arguments (e.g., they think we’re playing Russian roulette in certain areas with 6 bullets loaded, and I’d personally guess 4 or 5), but that they’re still far too optimistic about “alignment” in general.
Yudkowsky has been highly critical of Cyc, semantic networks, and the general GOFAI approach for many years, and his approach to building AGI is meaningfully different. It might be that the Bayesian approach to building AGI (or even the hope for an elegant mathematical theory in general) is a mistake, but it is not the same mistake.
The Bayesian approach is basically the simplest possible thing that doesn’t inevitably make the mistake I’m trying to describe. Something like Naive Bayes is still mostly legible if you stare at it for a while, and it was good enough to revolutionize spam filtering. This is because while Naive Bayes generates a big matrix, it depends on extremely concrete pieces of binary evidence. So you can factor the matrix into a bunch of clean matrices, each corresponding to the presence of a specific token. And the training computations for those small matrices are easily explained. Of course, you’re horribly abusing basic probability, but it works in practice.
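Here’s a minimal, self-contained sketch of the per-token factoring I mean. The documents and counts are toy data invented purely for illustration, not any real filter, but the shape of the computation is the point: the whole “model” is a table of per-token numbers that a human can read.

```python
# Toy sketch of why Naive Bayes spam filtering stays legible (made-up documents).
# Each token contributes an independently inspectable log-likelihood ratio; the
# "big matrix" factors into one tiny, human-readable number per token.
import math
from collections import Counter

spam_docs = ["buy cheap pills now", "cheap pills cheap pills", "win money now"]
ham_docs  = ["meeting notes attached", "see you at lunch", "notes from the call"]

def doc_frequencies(docs):
    c = Counter()
    for d in docs:
        c.update(set(d.split()))   # binary evidence: token present or not
    return c

spam_counts, ham_counts = doc_frequencies(spam_docs), doc_frequencies(ham_docs)
vocab = set(spam_counts) | set(ham_counts)

# Per-token log-likelihood ratios with add-one smoothing. This table *is* the
# model, and every entry has an obvious reading ("'cheap' shows up mostly in spam").
llr = {
    t: math.log((spam_counts[t] + 1) / (len(spam_docs) + 2))
       - math.log((ham_counts[t] + 1) / (len(ham_docs) + 2))
    for t in vocab
}

def spam_score(message):
    # Score = sum of per-token contributions; each term can be read off the table.
    return sum(llr.get(t, 0.0) for t in set(message.split()))

print(spam_score("cheap pills"))   # positive -> looks spammy
print(spam_score("lunch notes"))   # negative -> looks like ham
```

Every step in training those numbers (count, smooth, take a log ratio) is explainable in a sentence, which is exactly the legibility that stops surviving once the evidence stops being concrete binary tokens.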
This does not work for many other problems.
The problem is scaling up the domain complexity. Once you move from a spam filter to speech transcription or object recognition, the matrices get bigger, and the training process gets rapidly more opaque.
But yes, thank you for the correction—I still find a lot of MIRI’s work in the 2010s a bit “off” in terms of vibes, but I will happily accept the judgement of people who read the papers in detail. And I would not wish to falsely claim that someone approved of the Cyc project when they didn’t.
I asked a bunch of LLMs with web search to try to name the classic mistake you’re alluding to:
Opus 4.5: “apriorism about alignment”, the alignment-specific version of “intelligence, properly understood, must be architecturally legible”
DeepSeek + Deep Think: “apriorism in AI”, the “formalist fallacy”, etc
Gemini 3 Pro Thinking: “the logicist paradigm”
Grok 4 Expert: the “neats” paradigm (as in “neats vs scruffies”)
ChatGPT 5.2 Extended Thinking: “neatness bias”, “premature formalisation: locking in crisp, human-legible abstractions before you actually have an empirically grounded handle on how the thing works, and then you end up doing a lot of beautiful work on the wrong interface”
Kimi K2 Thinking: the “knowledge engineering mirage”, “the faith that you can formalize the world’s complexity faster than it generates edge cases”
To be honest, these just aren’t very good; LLMs usually do better than this at naming half-legible vibes.
Yeah, the “Bitter Lesson” refers to a special case of this classic mistake, as do the other essays I linked. Some of those essays were quite well known in their day, at least to various groups of practitioners.
You could do it up in the classic checklist meme format:
Your brilliant AI plan will fail because:
[ ] You assume that you can somehow make the inner workings of intelligence mostly legible.
The people who learn this unpleasant lesson the fastest are AI researchers who process inputs that are obviously arrays of numbers. For example, sound and images are giant arrays of numbers, so speech recognition researchers have known what’s up for decades. But researchers who worked with either natural language or (worse) simplified toy planning systems often thought that they could handwave away the arrays of numbers and find a nice, clear, logical “core” that captured the essence of intelligence.
I want to be clear: Lots of terrifyingly smart people made this mistake, including some of the smartest scientists who ever lived. Many of them made this mistake for a decade or more before wising up or giving up.
But if you slap a camera and a Raspberry Pi onto a Roomba chassis, and wire up a simple gripper arm, then you can speed-run the same brutal lessons in a year, max. You’ll learn that the world is an array of numbers, and that the best “understanding” you can obtain about the world in front of your robot is a probability distribution over “apple”, “Coke can”, “bunch of cherries” or “some unknown reddish object”, each with a number attached. The transformation that sits between the array and the probability distribution always includes at least one big matrix that’s doing illegible things.
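To make the “array of numbers in, probability distribution out” point concrete, here’s a toy sketch. The weights are random stand-ins rather than a trained model, and a real recognizer stacks many such matrices, but the shape of the pipeline is the lesson:

```python
# Toy sketch of the robot's "understanding": a camera frame is just an array of
# numbers, and what comes out is a probability distribution over labels, produced
# by a big matrix whose individual entries mean nothing to a human.
# (Random weights stand in for a trained model here.)
import numpy as np

labels = ["apple", "Coke can", "bunch of cherries", "unknown reddish object"]

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3))   # fake 64x64 RGB camera frame
x = frame.reshape(-1) / 255.0                    # flatten: 12288 plain numbers

W = rng.normal(size=(len(labels), x.size))       # the big matrix
logits = W @ x
probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # softmax -> probability distribution

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```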
Neural networks are just bunches of matrices with even more illegible (non-linear) complications. Biological neurons take the matrix structure and bury it under more than a billion years of biochemistry and incredible complications we’re only starting to discover.
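Continuing the toy sketch above (hidden size and weights arbitrary), here is what even one non-linearity does to legibility: unlike the Naive Bayes table, the output no longer splits into a fixed sum of per-input contributions you can read off a column.

```python
# Stack two matrices with a ReLU in between. Each pixel's effect on the output now
# depends on which hidden units the *other* pixels switched on, so there is no
# fixed per-input table to inspect -- a small taste of why real networks are
# so hard to read.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=12288)                 # the same flattened frame shape as before
W1 = rng.normal(size=(256, x.size))        # first big matrix
W2 = rng.normal(size=(4, 256))             # second big matrix

h = np.maximum(W1 @ x, 0.0)                # ReLU: the non-linear complication
logits = W2 @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```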
Like I said, this is a natural mistake, and smarter people than most of us here have made this mistake, sometimes for a decade or more.
Imagine this. Imagine a future world where gradient-driven optimization never achieves aligned AI. But there is success of a different kind. At great cost, ASI arrives. Humanity ends. In his few remaining days, a scholar with the pen name of Rete reflects back on the 80s approach (i.e. using deterministic rules and explicit knowledge) with the words: “The technology wasn’t there yet; it didn’t work commercially. But they were onto something—at the very least, their approach was probably compatible with provably safe intelligence. Under other circumstances, perhaps it would have played a more influential role in promoting human thriving.”