Chapter 17 surprised me by how well it anticipated modern AI doomerism.
It’s perhaps worth highlighting the significant tension between two contrasting claims: on the one hand, the idea that modern AI doomerism was “anticipated” as early as the 19th century, and on the other, the idea that modern AI doom arguments are rationally grounded in a technical understanding of today’s deep learning systems. If the core concerns about AI doom were truly foreseen over a century ago, long before any of the technical details of modern machine learning existed, then I suggest the arguments can’t really be based on those technical details in any deep or meaningful way.
One way to resolve this contradiction is to posit that AI doom arguments are not fundamentally about technical aspects at all, but are instead rooted in a broader philosophical stance—namely, that artificial life is by default bad, dangerous, or disvaluable (for example by virtue of lacking consciousness, or by virtue of being cold and calculating), while biological life is by default good or preferable. However, when framed in this way, the arguments lose much of their perceived depth and rigor, and look more like raw intuition-backed reactions to the idea of mechanical minds than tenable theories.
Strong disagree voted. To me this is analogous to saying that, because Leonardo da Vinci tried to design a flying machine and believed it to be possible despite not really understanding aerodynamics, the Wright brothers’ belief that the aeroplane they designed would fly “can’t really be based on those technical details in any deep or meaningful way.”
“Maybe a thing smarter than humans will eventually displace us” is really not a very complicated argument, and no one is claiming it is. So it should be part of our hypothesis class, and various people like Turing thought of it well before modern ML. The “rationally grounded in a technical understanding of today’s deep learning systems” part is about how we update our probabilities of the hypotheses in our hypothesis class, and how we can comfortably say “yes, terrible outcomes still seem plausible”, as they did on priors without needing to look at AI systems at all (my probability is moderately lower than it would have been without looking at AIs at all, but with massive uncertainty).
Intuition and rigour agreeing is not some kind of highly suspicious gotcha.
“Maybe a thing smarter than humans will eventually displace us” is really not a very complicated argument, and no one is claiming it is. So it should be part of our hypothesis class, and various people like Turing thought of it well before modern ML.
This is a claim about what is possible, but I am talking about what people claim is probable. If the core idea of “AI doomerism” is that AI doom is merely possible, then I agree: little evidence is required to believe the claim. In this case, it would be correct to say that someone from the 19th century could indeed have anticipated the arguments for AI doom being possible, as such a claim would be modest and hard to argue against.
Yet a critical component of modern AI doomerism is not merely about what’s possible, but what is likely to occur: many people explicitly assert that AI doom is probable, not merely possible. My point is that if the core reasons supporting this stronger claim could have been anticipated in the 19th century, then it is a mistake to think that the key cruxes generating disagreement about AI doom hinge on technical arguments specific to contemporary deep learning.
The way I think about it, you should have a prior distribution over doom vs. no doom, and then getting a bunch of info about current ML should update that. In my opinion, it is highly unreasonable to have a very low prior on “thing smarter than humans successfully acts significantly against our interests”; you should generally be highly uncertain and view this as high variance.
So I guess the question is how many people who think doom is very unlikely just start from a really low prior but agree with me on the empirical updates, versus start from some more uncertain prior but update a bunch downwards on empirical evidence or at least reasoning about the world. Like: oh, companies are rational enough that they just wouldn’t build something that would be dangerous, and it’ll be easy to test for, and they’ll do this testing. Historically, we’ve solved issues with technology before they arose, so this will be fine. Or we can just turn it off if something goes wrong. I would consider even the notion that there exists the ability to turn it off to be using information that someone would not have had in the 19th century.
My guess is that most reasonable people with low P(doom), who are willing to actually engage with probabilities here, start at at least 5% but just update down a bunch for reasons I tend to disagree with/consider wildly overconfident? But maybe you’re arguing that the disagreement now stems from priors?
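(An illustrative aside on the prior-plus-update framing above: a minimal Python sketch, with entirely made-up numbers and a hypothetical posterior() helper, contrasting the two routes to a low P(doom) being asked about.)

```python
# Illustrative only: made-up numbers, not anyone's actual credences.

def posterior(prior_p, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_p / (1 - prior_p)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Route 1: start from a really low prior, with no real update from looking at ML.
print(round(posterior(0.001, 1.0), 3))    # ~0.001

# Route 2: start from a more uncertain prior (~30%), then update a bunch
# downwards on empirical evidence / reasoning about the world (20:1 against).
print(round(posterior(0.30, 1 / 20), 3))  # ~0.021

# Both land on a low P(doom), but for very different reasons.
```

Both routes end in roughly the same place, which is why the question of where the disagreement comes from (priors vs. empirical updates) matters.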
You strong disagree downvoted my comment, but it’s still not clear to me that you actually disagree with my core claim. I’m not making a claim about priors, or whether it’s reasonable to think that p(doom) might be non-negligible a priori.
My point is instead about whether the specific technical details of deep learning today are ultimately what’s driving some people’s high probability estimates of AI doom. If the intuition behind these high estimates could’ve been provided in the 19th century (without modern ML insights), then modern technical arguments don’t seem to be the real crux.
Therefore, while you might be correct about priors regarding p(doom), or whether existing evidence reinforces high concern for AI doom, these points seem separate from my core claim about the primary motivating intuitions behind a strong belief in AI doom.
(To clarify, I strong disagree voted, I haven’t downvoted at all—I still strongly disagree)
I am confused and feel like I must be misunderstanding your point. It feels like you’re attempting a “gotcha” argument, but I don’t understand your point or who you’re trying to criticize. It seems like bizarre rhetorical practice. It is not a valid argument to say that “people can hold position A for bad reason X, therefore all people who hold position A also hold it for bad reason X even if they claim it is for good reason Y”. But that seems to be your argument? For A=high doom, X=weird 19th century intuition, Y=actually good technical reasons grounded in modern ML. What am I missing? If you want to argue that someone else really believes bad reason X, you need to engage with specific details of that person and why you believe they are saying false things about their beliefs.
I could easily flip this argument. In the 19th century, I’m sure people said machines could never possibly be dangerous—“God will protect us,” “They are tools, and tools are always subservient to man,” or “They will never have a soul, and so can never be truly dangerous.” This is a raw, intuition-backed argument. People today who claim to believe that AI will be safe for sophisticated technical reasons could have held these same beliefs in the 19th century, which suggests they are being dishonest. Why does your argument hold, but mine break?
I also don’t actually know which people you want to criticize. My sense is that many community members with high p(doom), like Yudkowsky, developed these views 10-20 years ago and haven’t substantially updated since, so obviously they can’t come from nuanced views of modern ML. As far as I am aware they don’t seem to claim their beliefs are heavily driven by sophisticated technical reasons about current ML systems—they simply maintain their existing views. It still seems a strawman to call views formed without specific technical grounding “raw intuition-backed reactions to the idea of mechanical minds”. Like, regardless of how much you agree, “Superintelligence” clearly makes a much more sophisticated case than you imply, while predating deep learning.
I’m not actually aware of anyone who claims to be afraid of just current ML systems due to specific technical reasons. The reasons for being afraid are pretty obvious, but there are very specific facts about these systems that can adjust those concerns. Now that modern deep learning exists, some of these concerns seem validated, while others seem less significant, and new issues have arisen. This seems completely normal and exactly what you would expect? My personal view is that we should be moderately but not extremely concerned about doom. I understand modern machine learning well, and it hasn’t substantially shifted my position in either direction. The large language model paradigm somewhat increased my optimism about safety, while the shift toward long-horizon RL somewhat increased my concern about doom, though this development was expected eventually.
Can you give some concrete examples of specific people/public statements that you are trying to criticise here? That might help ground out this disagreement.
I am confused and feel like I must be misunderstanding your point. It feels like you’re attempting a “gotcha” argument, but I don’t understand your point or who you’re trying to criticize. It seems like bizarre rhetorical practice. It is not a valid argument to say that “people can hold position A for bad reason X, therefore all people who hold position A also hold it for bad reason X even if they claim it is for good reason Y”. But that seems to be your argument?
I think you’re overinterpreting my comment and attributing to me the least charitable plausible interpretation of what I wrote, along with most other people commenting and voting in this thread. (As a general rule I’ve learned from my time in online communities: whenever someone makes a claim on an online forum that indicates a rejection of a belief central to that forum’s philosophy, people tend to reply to that person by ruthlessly assuming the most foolish plausible interpretation of their remarks. LessWrong is no exception.)
My actual position is simply this: if the core arguments for AI doom could have genuinely been presented and anticipated in the 19th century, then the crucial factor that actually determines whether most “AI doomers” believe in AI doom is probably something relatively abstract or philosophical, rather than specific technical arguments grounded in the details of machine learning. This does not imply that technical arguments are irrelevant; it just means they’re probably not as cruxy to whether people actually believe that doom is probable or not.
(Also to be clear, unless otherwise indicated, in this thread I am using “belief in AI doom” as shorthand for “belief that AI doom is more likely than not” rather than “belief that AI doom is possible and at least a little bit plausible, so therefore worth worrying about.” I think these two views should generally be distinguished.)
(To clarify, I strong disagree voted, I haven’t downvoted at all—I still strongly disagree)
Oops, I recognize that; I just misstated it in my original comment.
Thanks for clarifying. I’m sorry you feel strawmanned, but I’m still fairly confused.
Possibly the confusion is that you’re using AI doom to mean >50%? I personally think that it is not very reasonable to get that high based on conceptual arguments someone in the 19th century could understand, and definitely not >90%. But getting to >5% seems totally reasonable to me. I didn’t read this post as arguing that you should be >50% back in the 19th century, though I could easily imagine a given author being overconfident. And specific technical details of ML are totally enough of an update to bring you above or below 50%, so this matters. I personally do not think there’s a >50% chance of doom, but am still very concerned.
I think the simple argument “building minds vastly smarter than our own seems dangerous” is in fact pretty compelling, and seems relatively easy to realize beforehand, as e.g. Turing and many others did. Personally, there are not any technical facts about current ML systems which update me more overall either way about our likelihood of survival than this simple argument does.
And I see little reason why they should—technical details of current AI systems strike me as about as relevant to predicting whether future, vastly more intelligent systems will care about us as, e.g., technical details about neuronal firing in beetles are to predicting whether a given modern government will care about us. Certainly modern governments wouldn’t exist if neurons hadn’t evolved, and I expect one could in fact probably gather some information relevant to predicting them by studying beetle neurons; maybe even a lot, in principle. It just seems a rather inefficient approach, given how distant the object of study is from the relevant question.
There appears to be a motte-and-bailey worth unpacking. The weaker, easily defensible claim is that advanced AI could be risky or dangerous. This modest assertion requires little evidence, similar to claims that extraterrestrial aliens, advanced genetic engineering of humans, or large-scale human cloning might be dangerous. I do not dispute this modest claim.
The stronger claim about AI doom is that doom is likely rather than merely possible. This substantial claim demands much stronger evidence than the weaker claim. The tension I previously raised addresses this stronger claim of probable AI doom (“AI doomerism”), not the weaker claim that advanced AI might be risky.
Many advocates of the strong claim of AI doom explicitly assert that their belief is backed by technical arguments, such as the counting argument for scheming behavior in SGD, among other arguments. However, if the premise of AI doom does not, in fact, rely on such technical arguments, then it is a mistake to argue about these ideas as if they are the key cruxes generating disagreement about AI doom.
I think the word “technical” is a red herring here. If someone tells me a flood is coming, I don’t much care how much they know about hydrodynamics, even if in principle this knowledge might allow me to model the threat with more confidence. Rather, I care about things like e.g. how sure they are about the direction from which the flood is coming, about the topography of our surroundings, etc. Personally, I expect I’d be much more inclined to make large/confident updates on the basis of information at levels of abstraction like these, than at levels about e.g. hydrodynamics or particle physics or so forth, however much more “technical,” or related-in-principle in some abstract reductionist sense, the latter may be.
I do think there are also many arguments beyond this simple one which clearly justify additional (and more confident) concern. But I try to assess such arguments based on how compelling they are, where “technical precision” is one, but hardly the only factor which might influence this; e.g., another is whether the argument even involves the relevant level of abstraction, or bears on the question at hand.
No, the point is that AI x-risk is commonsensical. “If you drink much from a bottle marked poison it is certain to disagree with you sooner or later,” even if you don’t know the poison’s mechanism of action. We don’t expect Newtonian mechanics to prove that hitting yourself with a brick is quite safe; if we found that Newtonian mechanics predicted hitting yourself with a brick to be safe, that would be strong evidence that Newtonian mechanics is wrong. Good theories usually support common intuitions.
The other thing here is an isolated demand for rigor: there is no “technical understanding of today’s deep learning systems” which predicts, say, the success of AGI labs, or that their final products are going to be safe.
If we accept your interpretation—that AI doom is simply the commonsense view—then doesn’t that actually reinforce my point? It suggests that the central concern driving AI doomerism isn’t a set of specific technical arguments grounded in the details of deep learning. Instead, it’s based on broader and more fundamental intuitions about the nature of artificial life and its potential risks. To borrow your analogy: the belief that a brick falling on someone’s head would cause them harm isn’t ultimately rooted in technical disputes within Newtonian mechanics. It’s based on ordinary, everyday experience. Likewise, our conversations about AI doom should focus on the intuitive, commonsense cruxes behind it, rather than pretending that the real disagreement comes from highly specific technical deep learning arguments. Instead of undermining my comment, I think your point actually strengthens it.
I don’t think the mainline doom arguments claim to be rooted in deep learning?
Mostly they’re rigorized intuitive models about the nature of agency/intelligence/goal-directedness, which may go some way toward explaining certain phenomena we see in the behavior of LLMs (e.g. the Palisade Stockfish experiment). They’re theoretical arguments related to a broad class of intuitions and in many cases predate deep learning as a paradigm.
We can (and many do) argue over whether our lens ought to be top-down or bottom-up, but leaning toward the top-down approach isn’t the same thing as relying on unrigorous anxieties of the kind some felt 100 years ago.
I don’t think the mainline doom arguments claim to be rooted in deep learning?
To evaluate this claim, we can examine the blurb in Nate and Eliezer’s new book announcement, which states:
If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.
From this quote, I draw two main inferences. First, their primary concern seems to be driven by the nature of existing deep learning technologies. [ETA: To be clear, I mean that it’s the primary factor driving their high p(doom), not that they’d be unconcerned about AI risk without deep learning.] This is suggested by the phrase “anything remotely like current techniques”, which indicates that their core worries stem largely from deep learning rather than from all potential AI development pathways. Second, the statement conveys a high degree of confidence in their prediction. This is evident in the fact that the claim is presented without any hedging or uncertainty—there are no phrases like “it’s possible that” or “we think this may occur.” The absence of such qualifiers implies that they see the outcome as highly probable, rather than speculative.
Now, imagine that, using only abstract reasoning available in the 19th century, someone could reasonably arrive at a 5% estimate for the likelihood that AI would pose an existential risk. Then suppose that, after observing the development and capabilities of modern deep learning, this estimate increases to 95%. In that case, I think it would be fair to say that the central or primary source of concern is rooted in the developments in deep learning, rather than in the original abstract arguments. That’s because the bulk of the concern emerged in response to concrete evidence from deep learning, and not from the earlier theoretical reasoning alone. I think this is broadly similar to MIRI’s position, although they may not go quite as far in attributing the shift in concern to deep learning alone.
Conversely, if someone already had a 95% credence in AI posing an existential threat based solely on abstract considerations from the 19th century—before the emergence of deep learning—then it would make more sense to say that their core concern is not based on deep learning at all. Their conviction would have been established independently of modern developments. This latter view is the one I was responding to in my original comment, as it seemed inconsistent with how others—including MIRI—have characterized the origin and basis of their concerns, as I’ve outlined above.
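(An illustrative aside on the hypothetical 5% to 95% shift above: a minimal Python sketch, using only those made-up figures, of how large an evidential update that shift implies in odds terms.)

```python
# Hypothetical credences from the scenario above (not anyone's actual numbers).
prior_p = 0.05      # from abstract, 19th-century-available reasoning alone
posterior_p = 0.95  # after observing modern deep learning

prior_odds = prior_p / (1 - prior_p)              # 1:19 against doom
posterior_odds = posterior_p / (1 - posterior_p)  # 19:1 in favour of doom

# Implied multiplier on the odds contributed by the deep-learning evidence.
bayes_factor = posterior_odds / prior_odds
print(round(bayes_factor))  # 361
```

On these numbers, nearly all of the odds in favour of doom would be coming from the deep-learning update rather than from the original abstract argument, which is the sense of “central source of concern” used here.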
their primary concern seems to be driven by the nature of existing deep learning technologies. This is suggested by the phrase “anything remotely like current techniques”, which indicates that their core worries stem largely from deep learning rather than from all potential AI development pathways
You know better! Eliezer at least has been arguing these points since long before DL!
He has been warning of a significant risk of catastrophe for a long time, but unless I’m mistaken, he only began explicitly and primarily arguing for a high probability of catastrophe more recently, around the time deep learning emerged. This distinction is essential to my argument, and was highlighted explicitly in my comment.
Yes, I agree your whole comment sucks. I know you know there is a difference between p(doom) and p(doom|AGI soon), and your reasons for having a high p(doom | AGI soon) and low p(doom) can be very different. Indeed a whole factor of p(AGI soon) different!
So we can get the observed shift with most of the “highly technical DL-specific considerations” mainly updating the p(AGI soon) factor via the incredibly complicated and arcane practice of… extrapolating benchmark scores.
Indeed, the fact AGI seems to be arriving so quickly is the main reason most people are worried!
This is not to say they like deep learning. There can be additional reasons deep learning is bad in their book, but is deep learning a core part of their arguments? Hell no! Do you know how I know? I’ve actually read them! Indeed, if you type site:arbital.greaterwrong.com “deep learning” into google, you get back two results. Compare with site:arbital.greaterwrong.com “utility function”, which gives you 5 pages. Now which do you think is more central to their high p(doom | AGI in 5 years)?
I wasn’t asking for your evaluation of the rest of my comment. I was clarifying a specific point because it seemed you had misunderstood what I was saying.
So we can get the observed shift with most of the “highly technical DL-specific considerations” mainly updating the p(AGI soon) factor via the incredibly complicated and arcane practice of… extrapolating benchmark scores.
Indeed, the fact AGI seems to be arriving so quickly is the main reason most people are worried!
If someone says their high p(doom) is driven by short timelines, what they likely mean is that AGI is now expected to arrive via a certain method—namely, deep learning—that is perceived as riskier than what might have emerged under slower or more deliberate development. If that’s the case, it directly supports my core point.
This explanation makes sense to me since expecting AGI to arrive soon doesn’t by itself justify a high probability of doom. After all, it would have been reasonable to have always believed AGI would come eventually, and it would have been unjustified to increase one’s p(doom) over time merely because time is passing.
There can be additional reasons deep learning is bad in their book, but is deep learning a core part of their arguments? Hell no! Do you know how I know? I’ve actually read them!
I think you’re conflating two distinct issues: first, what initially made people worry about AI risk at all; and second, what made people think doom is likely as opposed to merely a possibility worth taking seriously. I’m addressing the second point, not the first.
Please try to engage with what I’m actually saying, rather than continuing to misrepresent my position.
Please try to engage with what I’m actually saying, rather than continuing to misrepresent my position.
It seems everyone has this problem with your writing. Have you considered speaking more clearly, or perhaps considering that people understand you fully and it is you who are wrong?
In this case, I believe it’s the latter, since
If someone says their high p(doom) is driven by short timelines, what they likely mean is that AGI is now expected to arrive via a certain method—namely, deep learning—that is perceived as riskier than what might have emerged under slower or more deliberate development. If that’s the case, it directly supports my core point.
Really? I thought your core point was
It’s perhaps worth highlighting the significant tension between two contrasting claims: on the one hand, the idea that modern AI doomerism was “anticipated” as early as the 19th century, and on the other, the idea that modern AI doom arguments are rationally grounded in a technical understanding of today’s deep learning systems.
In which case I did explain why there is no tension, as can be seen from my saying
So we can get the observed shift with most of the “highly technical DL-specific considerations” mainly updating the p(AGI soon) factor via the incredibly complicated and arcane practice of… extrapolating benchmark scores.
That is, it is a very strange thing to say there is a “significant tension” between having a high p(doom | AGI soon) on first-principles reasoning and having p(AGI soon) updated by benchmark scores.
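(An illustrative aside on this decomposition: a minimal Python sketch with invented numbers, holding p(doom | AGI soon) fixed on first-principles grounds while benchmark extrapolation updates only p(AGI soon).)

```python
# Illustrative only: invented numbers, not anyone's stated credences.
# Law of total probability:
#   p(doom) = p(doom | soon) * p(soon) + p(doom | not soon) * (1 - p(soon))

def p_doom(p_soon, p_doom_given_soon, p_doom_given_not_soon):
    return p_doom_given_soon * p_soon + p_doom_given_not_soon * (1 - p_soon)

# Conditionals fixed by first-principles / conceptual arguments.
P_DOOM_GIVEN_SOON = 0.6
P_DOOM_GIVEN_NOT_SOON = 0.1

# Only p(AGI soon) gets updated, e.g. by extrapolating benchmark scores.
print(p_doom(0.05, P_DOOM_GIVEN_SOON, P_DOOM_GIVEN_NOT_SOON))  # ~0.125 before
print(p_doom(0.70, P_DOOM_GIVEN_SOON, P_DOOM_GIVEN_NOT_SOON))  # ~0.45 after

# The headline p(doom) moves a lot on DL-specific evidence even though the
# "doom given AGI soon" conditional never depended on deep-learning details.
```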
This is o3’s take, for what it’s worth:
Yes — Garrett Baker repeatedly and materially misrepresents what Matthew is saying.
I have custom instructions turned off, and I haven’t turned on the memory feature, so there’s no strong reason to expect it to behave sycophantically (that I’m aware of). And o3 said it doesn’t know which side I’m on. I expect most other LLMs will say something similar when given neutral prompts and the full context.
(Not that this is strong evidence. But I think it undermines your claim by at least a bit.)
o3 has the same conclusion with a slightly different prompt.
Read this comment exchange and come to a definitive conclusion about whether Garrett Baker is accurately representing Matthew. Focus on content rather than tone:
Conclusion: Garrett is not accurately representing Matthew’s position. Below is a point-by-point comparison that shows where Garrett’s paraphrases diverge from what Matthew is actually claiming (ignoring tone and focusing only on the content).
That link seems to be broken. [ETA: now fixed by Thomas.]
Oops, this was on my work account, from which you can’t make public links. Replaced the link with the prompt and beginning of o3 output.
I get the same result with Claude, but when I push at all it caves & says I understand you fine.
It seems a crux is what you mean by “tension”.
Just to offer my two cents, I do not have this problem and I think Matthew is extremely clear.
Can you rephrase his argument in your own words? In particular, define what “tension” means.
It seems everyone has this problem with your writing. Have you considered speaking more clearly, or perhaps considering that people understand you fully and it is you who are wrong?
I reject the premise. In general, my writing is interpreted significantly more accurately when I’m not signaling skepticism about AI risk on LessWrong. For most other topics, including on this site, readers tend to understand my points reasonably well, especially when the subject is less controversial.
This could perhaps mean I’m uniquely unclear when discussing AI risk. It’s also very plausible that the topic itself is unusually prone to misrepresentation. Still, I think a major factor is that people are often uncharitable toward unpopular viewpoints they strongly disagree with, which accounts for much of the pushback I receive on this subject.
Specifically, the idea is that AI going well for humans would require a detailed theory of how to encode human values in a form suitable for machine optimization, and the relevance of deep learning is that Yudkowsky and Soares think that deep learning is on track to provide the superhuman optimization without the theory of values. You’re correct to note that this is a stance according to which “artificial life is by default bad, dangerous, or disvaluable,” but I think the way you contrast it with the claim that “biological life is by default good or preferable” is getting the nuances slightly wrong: independently-evolved biological aliens with superior intelligence would also be dangerous for broadly similar reasons.
Didn’t you have a post where you argued that it’s a consequence of their view that biological aliens are better, morally speaking, than artificial earth originating life, or did I misunderstand?
To the extent that you’re saying “I’d like to have more conversations about why creating powerful agentic systems might not go well by default; for others this seems like a given, and I just don’t see it”, I applaud you and hope you get to talk about this a whole bunch with smart people in a mutually respectful environment. However, I do not believe analogizing the positions of those who disagree with you with luddites from the 19th century (in particular when thousands of pages of publicly available writings, with which you are familiar, exist) is the best way to invite those conversations.
Quoting the first page of a book as though it contained a detailed roadmap of the central (60,000-word) argument’s logical flow (which to you is apparently the same as a rigorous historical account of how the authors came to believe what they believe) — while it claims to do nothing of the sort — simply does not parse at all. If you read the book (which I recommend, based on your declared interests here), or modeled the pre-existing knowledge of the median book website reader, you would not think “anything remotely like current techniques” meant “we are worried exclusively about deep learning for deep learning-exclusive reasons; trust us because we know so much about deep learning.”
If you find evidence of Eliezer, Nate, or similar saying “The core reason I am concerned about AI safety is [something very specific about deep learning]; otherwise I would not be concerned”, I would take your claims about MIRI’s past messaging very seriously. As is, no evidence exists before me that I may consider in support of this claim.
Based on what you’ve said so far, you seem to think that all of the cruxes (or at least the most important ones) must either be purely intuitive or purely technical. If they’re purely intuitive, then you dismiss them as the kind of reactionary thinking someone from the 19th century might have come up with. If they’re purely technical, you’d be well-positioned to propose clever technical solutions (or else to discredit your interlocutor on the basis of their credentials).
Reality’s simply messier than that. You likely have both intuitive and technical cruxes, as well as cruxes with irreducible intuitive and technical components (that is, what you see when you survey the technical evidence is shaped by your prior, and your motivations, as is true for anyone; as was true for you when interpreting that book excerpt).
I think you’re surrounded by smart people who would be excited to pour time into talking to you about this, conditional on not opening that discussion with a straw man of their position.
I do not believe analogizing the positions of those who disagree with you with luddites from the 19th century (in particular when thousands of pages of publicly available writings, with which you are familiar, exist) is the best way to invite those conversations.
To clarify, I am not analogizing the positions of those who disagree with me with luddites from the 19th century. This is not my intention, nor was it my argument.
I think we’re talking past each other here, so I will respectfully drop this discussion.
Contemporary AI existential risk concerns originated prior to it being obvious that a dangerous AI would likely involve deep learning, so no one could claim that the arguments that existed in ~2010 involved technical details of deep learning, and you didn’t need to find anything written in the 19th century to establish this.