Quick book review of “If Anyone Builds It, Everyone Dies” (cross-post from X/twitter & bluesky):

Just read the new book If Anyone Builds It, Everyone Dies. Upshot: Recommended! I ~90% agree with it.

The authors argue that people are trying to build ASI (superintelligent AI), and we should expect them to succeed sooner or later, even if they obviously haven’t succeeded YET. I agree. (I lean “later” more than the authors, but that’s a minor disagreement.)
Ultra-fast minds that can do superhuman-quality thinking at 10,000 times the speed, that do not age and die, that make copies of their most successful representatives, that have been refined by billions of trials into unhuman kinds of thinking that work tirelessly and generalize more accurately from less data, and that can turn all that intelligence to analyzing and understanding and ultimately improving themselves—these minds would exceed ours.
The possibility of a machine intellect that manages to exceed human performance in all pragmatically important domains in which we operate has been called many things. We will describe it using the term “superintelligence,” meaning a mind much more capable than any human at almost every sort of steering and prediction problem—at least, those problems where there is room to substantially improve over human performance.[ii] …
(It sounds like sci-fi, but remember that every technology is sci-fi until it’s invented!)
They further argue that we should expect people to accidentally make misaligned ASI, utterly indifferent to whether humans, even its own creators, live or die. They have a 3-part disjunctive argument:
(A) Nobody today has a plausible plan to make ASI that is not egregiously misaligned. It’s an inherently hard technical problem. Current approaches are not on track.
(B) Even if (A) were not true, there are things about the structure of the problem that make it unlikely we would solve it, e.g.:
(B1) Like space probes, you can’t do perfectly realistic tests in advance. No test environment is exactly like outer space. And many problems are unfixable from the ground. Likewise, if ASI has an opportunity to escape control, that’s a new situation, and there’s no do-over.
(B2) Like nuclear reactors, building ASI will involve fast-moving dynamics, narrow margins for error, and self-amplification, but in a much more complicated and hard-to-model system.
(B3) Like computer security, there can be adversarial dynamics where the ASI is trying to escape constraints, act deceptively, cover its tracks, and find and exploit edge cases.
(C) EVEN IF (A) & (B) were not issues, we’re still on track to fail because AI companies & researchers are not treating this as a serious problem with billions of lives on the line. For example, in the online supplement, the authors compare AI culture to other endeavors with lives at stake. They cite the FAA, which will indefinitely ground planes if something seems amiss, do exhaustive 200-page postmortems of problems, etc. Meanwhile, AI companies claim to be close to ASI, but when wild AI misbehavior happens, they don’t “halt, melt, and catch fire”, but rather plow ahead, deploy surface-level patches, publicly downplay the issue, and launch an extraordinarily aggressive lobbying campaign against any government oversight whatsoever.

Anyway, I agree with the conclusion of (A) but disagree with much of the book’s argument for it, as I have discussed many times (e.g. §3 of my “Sharp Left Turn” post). I think their arguments for (B) & (C) are solid. …And sufficient by themselves! It seems overdetermined!
The authors propose to get an international treaty to pause progress towards superintelligence, including both scaling & R&D. I’m for it, although I don’t hold out much hope for such efforts to have more than marginal impact. I expect that AI capabilities would rebrand as AI safety, and plow ahead:
The problem is: public advocacy is way too centered on LLMs, from my perspective. Thus, those researchers I mentioned, who are messing around with new paradigms on arXiv, are in a great position to twist “Pause AI” type public advocacy into support for what they’re doing!
“You don’t like LLMs?”, the non-LLM AGI capabilities researchers say to the Pause AI people, “Well how about that! I don’t like LLMs either! Clearly we are on the same team!”
This is not idle speculation—almost everyone that I can think of who is doing the most dangerous kind of AI capabilities research, the kind aiming to develop a new more-powerful-than-LLM AI paradigm, is already branding their work in a way that vibes with safety. For example, see here where I push back on someone using the word “controllability” to talk about his work advancing AI capabilities beyond the limits of LLMs. Ditto for “robustness” (example), “adaptability” (e.g. in the paper I was criticizing here), and even “interpretability” (details).
I think these people are generally sincere but mistaken, and I expect that, just as they have fooled themselves, they will also successfully fool their friends, their colleagues, and government regulators…
(source). For my part, I’m gonna keep working directly on (A). I think the world will be diving into the whirling knives of (A–C), sooner or later, and we’d better prepare as best we can.
The target audience of the book is not AI alignment experts like me, but rather novices. I obviously can’t speak from personal experience as to whether it’s a good read for those people, but anecdotally lots of people seem to think it is. So, I recommend the book to anyone.
I would start by saying that I mostly agree with you here. On this point specifically, however,
AI capabilities would rebrand as AI safety
I mean, 3 of the leading AI labs (DeepMind, OpenAI, Anthropic) were founded explicitly under or attached to the banner of AI safety. OpenAI and Anthropic were even founded as “the safer alternatives” to DeepMind and OpenAI, respectively! You also don’t have to go back very far to find AI safety funders and community voices promoting those labs as places to work to advance AI safety (whereas today you’d be hard-pressed to find someone on this forum advocating for working at OpenAI). So I would say that either what you are saying has already more or less come to pass, or that there is some blurriness about these categories that makes trying to draw a firm line quite difficult. I think a bit of both is true.
The authors propose to get an international treaty to pause progress towards superintelligence, including both scaling & R&D. I’m for it, although I don’t hold out much hope for such efforts to have more than marginal impact. I expect that AI capabilities would rebrand as AI safety, and plow ahead:
The problem is: public advocacy is way too centered on LLMs, from my perspective. Thus, those researchers I mentioned, who are messing around with new paradigms on arXiv, are in a great position to twist “Pause AI” type public advocacy into support for what they’re doing!
[...]
I think these people are generally sincere but mistaken, and I expect that, just as they have fooled themselves, they will also successfully fool their friends, their colleagues, and government regulators…
This seems way too pessimistic to me. (Or, like, sure, it’s going to be hard and I’m not super optimistic, but given that you’re also relatively pessimistic, the international AI R&D shutdown approach doesn’t seem too unpromising to me.)
Sure, they are going to try to convince government regulators that their research is great for safety, but we’re going to try to convince the public and regulators otherwise.
I mean, it’s sorta understandable to say that we currently seem to be in a relatively weak position and that getting sufficient change seems hard, but movements can grow quickly. Yeah, understandable that this doesn’t seem super convincing, but I think we have a handful of smart people who might be able to find ways to effectively shift the gameboard here. Idk.
More to the point though, conditional on us managing to internationally ban AI R&D, it doesn’t obviously seem that much more difficult or that much less likely that we also manage to ban AI safety efforts which can lead to AI capability increases, based on the understanding that those efforts are likely delusional and alignment is out of reach. (Tbc I would try not to ban your research, but given that your agenda is the only one I am aware of into which I put significantly more than 0 hope, it’s not clear to me that it’s worth overcomplicating the ban around that.)
Also, in this common-knowledge-problem domain, self-fulfilling prophecies are sorta a thing, and I think it’s a bit harmful to the cause if you post on twitter and bluesky that you don’t have much hope in government action. Tbc, don’t say the opposite either, keep your integrity, but maybe leave the criticism on lesswrong? Idk.