The statement “if anyone builds it, everyone dies” is true even if the book IABIED is mostly false
There are many critical posts on LW about If Anyone Builds It, Everyone Dies.
There are detailed disagreements with particular arguments, object-level claims, and—to a lesser extent—technical assertions. But I think much of this criticism conflates three distinct propositions that deserve separate evaluation:
1. The arguments in the book are sound. All or most of the specific arguments, examples, and reasoning chains presented in IABIED are valid and persuasive.
2. The title claim is true. The statement “if anyone builds it, everyone dies” accurately describes our situation with AGI development.
3. The policy recommendation is correct. We should “shut it all down” via the specific interventions Yudkowsky and Soares propose.
These three claims have different truth conditions and require different standards of evidence. Yet I observe many critics treating them as a package deal—rejecting (2) and (3) primarily on the basis of disagreeing with (1).
Personally, I find the arguments in IABIED straightforward and valid. I’m genuinely surprised by the degree of pushback from LessWrong, though this probably reflects my own bubble among rationalists and AI safety people. But this post isn’t about relitigating those object-level arguments.
That is because I believe the authors have made a compelling case that even if >95% of their specific arguments were incorrect, the core claim “if anyone builds it, everyone dies” would still hold.
The case for x-risk from AGI doesn’t rest on any single argument being ironclad. It rests on the conjunctive claim that we would need to solve all of these problems simultaneously, under severe time pressure, and that the problems are diverse, hard, and largely independent, and that comparable problems are not usually solved to the standard that a good outcome from ASI would require.
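To make the conjunctive structure concrete with purely illustrative numbers (mine, not the book’s): if a good outcome requires independently getting, say, five such problems right, and each has a 70% chance of being gotten right in time, the chance that all of them go right is about 0.7^5 ≈ 17%, which leaves roughly an 83% chance that at least one fails. The point is not these particular numbers but that conjunctions of merely-probable successes multiply down quickly.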
What puzzles me even more is the resistance to (3) given acceptance of some substantial probability of (2).
The logical structure here should be clear: just as accepting “if anyone builds it, everyone dies” (2) doesn’t require endorsing every argument in the book (1), endorsing the proposal to “shut it all down” (3) doesn’t require certainty about (2).
To put it plainly: we don’t need P(doom) = 0.99 to justify extraordinary precautions. We just need P(doom) to be non-negligible and the stakes to be astronomical.
Which they are.
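A crude expected-value sketch (my framing, not the book’s): if a precaution costs C and reduces P(doom) by Δp, where doom carries a loss L, then the precaution is worth taking whenever Δp × L > C, i.e. whenever Δp > C/L. When L is astronomically larger than C, even a small reduction in a modest-probability risk clears that bar. (This ignores feasibility and implementation costs, which is where serious disagreement about (3) should live.)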
So here’s my ask for critics of IABIED: Please make it much more explicit why rejecting (1) justifies rejecting (2) or (3) in your particular case.
What’s the specific logical connection you’re drawing? Are you claiming that:
- All the arguments are so flawed that the probability of (2) drops below some threshold?
- The quality of the specific arguments is so tightly correlated with the truth of the claim that (1) failing for particular arguments makes (2) unlikely?
- The policy implications of (3) are so costly that only absolute certainty about (2) would justify them? Many people say that “shutting down” is “unrealistic”, but the feasibility of a policy is not the same thing as its desirability, no?
- You don’t actually reject (2) or (3), just (1)? If so, please say that explicitly!
- Something else entirely?
I don’t believe this at all, and I’m not sure that you do either. I do believe that the title claim of IABIED is largely true, but I believe much more strongly that it would be false if >95% of the arguments in the book were incorrect.
I’m not sure whether you are being hyperbolic with the “>95%” claim, or have actually gone through a sample of at least 50 arguments in the book and seriously examined what the world would look like if at most 2 of those were correct with all the rest failing to hold.
From what I’ve seen, the title claim would be seriously in doubt if even half of the arguments failed, mainly because the world would then have to be extraordinarily different, in major respects, from the way I or the authors believe it is.
I’m still writing up and refining my thoughts on the book, so I’ll be brief here and may be skipping some steps. But a few points of criticism:
If (2) is true and (1) is false (that is, if doom is likely and the book fails to make a strong case for it), then it is hard to imagine an amount of criticism that the book does not deserve. I’m pretty skeptical of the thesis of the book, but if I were less skeptical, I think I would be even more critical of the book itself.
We can accept both (1) and (2) and still reject (3) if the probability of reaching AGI is low enough. Are we more likely to successfully solve alignment (or to forestall AGI forever) by putting our effort into a shutdown? That is only plausible if LLMs in particular are very likely to become AGI.
I’m also not at all convinced by (2), and I don’t think the book addresses my concerns even a bit? A colossal superintelligence, generating in a day insights that would take all of humanity a decade, which developed a desire not just to satisfy but to optimize even its smallest desires to the largest possible degree (even for things it was not trained to optimize), would probably destroy humanity, yes, even if its training gave it instincts grounded in human thought. I’d put p(doom) at maybe 80% in that scenario. But why would this scenario happen?
As I understand it, the standard argument here says that the only way to really have goals is to optimize a utility function you can define, and that most utility functions don’t put a big positive coefficient on “humans exist”, so: doom. This argument fails for reasons I’m hoping to write up once I’m confident I’ve understood the book (and associated work) better. I do not think we have any appreciable chance of creating this type of superintelligence via anything that could plausibly develop from modern methods, and I think we’re more likely to get this sort of superintelligence if we give ourselves no information whatsoever about what machine intelligence might look like. Shutting everything down right now doesn’t seem like it could plausibly help more than it would hurt.
I also want to push back on your core claim here (that the title claim would still hold even if >95% of the specific arguments in the book were incorrect), which reflects a dangerous way of thinking:
This is never true? No argument is so compelling that it can be wrong about every claim of fact and still convince skeptics, because skeptics do not believe that they are wrong about the facts. If you believe something like this about any topic, then you have failed to model people who disagree with you (not failed to model them accurately, failed to model them at all) and you probably shouldn’t be seeking out productive disagreement until you fix that.