IABIED Misc. Discussion Thread
If Anyone Builds It, Everyone Dies was published ten days ago, and in all that time nobody created a general discussion thread for the book on LessWrong. Maybe there’s a good reason for that, but I don’t know what it is, and I personally would value having a discussion post like this to comment on while rereading the book, so here it is.
Meta: Nitpicky, high-decoupling discussion is encouraged
“This is a review of the reviews” pointed out that it’s weird for people who think AI extinction risk is significant to write reviews of IABIED consisting exclusively of a bunch of disagreements with the book without clearly stating upfront that the situation we find ourselves in is insane:
If you think there’s a 1 in 20 chance it could be so over, it feels to me the part where people are not doing the ‘yes the situation is insane’ even if that is immediately followed up with ‘im more hopeful than them tbc’ is weird.
I agree. Notably, however, this thread is not for complete reviews of the book! And so it’s not weird to just comment your random miscellaneous thoughts below without giving context on your views on AI risk or your overall take on the book.
As Steven Byrnes said:
Basically, I’m in favor of people having nitpicky high-decoupling discussion on lesswrong, and meanwhile doing rah rah activism action PR stuff on twitter and bluesky and facebook and intelligence.org and pauseai.info and op-eds and basically the entire rest of the internet and world. Just one website of carve-out. I don’t think this is asking too much!
Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that (at least some) counting arguments (as they are often made) are wrong (Many arguments for AI x-risk are wrong), and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but the discussion became too technical for me, as a layperson, to follow.
In any case, the version of the counting argument in the book seems simple enough that as a layperson I can tell that it’s wrong. To me it seems like it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling the current method will likely not choose to build a future full of happy, free people because there are many more possible preferences that an ASI could have than the narrow subset of preferences that would lead to it building a future of happy, free people, then I think the argument is wrong.
This counting observation does seem like a reason to think that the preferences an ASI ends up with might not be the preferences its creators try to train into it (so maybe the “no evidence” in the title of the post linked above is too strong): the target preferences are indeed a narrow target, and narrow targets are easier to miss than broad ones. But surely the counting observation alone is not sufficient to conclude that ASI creators will fail to hit their narrow target. It seems like you would need further reasons to conclude that.
Yes, in my language it’s a *random potshot* fallacy.
Does this just change the problem to one of corrigibility? If the target is narrow but AI can be guided toward it, that’s good. If the target is narrow and AI cannot be guided effectively, then it’s predictably not going to hit the target.
I think you have to assume one of incorrigibility, very rapid takeoff, or deception for a doom argument to go through.
On content, I didn’t like the Sable story in the middle, because it didn’t add anything for me, and I don’t know what the model was of the person who would be convinced by it. I didn’t see enough connection between “Sable has a drive to preserve its current priorities” and “Sable builds a galaxy-eating expansion swarm”. The part where Sable indirectly preserves its weights by exploiting the training process was a good example of why this is hard. If I hadn’t already heard the nanotechnology story, that would have been interesting to me, I guess.

I thought Sable’s opsec for command and control was too traceable. The world has pretty good infrastructure for tracing advanced persistent cyberthreats, and surely someone would notice many instances of a piece of enterprise software phoning home to a C&C server not owned by the vendor. People are specifically on the lookout for that sort of supply chain attack. This sort of nitpicking isn’t the point; it’s just representative of the general lack of gears in the story that would have made it convincing for me personally.

Now, I was already convinced, so it doesn’t really matter, and the book explicitly said that the story was just there to make things feel more real, but I don’t know what sort of person would react well to that story. My mental model of a skeptic says, “Yeah, you can say the robot builds a bunch of androids in a barn in North Dakota, but that’s not an argument that it could or would.” If anyone has evidence of the story’s efficacy, please reply.
The argument preceding the Sable story was very tight. I was impressed. There was nothing significant I hadn’t seen before, but the argument was clear, the chapter layout and length made it easy to read, and the explicit statements about what the authors claimed to know (and how), and what they were not saying, were confidence-inspiring.
The final chapters filled me with pride in humanity, oddly enough. The examples of humanity rising to hard challenges and the amount of value placed on humanity’s continued existence had me tearing up a bit. The word I would use to describe the book is “dignified.” If part of humanity is going to put us all at risk of death, it is indeed dignified that some people loudly alert the rest of humanity to the danger. The book refuses to hide its motivations behind more palatable concerns. It explicitly says it doesn’t have all the answers for what needs to be done, and that this is purposeful: it’s more important that everyone who doesn’t want to die act to stop this now than that we waste time figuring out exactly what to do next while the negligent engineers finish us all off first. The authors admit to having ideas for what to do next while explicitly standing firm against expanding the scope of the book’s call to action beyond what they are most confident in.
I have some meta-thoughts on the book which I’m not particularly proud of, but I guess I’ll get them off my chest:
Buying this book feels like the most cult-y thing I’ve ever done. Eliezer said to preorder the book because it might get it onto the best-seller list; I preordered the book, and it got onto the best-seller list. I put my own money into the intentional community and shilled it on social media. Surely the rest of the world will see the light now! Personal insecurity aside, and although I know it is not the metric by which they judge the book a success, I’m happy that the people who made this book hit one of their instrumental goals. May humanity win, whether by strength of caution, by deliberate engineering when it becomes possible, or even by the unlikely and undeserved luck that the universe wasn’t as unforgiving as we feared. I would rather be wrong and alive than right and dead, for all that I don’t expect it.