Statement of Support for “If Anyone Builds It, Everyone Dies”
Mutual-Knowledgeposting
The purpose of this post is to build mutual knowledge that many (most?) of us on LessWrong support If Anyone Builds It, Everyone Dies.
Inside LW, not every user is a long-timer who’s already seen consistent signals of support for these kinds of claims. A post like this could make the difference between strengthening and weakening the perception of how much everyone knows that everyone knows (...) that everyone supports the book.
Externally, people who wonder how seriously the book is being taken may check LessWrong and look for an indicator of how much support the book has from the community that Eliezer Yudkowsky originally founded.
The LessWrong frontpage, where posts are generally voted up based on “whether users want to see more of a kind of content”, wouldn’t by default translate a large amount of internal support for IABIED into a frontpage that signals support. What seems to emerge from the current system is an active discussion of various aspects of the book, including well-written criticisms, disagreements, and nitpicks.
Statement of Support
I support If Anyone Builds It, Everyone Dies.
That is:
I think the book’s thesis is likely, or at least all-too-plausibly right: That building an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, will cause human extinction.
I think the world where the book becomes an extremely popular bestseller is much better in expectation than the world where it doesn’t.
I generally respect MIRI’s work and consider it underreported and underrated.
Similarity to the CAIS Statement on AI Risk
The famous 2023 Center for AI Safety Statement on AI risk reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
I’m extremely happy that this statement exists and has so many prominent signatories. While many people considered it too obvious and trivial to need stating, many others who weren’t following the situation closely (or were motivated to think otherwise) had assumed there wasn’t this level of consensus on the content of the statement across academia and industry.
Notably, the statement wasn’t a total consensus that everyone signed, or that everyone who signed agreed with passionately, yet it still documented a meaningfully widespread consensus, and was a hugely valuable exercise. I think LW might benefit from having a similar kind of mutual-knowledge-building Statement on this occasion.
I think this is an absurd statement of the book’s thesis: the book is plainly saying something much stronger than that. How did you pick 15% rather than, for example, 80% or 95%?
I am basically on the fence about the statement you have made. For reasons described here, I think P(human extinction|AI takeover) is like 30%, and I separately think P(AI takeover) is like 45%.
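A quick aside on how those two numbers fit together, under the assumption (mine, not necessarily the commenter’s) that they combine multiplicatively:

$P(\text{extinction}) \approx P(\text{AI takeover}) \times P(\text{extinction} \mid \text{AI takeover}) \approx 0.45 \times 0.30 \approx 0.135$

which lands close to the 15% figure questioned above.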
I think this is probably right, but it’s unclear.
It depends on what you mean by “work”. I think their work making AI risk arguments at the level of detail represented by the book is massively underreported and underrated (and Eliezer’s work on making a rationalist/AI safety community is very underrated). I think that their technical work is overrated by LessWrong readers.
Yeah there’s always going to be a gray area when everyone is being asked if their complex belief-state maps to one side of a binary question like endorsing a statement of support.
I’ve updated the first bullet to just use the book’s phrasing. It now says:
I think the book’s thesis is likely, or at least all-too-plausibly right: That building an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, will cause human extinction.
I don’t think LessWrong is the right venue for this, or really, I don’t want it to become that venue. I basically agree with Steven Byrnes here, but generalized to also include rallying others to agree to stuff. I think it’s reasonable to make a post arguing that people should publicly declare their stances on such things.
I think this sort of thing skirts too close to group-think.
Further, I don’t know what it means to “support” a book. Does the book need help? How can that be? It’s a book! See also: Ethnic Tension And Meaningless Arguments, in particular
I know you clarify what you mean by “support” later, and of course you are committing a much smaller sin than the religious person in the example, but if someone said the “statement of support” you outline, what they are literally saying is meaningless. That statement can only be understood in the context of a political game they’re playing with the rest of society.
Now that political game may be necessary and useful, but I’d rather LessWrong not be a battlefield for such games, and remain the place everyone can go to think, not fight.
Edit: This comment appeared on Liron’s recent Doom Debates podcast. I want to clarify that, as far as I can tell, the only disagreement I have with anything either Liron or Holly was saying was about whether LessWrong should be a venue for political games, not about the merits of mutual knowledge, or the value of playing politics, or the benefits of coordination, or much really. I think their frames argument is more nuanced than presented, but I’m sympathetic to that position too. I even agree with their characterization of “support” as not actually meaningless. Indeed, it is meaningful within the political game we’re playing, as I say above.
What I disagree with is about whether LessWrong ought to be a venue for playing politics. Politics is the Mind-Killer, and when politics is around, arguments become soldiers, surveys become statements of support, and nuance gets trampled.
During my conversation with Liron below, it wasn’t clear to me that he was arguing that signing such a statement of support would be a good political move; I thought his argument was on purely epistemic grounds (though I did believe that his goal was politics anyway). Politically, a statement of support makes sense. Ethnic Tension games work, after all. Our argument, as I see it now, is about whether Ethnic Tension games ought to be played on LessWrong.
All I can say to that is that I don’t believe this would be satisfying for the current users, nor do I believe it is moderator consensus; that if such games start getting played more, I’m sure to use LessWrong less; and that it would be a step towards destroying a very special place on the Internet. Maybe that step is worth it, but it shouldn’t be taken without thought and acknowledgement of the cost.
If someone said, “I think [IABIED]’s thesis is likely, or at least all-too-plausibly right: That building an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, will cause human extinction.” — you think that’s meaningless?
I dunno what I’m not understanding. That statement seems extremely meaningful and important to me.
The statement I’m referring to is “I support If Anyone Builds It, Everyone Dies.” That is the meaningless statement.
Oh ok, but I’m just using it as a shorthand for the longer statement I wrote right under it, which I guess you agree is clear.
“I support If Anyone Builds It, Everyone Dies.” is a really bad shorthand though, because it doesn’t mean anything. It’s playing the Ethnic Tension game when it could very easily not be, e.g. by using the phrase “I agree with If Anyone Builds It, Everyone Dies.”
I’m open to other shorthands.
“Support” is kind of weak. To make it like the CAIS statement, maybe “I largely agree with IABIED”. Or “I ~agree with IABIED”. Or “I agree* with IABIED [bring your own footnotes]” 🙂
“Statement of ~agreement with IABIED”
Ah yeah that’d probably be better
Your shorthand is perfectly fine for building a political coalition, and I’m pretty sure that is your goal, so I think you shouldn’t change it. Just don’t do it on LessWrong!
I’m not sure what we disagree about here? This current post is a place where people can publicly declare their stances. Ideally it would have a poll or aggregation feature in the main post, because I’m not even asking people to put their names on their declaration; I’m happy to just have the community infrastructure aggregate a high fraction of people’s existing views into mutual knowledge.
I’m not clear on whether you think your argument generalizes to claiming that the CAIS statement was also skirting too close to group-think?
What I liked about the CAIS statement was that it enabled people to know that top academics and industry leaders were significantly agreeing on an extremely important belief. Without that kind of mutual-knowledge-building statement, the natural course of events is that it’s actually quite hard to tell what people think about the most basic claims. Even today, I find myself constantly pointing to the CAIS letter when people on the internet claim that AI x-risk is a totally unserious belief.
I think the situation we find ourselves in on LW is that the community hasn’t been polled or aggregated in a meaningful way, on a belief that seems to me about as much of a consensus among us as the CAIS statement was among AI experts.
The strongest part of the argument to me is that “LW isn’t the place for it”. Fair enough, LW can be whatever people want… I just think it’s sad that no place was ever formed for this important missing piece of community mutual-knowledge building, except the place founded by the very guy, and around the very idea, that could currently use the support.
Aggregating people’s existing belief-states into mutual knowledge seems like good “thinking” to me—part of the infrastructure that makes thinking more effective at climbing the tree of knowledge.
What did the CAIS letter do?
Restating in my own words: it got a bunch of well-respected, legibly smart people in AI and other areas to sign a letter saying they think AI risk is important. This is clearly very useful for people who don’t know anything about AI, since they must defer to some community to tell them about AI, and the CAIS letter was targeting the community that society entrusts with that task. It worked because the general public cared what the people who signed the letter had to say, and so people changed their minds.
This statement of support is (as you say) to be used similarly to the CAIS letter. That is, to convince people who care what LessWrongers have to say that AI risk is a big deal. But the only people who care what LessWrongers have to say are other LessWrongers!
This is where the group-think comes in, and this is what makes your statement of support both much more useless and much more harmful than the CAIS letter. Suppose your plan works: there are LessWrongers who get convinced by the number of LessWrongers who have signed your statement of support. Then they will sign the statement, but that will make the statement more convincing, and more people will sign, and so on. That is the exact dynamic which causes group-think to occur.
I’m not saying this is likely, but 1) there are degrees of group-think, and even if you don’t have a runaway explosion of group-think, you can cause more or less of it, and this definitely seems like it’d cause more, and 2) this is your plan; this is how the CAIS letter worked. If this sequence of events is unlikely, then your plan is unlikely to even work.
Not like this; this is not how unbiased, truth-seeking surveys look!
I disagree with that premise. The goal of LessWrong, as I understand it, is to lead the world on having correct opinions about important topics. I would never assume away the possibility of that goal.
Well then you’re wrong on both counts, as well as in the reasoning you’re using to derive the second count from the first.
LessWrong is not about leading the world. Indeed, note that there is a literal about page with a “What LessWrong is About” section, which of course says that LessWrong is about “apply[ing] the rationality lessons we’ve accumulated to any topic that interests us”, with not a lick of mention of leadership, or even of communities other than LessWrong, except to say “Right now, AI seems like one of the most (or the most) important topics for humanity”. That does not sound like LessWrong is trying for a leadership role on the subject (though individual LessWrongers may).
Second, even if it were true that LessWrong’s goal is to have others care what LessWrong thinks, we do not have any guarantee LessWrong is succeeding at that goal, so your inference is simply invalid and frankly absurd.
Edit: Perhaps this is where we disagree too, about the purpose of LessWrong; somewhere along the line you got the mistaken impression that LessWrong would like to try leading.
I’m happy to agree on the crux that if one accepts “the only people who care what LessWrongers have to say are other LessWrongers” (which I currently don’t), then that would weaken the case for mutual knowledge — I would say by about half. The other half of my claim is that building mutual knowledge benefits other LessWrongers.
I have argued that your argument for why “The goal of LessWrong [...] is to lead the world on having correct opinions about important topics” is false, that even if it were true this would not therefore imply that people outside LessWrong care about the views of LessWrong, and that your particular strategy to build “mutual knowledge” is misguided & harmful to LessWrongers. So far I have seen zero real engagement on these points from you.
Especially on the “building mutual knowledge benefits other LessWrongers” point (implicitly, mutual knowledge via your post here), you have just consistently restated your belief that ‘mutual knowledge is good’ as if it were an argument. That is not how arguments work, nor is my argument even “mutual knowledge is bad”.
I see this comment as continued minimal engagement. You don’t have to change your mind! It would be fine to say “I don’t know how to respond to your arguments; they are good, but not convincing to me for reasons I can’t pin down”, but instead you post this conversation on X (previously Twitter)[1] as if it’s obvious why I’m wrong, calling it “crab bucketing”.
I just feel like your engagement has been disingenuous, even if polite, here on LessWrong.
Note: I don’t follow you on X (previously Twitter), nor do I use X (previously Twitter), but I did notice a change in voting patterns, so I figured someone (probably you) shared the conversation.
What’s the issue with my Twitter post? It just says I see your comment as representative of many LWers, and the same thing I said in my previous reply, that aggregating people’s belief-states into mutual knowledge is actually part of “thinking” rather than “fighting”.
I find the criticism of my quality of engagement in this thread distasteful, as I’ve provided substantive object-level engagement with each of your comments so far. I could equally criticize you for bringing up multiple sub-points per post that leave me no way to respond in a time-efficient way without being called “minimal”, but I won’t, because I don’t see either of our behaviors so far as breaking out of the boundaries of productive LessWrong discourse. My claim about this community’s “crab-bucketing” was a separate tweet, not intended as a reply to you.
Ok, I’ll pick this sub-argument to expand on. You correctly point out that what I wrote does not text-match the “What LessWrong is about” section. My argument would be that this cited quote:
As well as Eliezer’s post about “Something to protect”—imply that a community that practices rationality ought to somehow optimize the causal connection between their practice of rationality and the impact that it has.
This obviously leaves room for people to have disagreeing interpretations of what LessWrong ought to do, as you and I currently do.
I think my responses have all had at most two distinct arguments, so I’m not sure in what sense I’m “bringing up multiple sub-points per post that leave [you] no way to respond in a time-efficient way without being called ’minimal’”. In the case that I am, that is also what the ‘not worth getting into’ emoji is for.
(other than this one, which has three)
Room for disagreement does not imply any disagreement is valid.
It says
But this is not what I’m doing, nor what I’ve argued for anywhere, and as I’ve said before,
For example, I really like the LessWrong surveys! I take those every year!
I just think your post is harmful and uninformative.
What’s the minimally modified version of posting this “Statement of Support for IABIED” you’d feel good about? Presumably the upper bound for your desired level of modification would be if we included a yearly survey question about whether people agree with the quoted central claim from the book?
I would feel better if your post were more of the following form: “I am curious about the actual level of agreement with IABI; here is a list of directly quoted claims made in the book (which I put in the comments below). React with agree/disagree on each claim. If your answer is instead ‘mu’, then you can argue about it or something, idk. Please also feel free to comment with claims the book makes which you are interested in getting a survey of LessWrong opinions on yourself.”
I think I would actually strong-upvote this post if it existed, provided the moderation seemed appropriate, and the seeded claims concrete and not phrased leadingly.
Edit: Bonus points for encouraging people to use the probability reacts rather than agree/disagree reacts.
Thanks. The reactions to such a post would constitute a stronger common knowledge signal of community agreement with the book (to the degree that such agreement is in fact present in the community).
I wonder if it would be better to make the agree-voting anonymous (like LW post voting) or with people’s names attached to their votes (like react-voting).
I’m sure this is going too far for you, but I also personally wish LW could go even further toward turning a sufficient amount of mutual support expressed in that form (if it turns out to exist) into a frontpage that actually looks like what most humans expect a supportive frontpage around a big event to look like (more so than having a banner and discussion mentioning it).
Again, the separate tweet about LW crab-bucketing in my Twitter thread wasn’t meant as a response to you in this LW thread.
I agree that “room for disagreement does not imply any disagreement is valid”, and am not seeing anything left to respond to on that point.
I could be described as supporting the book in the sense that I preordered it, and I bought two extra copies to give as gifts. I’m planning to give one of them tomorrow to an LLM-obsessed mathematics professor at San Francisco State University. But the reason I’m giving him the book is that I want him to read it and think carefully about the arguments on the merits, because I think that mitigating the risk of extinction from AI should be a global priority. It’s about the issue, not supporting MIRI or any particular book.
Use the agree/disagree react buttons on this poll-comment to build mutual knowledge about what LWers believe!
I don’t know what “all-too-plausibly” means. Depending on the probabilities that this implies I may agree or disagree.
If you don’t know whether the book’s thesis being “all-too-plausibly” or “not all-too-plausibly” right describes your position better, you can just go ahead and not count yourself as a supporter of IABIED (or flip a coin, or don’t participate in the aggregation effort). The mutual-knowledge I’m hoping to build is among people who don’t see this as a gray-area question, because I think that’s already a pretty high fraction (maybe majority) of LWers.
“If Anyone Builds It, Everyone Dies” is more true than not[1].
Personally my p(doom)≈60%, which may be reduced by ~10%-15% by applying the best known safety techniques, but then we’ve exhausted my epistemic supply of “easy alignment worlds”.
After that is the desert of “alignment is actually really hard” worlds. We may get another 5% because mildly smart AI systems refuse to construct their successors because they know they can’t solve alignment under those conditions.
So the title of that book is more correct than not. AI-assisted alignment feels the most promising to me, but also reckless as hell. Best would be human intelligence enhancement through some kind of genetech or neurotech. It feels like very few people with influence are planning for “alignment is really hard” worlds.
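As a rough, minimal sketch of the accounting above, assuming the reductions are percentage points taken off the ~60% baseline rather than relative reductions:

$0.60 - (0.10 \text{ to } 0.15) - 0.05 \approx 0.40 \text{ to } 0.45$

i.e. a residual p(doom) of very roughly 40–45% even after granting the “easy alignment world” and refusal-to-build-successors discounts.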
I nevertheless often dunk on MIRI because I would like them to spill more on their agent foundations thoughts, and because I think the arguments don’t rise above the level of “pretty good heuristics”. Definitely not to the level of “physical law” which we’ve usually used to “predict endpoints but not trajectories”.
The statement, not the book. The book was (on a skim) underwhelming, but I expected that, and it probably doesn’t matter very much since I’m Not The Target Audience.
Well, (having not read any of the arguments/reservations below) my uninformed intuition is that this could be part of the crux... “Supporting”/“signing-on-to” the 22-word statement from CAIS in 2023 is one thing. But “supporting” all of the 256 pages of words in the IABIED book, that does seem like a different thing entirely? In general, I wouldn’t expect to find many LWers to be “followers of the book” (as it were)...
That said, I do wonder what percentage of the LW Community would “support” the simple 6-word statement, regardless of the content, explanations, arguments, etc. in the book itself?
I agree with this statement “If Anyone Builds It, Everyone Dies”.
(Where this book is just one expression of support and explanation for that 6-word statement, and not necessarily the “definitive work”, the “be-all and end-all”, or the “holy grail” for the entire idea...)
And then conversely, what percentage of LWers would disagree with the 6 words in that statement… e.g.
“If some specific person/group builds it, not everyone dies.”
Or, perhaps even...
“If some specific person/group builds it, no one dies.”
That said, I also wonder what percentage of the LW Community would support/“sign on to” the 22-word statement from CAIS in 2023, as well...
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
Probably more would support that 22-word statement versus the 6-word statement above… But yeah, hard to say, maybe not?
(I agree-reacted to the statement, because I think polling is often a very reasonable default activity. I am not upvoting the post, especially not for a goal of turning the frontpage post list into a space to signal the positions of LW, rather than for interesting and novel arguments and/or well written explanations.)
(Added: To be clear, Liron did not ask for upvotes; but the post talked about what gets upvoted on the frontpage, and then posted the poll claim, which seemed suggestive to me of a standard terrible failure mode for discussion spaces; so I want to disclaim the practice of upvoting in order to make the frontpage signal positions to low-engagement LW readers.)
Fair enough. I didn’t find a LW poll feature and I dunno how to better achieve mutual knowledge about LW’s level of support for the book.
You can just ask people to use reacts! I’ve done a few posts like that (example) though I used hidden backend features to do a more complicated setup.
Ah thx, I didn’t know about that. A poll in the top-level post seems like a potentially useful feature. At least in special cases like right now, where it seems like many/most people hold an important shared position and building mutual knowledge about that seems valuable, a poll can do that. I’ll try a comment poll. Edit: here it is
If we manage to build an AI that (1) possesses current LLM abilities, and (2) meets or exceeds the competence of a bright 12-year-old in other abilities, then I give us an 80+% chance of takeover by a superintelligence. After that, I expect the superintelligence to make all the important decisions about the future of humanity, in much the same way that humans make all the important decisions about the future of stray dogs. Euthanasia? Sterilization? A nice human rescue program and house-training? It won’t be our call. And since the AI is probably pretty damn alien, we probably won’t like its call.
So I endorse, “If anyone builds it, then everyone dies. Or maybe, if you’re incredibly lucky, you might just get spayed and lose all control over your life.” I don’t see this as a big improvement.
As a general rule, I feel like building an incomprehensible alien superintelligence is a very dumb move, for reasons that should be painfully self-evident to anyone who gives it 10 seconds of genuine thought.