A case for courage, when speaking of AI danger
I think more people should say what they actually believe about AI dangers, loudly and often. Even (and perhaps especially) if you work in AI policy.
I’ve been beating this drum for a few years now. I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concerns while remembering how straightforward and sensible and widely supported the key elements are, because humans are very good at picking up on your social cues. If you act as if it’s shameful to believe AI will kill us all, people are more prone to treat you that way. If you act as if it’s an obvious serious threat, they’re more likely to take it seriously too.
I have another whole spiel about how it’s possible to speak on these issues with a voice of authority. Nobel laureates and lab heads and the most cited researchers in the field are saying there’s a real issue here. If someone is dismissive, you can be like “What do you think you know that the Nobel laureates and the lab heads and the most cited researchers don’t? Where do you get your confidence?” You don’t need to talk like it’s a fringe concern, because it isn’t.
And in the last year or so I’ve started collecting anecdotes such as the time I had dinner with an elected official, and I encouraged other people at the dinner to really speak their minds, and the most courage they could muster was statements like, “Well, perhaps future AIs could help Iranians figure out how to build nuclear material.” To which the elected official replied, “My worries are much bigger than that; I worry about recursive self-improvement and superintelligence wiping us completely off the map, and I think it could start in as little as three years.”
These spiels haven’t always moved people. I’m regularly told that I’m just an idealistic rationalist who’s enamored of the virtue of truth, and who’s trying to worm out of a taboo tradeoff between honesty and effectiveness. I’m sometimes told that the elected officials are probably just trying to say what they know their supporters want to hear. And I am not at liberty to share all the details from those sorts of dinners, which makes it a little harder to share my evidence.
But I am at liberty to share the praise we’ve received for our forthcoming book. Eliezer and I do not mince words in the book, and the responses we got from readers were much more positive than I was expecting, even given all my spiels.
You might wonder how filtered this evidence is. It’s filtered in some ways and not in others.
We cold-emailed a bunch of famous people (like Obama and Oprah), and got a low response rate. Separately, there’s a whole tier of media personalities that said they don’t really do book endorsements; many of those instead invited us on their shows sometime around book launch. Most people who work in AI declined to comment. Most elected officials declined to comment (even while some gave private praise, for whatever that’s worth).
Among national security professionals, though, I think we approached only seven. Five gave strong praise, one (Shanahan) gave a qualified statement, and the seventh said they didn’t have time (which might have been a polite expression of disinterest). That’s a much stronger showing than I was expecting from the national security community.
We also had a high response rate among people like Ben Bernanke and George Church and Stephen Fry. These are people that we had some very tangential connection to — like “one of my friends is childhood friends with the son of a well-connected economist.” Among those sorts of connections, almost everyone had a reaction somewhere between “yep, this sure is an important issue that people should be discussing” and “holy shit”, and most offered us a solid endorsement. (In fact, I don’t recall any such wacky plans that didn’t pan out, but there were probably a few I’m just not thinking of. We tried to keep a table of all our attempts but it quickly devolved into chaos.)
I think this is pretty solid evidence for “people are ready to hear about AI danger, if you speak bluntly and with courage.”[1] Indeed, I’ve been surprised and heartened by the response to the book.
I think loads of people in this community are being far too cowardly in their communication, especially in the policy world.
I’m friends with some of the folk who helped draft SB 1047, so I’ll pick on them a bit.
SB 1047 was an ill-fated California AI bill that more-or-less required AI companies to file annual safety reports. When it was being drafted, I advised that it should be much more explicit about how AI poses an extinction threat, and about the need for regulations with teeth to avert that danger. The very first drafts of SB 1047 had the faintest of teeth, and then it was quickly defanged (by the legislature) before being passed and ultimately vetoed.
One of my friends who helped author the bill took this sequence of events as evidence that my advice was dead wrong. According to them, the bill was just slightly too controversial. Perhaps if it had been just a little more watered down then maybe it would have passed.
I took the evidence differently.
I noted statements like Senator Ted Cruz saying that regulations would “set the U.S. behind China in the race to lead AI innovation. [...] We should be doing everything possible to unleash competition with China, not putting up artificial roadblocks.” I noted J.D. Vance saying that regulations would “entrench the tech incumbents that we actually have, and make it actually harder for new entrants to create the innovation that’s going to power the next generation of American growth.”
On my view, these were evidence that the message wasn’t working. Politicians were not understanding the bill as being about extinction threats; they were understanding the bill as being about regulatory capture of a normal budding technology industry.
...Which makes sense. If people really believed that everyone was gonna die from this stuff, why would they be putting forth a bill that asks for annual reporting requirements? Why, that’d practically be fishy. People can often tell when you’re being fishy.
It’s been a tricky disagreement to resolve. My friend and I each saw a way that the evidence supported our existing position, which means it didn’t actually discriminate between our views.[2]
But, this friend who helped draft SB 1047? I’ve been soliciting their advice about which of the book endorsements would be more or less impressive to folks in D.C. And as they’ve seen the endorsements coming in, they said that all these endorsements were “slightly breaking [their] model of where things are in the overton window.” So perhaps we’re finally starting to see some clearer evidence that the courageous strategy actually works, in real life.[3]
I think many people who understand the dangers of AI are being too cowardly in their communications.[4] Especially among people in D.C. I think that such communication is useless at best (because it doesn’t focus attention on the real problems) and harmful at worst (because it smells fishy). And I think it’s probably been harmful to date.
If you need a dose of courage, maybe go back and look at the advance praise for If Anyone Builds It, Everyone Dies again. Recall that the praise came from many people we had no prior relationship with, including some people who were initially quite skeptical. Those blurbs don’t feel to me like they’re typical, run-of-the-mill book endorsements. Those blurbs feel to me like unusual evidence that the world is ready to talk about this problem openly.
Those blurbs feel to me like early signs that people with a wide variety of backgrounds and world-views are pretty likely to adopt a sensible viewpoint upon hearing real arguments made with conviction.
The job isn’t done yet. Society being ready to have the conversation doesn’t mean that society is having the conversation. We’re still somehow in a world where I occasionally have dinner with elected officials who worry about recursive self-improvement leading to superintelligence inside of three years — and yet none of the congress members who privately praise our book are willing to endorse it publicly yet.
There’s work to be done. And now is perhaps the right time to do it.
This possible window of opportunity is why Eliezer and I wrote the book in the first place. Once I started noticing that the message was really working, e.g. in private meetings with elected officials in D.C., I came back and told Eliezer that I thought the time was ripe. Perhaps a clear, convincing, and successful book (with an associated media tour) will do a lot to thrust this conversation into the mainstream. And then maybe humanity will decide to do something else, and maybe we won’t all die.
So, at the risk of pandering: If you see any of the hope here that I see, consider helping us make this book cause a big splash.
The time is ripe. Pre-orders can help our book debut on the best-seller lists, and being on the best-seller list will influence whether our book, and this conversation, catches national attention.
Last month we were hitting Amazon best-seller lists in our categories despite still being in pre-order mode (which I’m told is rare). I’m heartened by this support. If we can keep up the pace of the last month for the next two and a half months, we’ll be in good shape. If we can keep up the pace and then get an extra burst of support right as the book launches, we’ll be in great shape. I think that’s a possibility, with support from this community.
We’re doing everything we can on our end, and it’s going well. If you want to coordinate about breaking open the Overton window, now is quite plausibly the best time for it.
If we do our job well here, there’s a real chance that politicians and journalists and your neighbor are taking extinction-level dangers from AI seriously by the end of the year.
And regardless of whether you think our book is worth throwing your weight behind, I hope you consider that you don’t need to hide your beliefs about AI danger, nor be ashamed of them. People respond well when you speak to them with courage.
- ^
The “courage” property I’m advocating here is about naming what you think is the real problem plainly, not about being impolite. This notion of courage is completely compatible with saying “you don’t have to agree with me, Mr. Senator, but my best understanding of the evidence is [belief]. If ever you’re interested in discussing the reasons in detail, I’d be happy to. And until then, we can work together in areas where our interests overlap.” The concept is about naming what you think are the key issues, straightforwardly, in plain language. There are loads of ways to do so politely. See also this comment.
- ^
That wasn’t the only case where the evidence was tricky to read. In January, Senator Cruz had Elon Musk on his podcast and asked “How real is the prospect of killer robots annihilating humanity?” in a context where the senator was overall a bit dismissive. Some of my friends read this as a bad sign, on account of the dismissiveness. I read it as a good sign, because he was at least starting to think about the key dangers.
And shortly after J.D. Vance read AI 2027, he was asked “Do you think that the U.S. government is capable in a scenario — not like the ultimate Skynet scenario — but just a scenario where A.I. seems to be getting out of control in some way, of taking a pause?” and he answered:
I don’t know. That’s a good question.
The honest answer to that is that I don’t know, because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause? And then we find ourselves all enslaved to P.R.C.-mediated A.I.?
Some of my friends read this as a bad sign, because he seemed intent on a suicide race. I read it as a good sign, because — again — he was engaging with the real issues at hand, and he wasn’t treating AI as just another industry at the mercy of over-eager regulators.
- ^
My friend caveats that they’re still not sure SB 1047 should have been more courageous. All we have actually observed is courage working in 2025, which does not necessarily imply it would have worked in 2023.
- ^
By “cowardice” here I mean the content, not the tone or demeanor. I acknowledge that perceived arrogance and overconfidence can annoy people in communication, and can cause backlash. For more on what I mean by courageous vs cowardly content, see this comment. I also spell out the argument more explicitly in this thread.
While I disagree with Nate on a wide variety of topics (including implicit claims in this post), I do want to explicitly highlight strong agreement with this:
The position that is “obvious and sensible” doesn’t have to be “if anyone builds it, everyone dies”. I don’t believe that position. It could instead be “there is a real threat model for existential risk, and it is important that society does more to address it than it is currently doing”. If you’re going to share concerns at all, figure out the position you do have courage in, and then discuss that as if it is obvious and sensible, not as if you are ashamed of it.
(Note that I am not convinced that you should always be sharing your concerns. This is a claim about how you should share concerns, conditional on having decided that you are going to share them.)
The potential failure mode I see with this is that, if you’re not paying sufficient attention to your rhetoric, you run the risk of activating people’s anti-Pascal’s mugging instincts.[1] As Jeremy Gillen said:
“You should pay attention to this even if you think there’s a really small chance of it happening because if it does happen, the consequences will be massive” is something most people hear pretty regularly. In lieu of spending inordinate amounts of time teasing out the specific probabilities and expected utilities, they create heuristics allowing them to ignore these kinds of claims.
Note: in my experience obtained from reading Substack comments on posts by EA-skeptical authors, the belief that “EA wants you to care about AI safety because there’s a low probability of a really bad outcome” is both extremely prevalent and also causes EA/AI x-risk proponents/etc to be viewed quite negatively (like they’re using argumentative Dark Arts or something similar).
I don’t think most anyone who’s studied the issues at hand thinks the chance of danger is “really small”, even among people who disagree with me quite a lot (see e.g. here). I think folks who retreat to arguments like “you should pay attention to this even if you think there’s a really small chance of it happening” are doing a bunch of damage, and this is one of many problems I attribute to a lack of this “courage” stuff I’m trying to describe.
When I speak of “finding a position you have courage in”, I do not mean “find a position that you think should be logically unassailable.” I’m apparently not doing a very good job at transmitting the concept, but here are some positive and negative examples:
✓ “The race towards superintelligence is ridiculously risky and I don’t think humanity should be doing it.”
✓ “I’m not talking about a tiny risk. On my model, this is your most likely cause of death.”
✓ “Oh I think nobody should be allowed to race towards superintelligence, but I’m trying to build it anyway because I think I’ll do it better than the next guy. Ideally all AI companies in the world should be shut down, though, because we’d need way more time to do this properly.” (The action is perhaps a bit cowardly, but the statement is courageous, if spoken by someone for whom it’s true.)
✗ “Well we can all agree that it’d be bad if AIs were used to enable terrorists to make bioweapons” (spoken by someone who thinks the danger from superintelligence is pressing).
✗ “Even if you think the chance of it happening is very small, it’s worth focusing on, because the consequences are so huge” (spoken by someone who believes the danger is substantial).
✗ “In some unlikely but extreme cases, these companies put civilization at risk, and the companies should be responsible for managing those tail risks” (spoken by someone who believes the danger is substantial).
One litmus test here is: have you communicated the real core of the situation and its direness as you perceive it? Not like “have you caused them as much concern as you can manage to”, more like “have you actually just straightforwardly named the key issue”. (There’s also some caveats here about how, if you think there’s a lowish chance of disaster because you think humanity will come to its senses and change course, then this notion of “courage” I’m trying to name still entails communicating how humanity is currently on course for a full-fledged disaster, without mincing words.)
Something I notice is that in the good examples you use only “I” statements: “I don’t think humanity should be doing it”, “I’m not talking about a tiny risk”, “Oh I think I’ll do it better than the next guy”.
Whereas in the bad examples it’s different: “Well we can all agree that it’d be bad if AIs were used to enable terrorists to make bioweapons”, “Even if you think the chance of it happening is very small”, “In some unlikely but extreme cases, these companies put civilization at risk”.
I think with the bad examples there’s a lot of pressure for the other person to agree, “the companies should be responsible (because I say so)”, “Even if you think… Its still worth focusing on (because I’ve decided what you should care about)”, “Well we can all agree (I’ve already decided you agree and you’re not getting a choice otherwise)”
Whereas with the good examples the other person is not under any pressure to agree, so they are completely free to think about the things you’re saying. I think that’s also part of what makes these statements courageous: they’re stated in a way where the other person is free to agree or disagree as they wish, and so you trust that what you’re saying is compelling enough to be persuasive on its own.
This link doesn’t seem to include people like Quintin Pope and the AI Optimists, who are the most notorious AI risk skeptics I can think of who have nonetheless written about Eliezer’s arguments (example). If I recall correctly, I think Pope said sometime before his departure from this site that his P(doom) is around 1%.
Yup, that claim is wrong. I’m not ≤ 1% but I have met educated skeptics who are. Not sure why Nate made this claim since it isn’t relevant to his point—could just delete that first sentence.
(The existence of exceptions is why I said “most anyone” instead of “anyone”.)
I have a similar experience from reading Hacker News. It seems to me that the people who write such comments don’t really want to express an opinion on AI; it’s more like they are using the absurdity heuristic for an attack by association against EA. (Attacking EA is their goal, and arguing that “AI apocalypse is unlikely” is merely a convenient weapon.)
Both Soares and I get mixed reviews for our social presentation, so you might want to take all this with a grain of salt. But here’s my two cents of response.
I agree it’s good to not seem ashamed of something you’re saying. I think this is mostly a special case of how it’s good to be personally confident. (See Soares here for some good advice on what it means to be confident in a confusing situation.) One reason is that it’s really helpful to convey to your conversational partner that, despite the fact you’re interested in what they think, you’ll be fine and chill regardless of how they respond to you; this lets them relax and e.g. say their true objections if they want to.
But I think it’s generally a mistake to act as if beliefs are “obvious” if in fact they aren’t obvious to your audience. That is, I think that you should talk differently when saying the following different types of statements:
Statements that your audience already agrees with
Statements that your audience already disagrees with
Statements that your audience hasn’t thought about yet, but that they’ll immediately agree with upon consideration
Statements that your audience hasn’t thought about yet, but that they’ll not really have an opinion about when you say them
Statements that your audience has heard about and feels unsure about.
This is mostly because people are more comfortable if they think you understand their state. If you act as if a statement will be obvious to them, they’ll implicitly wonder whether you think they’re an idiot, whether you’re very confused about how many people agree with you, whether you’re an ideologue. They’ll feel like you’re not following the social script that gives them affordance to say “actually that’s not obvious, why do you think that?”
I think that when someone asks you about AI doom, you should say your concerns plainly, but while making it clear that a lot of the things you’re saying might not be obvious to the person you’re talking to, and preferably signposting which you think are more obvious. E.g. for me I generally try to note in my words and tone of voice that I think “AI is a huge deal” is a simpler thing to be persuaded of than “the AI might want to take our stuff”.
I personally find it really annoying when people act as if something they’re saying is obvious to me, when it is in fact contentious.
(See here for a blog post criticizing me for acting as if weird beliefs are obvious in a talk :P I gave that talk intending to more explain than persuade, hoping that I could invite pushback from the audience that we can talk through. But it didn’t come across that way, unfortunately, probably because I didn’t signal well enough that I was expecting and hoping for pushback.)
I agree that it’s usually helpful and kind to model your conversation-partner’s belief-state (and act accordingly).
And for the avoidance of doubt: I am not advocating that anyone pretend they think something is obvious when they in fact do not.
By “share your concerns as if they’re obvious and sensible”, I was primarily attempting to communicate something more like: I think it’s easy for LessWrong locals to get lost in arguments like whether AI might go fine because we’re all in a simulation anyway, or confused by turf wars about whether AI has a 90+% chance of killing us or “only” a ~25% chance. If someone leans towards the 90+% model and gets asked their thoughts on AI, I think it’s worse for them to answer in a fashion that’s all wobbly and uncertain because they don’t want to be seen as overconfident against the 25%ers, and better for them to connect back to the part where this whole situation (where companies are trying to build machine superintelligence with very little idea of what they’re doing) is wacky and insane and reckless, and speak from there.
I don’t think one should condescend about the obviousness of it. I do think that this community is generally dramatically failing to make the argument “humanity is building machine superintelligence while having very little idea of what it’s doing, and that’s just pretty crazy on its face” because it keeps getting lost in the weeds (or in local turf wars).
And I was secondarily attempting to communicate something like: I think our friends in the policy world tend to cede far too much social ground. A bunch of folks in DC seem to think that the views of (say) Yann LeCun and similar researchers are the scientific consensus with only a few radicals protesting, whereas the actual fact is that “there’s actually a pretty big problem here” is much closer to consensus, and that a lack of scientific consensus is a negative sign rather than a positive sign in a situation like this one (because it’s an indication that the scientific field has been able to get this far without really knowing what the heck it’s doing, which doesn’t bode well if it goes farther). I think loads of folks are mentally framing themselves as fighting for an unpopular fringe wacky view when that’s not the case, and they’re accidentally signaling “my view is wacky and fringe” in cases where that’s both false and harmful.
(I was mixing both these meanings into one sentence because I was trying to merely name my old spiels rather than actually giving them, because presenting the old spiels was not the point of the post. Perhaps I’ll edit the OP to make this point clearer, with apologies to future people for confusion caused if the lines that Buck and I are quoting have disappeared.)
I don’t think the weeds/local turf wars really cause the problems here. Why do you think that?
The weeds/local turf wars seem like way smaller problems for AI-safety-concerned people communicating that the situation seems crazy than e.g. the fact that a bunch of the AI safety people work at AI companies.
Idk, seems plausible.
The hypothesized effect is: people who have been engaged in the weeds/turf wars think of themselves as “uncertain” (between e.g. the 25%ers and the 90+%ers) and forget that they’re actually quite confident about some proposition like “this whole situation is reckless and crazy and Earth would be way better off if we stopped”. And then there’s a disconnect where (e.g.) an elected official asks a local how bad things look, and they answer while mentally inhabiting the uncertain position (“well I’m not sure whether it’s 25%ish or 90%ish risk”), and all they manage to communicate is a bunch of wishy-washy uncertainty. And (on this theory) they’d do a bunch better if they set aside all the local disagreements and went back to the prima-facie “obvious” recklessness/insanity of the situation and tried to communicate about that first. (It is, I think, usually the most significant part to communicate!)
Yeah I am pretty skeptical that this is a big effect—I don’t know anyone who I think speaks ~~unconfidently~~ without the courage of their convictions when talking to audiences like elected officials for this kind of reason—but idk.

Whoa, this seems very implausible to me. Speaking with the courage of one’s convictions in situations which feel high-stakes is an extremely high bar, and I know of few people who I’d describe as consistently doing this.
If you don’t know anyone who isn’t in this category, consider whether your standards for this are far too low.
I read Buck’s comment as consistent with him knowing people who speak without the courage of their convictions for other reasons than stuff like “being uncertain between 25% doom and 90% doom”.
Huh! I’ve been in various conversations with elected officials and have had the sense that most people speak without the courage of their convictions (which is not quite the same thing as “confidence”, but which is more what the post is about, and which is the property I’m more interested in discussing in this comment section, and one factor of the lack of courage is broadcasting uncertainty about things like “25% vs 90+%” when they could instead be broadcasting confidence about “this is ridiculous and should stop”). In my experience, it’s common to the point of others expressing explicit surprise when someone does and it works (as per the anecdote in the post).
I am uncertain to what degree we’re seeing very different conversations, versus to what degree I just haven’t communicated the phenomena I’m talking about, versus to what degree we’re making different inferences from similar observations.
I don’t think your anecdote supports that it’s important to have the courage of your convictions when talking. I think the people I know who worked on SB-1047 are totally happy to say “it’s ridiculous that these companies don’t have any of the types of constraints that might help mitigate extreme risks from their work” without wavering because of the 25%-vs-90% thing. I interpret your anecdote as being evidence about which AI-concerned-beliefs go over well, not about how you should say them. (Idk how important this is, np if you don’t want to engage further.)
A few claims from the post (made at varying levels of explicitness) are:
1. Often people are themselves motivated by concern X (ex: “the race to superintelligence is reckless and highly dangerous”) and decide to talk about concern Y instead (ex: “AI-enabled biorisks”), perhaps because they think it is more palatable.
2. Focusing on the “palatable” concerns is a pretty grave mistake.
2a. The claims Y are often not in fact more palatable; people are often pretty willing to talk about the concerns that actually motivate you.
2b. When people try talking about the concerns that actually motivate them while loudly signalling that they think their ideas are shameful and weird, this is not a terribly good test of claim (2a).
2c. Talking about claims other than the key ones comes at a steep opportunity cost.
2d. Talking about claims other than the key ones risks confusing people who are trying to make sense of the situation.
2e. Talking about claims other than the key ones risks making enemies of allies (when those would-be allies agree about the high-stakes issues and disagree about how to treat the mild stuff).
2f. Talking about claims other than the key ones triggers people’s bullshit detectors.
3. Nate suspects that many people are confusing “I’d be uncomfortable saying something radically different from the social consensus” with “if I said something radically different from the social consensus then it would go over poorly”, and that this conflation is hindering their ability to update on the evidence.
3a. Nate is hopeful that evidence of many people’s receptiveness to key concerns will help address this failure.
3b. Nate suspects that various tropes and mental stances associated with the word “courage” are perhaps a remedy to this particular error, and hopes that advice like “speak with the courage of your convictions” is helpful for remembering the evidence in (3a) and overcoming the error of (3).
I don’t think this is in much tension with my model.
For one thing, that whole sentence has a bunch of the property I’d call “cowardice”. “Risks” is how one describes tail possibilities; if one believes that a car is hurtling towards a cliff-edge, it’s a bit cowardly to say “I think we should perhaps talk about gravity risks” rather than “STOP”. And the clause “help mitigate extreme risks from their work” lets the speaker pretend the risks are tiny on Monday and large on Tuesday; it doesn’t extend the speaker’s own neck.
For another thing, willingness to say that sort of sentence when someone else brings up the risks (or to say that sort of sentence in private) is very different from putting the property I call “courage” into the draft legislation itself.
I observe that SB-1047 itself doesn’t say anything about a big looming extinction threat that requires narrowly targeted legislation. It maybe gives the faintest of allusions to it, and treads no closer than that. The bill lacks the courage of the conviction “AI is on track to ruin everything.” Perhaps you believe this simply reflects the will of Scott Wiener. (And for the record: I think it’s commendable that Senator Wiener put forth a bill that was also trying to get a handle on sub-extinction threats, though it’s not what I would have done.) But my guess is that the bill would be written very differently if the authors believed that the whole world knew how insane and reckless the race to superintelligence is. And “write as you would if your ideas were already in the Overton window” is not exactly what I mean by “have the courage of your convictions”, but it’s close.
(This is also roughly my answer to the protest “a lot of the people in D.C. really do care about AI-enabled biorisk a bunch!”. If the whole world was like “this race to superintelligence is insane and suicidal; let’s start addressing that”, would the same people be saying “well our first priority should be AI-enabled biorisk; we can get to stopping the suicide race later”? Because my bet is that they’re implicitly focusing on issues that they think will fly, and I think that this “focus on stuff you think will fly” calculation is gravely erroneous and harmful.)
As for how the DC anecdote relates: it gives an example of people committing error (1), and it provides fairly direct evidence for claims (2a) and (2c). (It also provided evidence for (3a) and (3b), in that the people at the dinner all expressed surprise to me post-facto, and conceptualized this pretty well in terms of ‘courage’, and have been much more Nate!courageous at future meetings I’ve attended, to what seem to me like good effects. Though I didn’t spell those bits out in the original post.)
I agree that one could see this evidence and say “well it only shows that courage works for that exact argument in that exact time period” (as is mentioned in a footnote, and as is a running theme throughout the post). Various other parts of the post provide evidence for other claims (e.g. the Vance, Cruz, and Sacks references provide evidence for (2d), (2e), and (2f)). I don’t expect this post to be wholly persuasive, and indeed a variant of it has been sitting in my drafts folder for years. I’m putting it out now in part (because I am trying to post more in the lead-up to the book and in part) because folks have started saying things like “this is slightly breaking my model of where things are in the overton window”, which causes me to think that maybe the evidence has finally piled up high enough that people can start to internalize hypothesis (3), even despite how bad and wrong it might feel for them to (e.g.) draft legislation in accordance with beliefs of theirs that radically disagree with perceived social consensus.
Ok. I don’t think your original post is clear about which of these many different theses it has, or which points it thinks are evidence for other points, or how strongly you think any of them.
I don’t know how to understand your thesis other than “in politics you should always pitch people by saying how the issue looks to you, Overton window or personalized persuasion style be damned”. I think the strong version of this claim is obviously false. Though maybe it’s good advice for you (because it matches your personality profile) and perhaps it’s good advice for many/most of the people we know.
I think that making SB-1047 more restrictive would have made it less likely to pass, because it would have made it easier to attack and fewer people would agree that it’s a step in the right direction. I don’t understand who you think would have flipped from negative to positive on the bill based on it being stronger—surely not the AI companies and VCs who lobbied against it and probably eventually persuaded Newsom to veto?
I feel like the core thing that we’ve seen in DC is that the Overton window has shifted, almost entirely as a result of AI capabilities getting better, and now people are both more receptive to some of these arguments and more willing to acknowledge their sympathy.
To be clear, my recommendation for SB-1047 was not “be basically the same bill but talk about extinction risks and levy a few more restrictions on the labs”, but rather “focus very explicitly on the extinction threat; say ‘this bill is trying to address a looming danger described by a variety of scientists and industry leaders’ or suchlike, shape the bill differently to actually address the extinction threat straightforwardly”.
I don’t have a strong take on whether SB-1047 would have been more likely to pass in that world. My recollection is that, back when I attempted to give this advice, I said I thought it would make the bill less likely to pass but more likely to have good effects on the conversation (in addition to it being much more likely to matter in cases where it did pass). But that could easily be hindsight bias; it’s been a few years. And post facto, the modern question of what is “more likely” depends a bunch on things like how stochastic you think Newsom is (we already observed that he vetoed the real bill, so I think there’s a decent argument that a bill with different content has a better chance even if it’s a lower than our a-priori odds on SB-1047), though that’s a digression.
I do think that SB-1047 would have had a substantially better effect on the conversation if it was targeted towards the “superintelligence is on track to kill us all” stuff. I think this is a pretty low bar because I think that SB-1047 had an effect that was somewhere between neutral and quite bad, depending on which follow-on effects you attribute to it. Big visible bad effects that I think you can maybe attribute to it are Cruz and Vance polarizing against (what they perceived as) attempts to regulate a budding normal tech industry, and some big dems also solidifying a position against doing much (e.g. Newsom and Pelosi). More insidiously and less clearly, I suspect that SB-1047 was a force holding the Overton window together. It was implicitly saying “you can’t talk about the danger that AI kills everyone and be taken seriously” to all who would listen. It was implicitly saying “this is a sort of problem that could be pretty-well addressed by requiring labs to file annual safety reports” to all who would listen. I think these are some pretty false and harmful memes.
With regards to the Overton window shifting: I think this effect is somewhat real, but I doubt it has as much importance as you imply.
For one thing, I started meeting with various staffers in the summer of 2023, and the reception I got is a big part of why I started pitching Eliezer on the world being ready for a book (a project that we started in early 2024). Also, the anecdote in the post is dated to late 2024 but before o3 or DeepSeek. Tbc, it did seem to me like the conversation changed markedly in the wake of DeepSeek, but it changed from a baseline of elected officials being receptive in ways that shocked onlookers.
For another thing, in my experience, anecdotes like “the AI cheats and then hides it” or experimental results like “the AI avoids shutdown sometimes” are doing as much if not more of the lifting as capabilities advances. (Though I think that’s somewhat of a digression.)
For a third thing, I suspect that one piece of the puzzle you’re missing is how much the Overton window has been shifting because courageous people have been putting in the legwork for the last couple years. My guess is that the folks putting these demos and arguments in front of members of congress are a big part of why we’re seeing the shift, and my guess is that the ones who are blunt and courageous are causing the shift to happen moreso (and are causing it to happen in a better direction).
I’m worried about the people who go in and talk only about (e.g.) AI-enabled biorisk while avoiding saying a word about superintelligence or loss-of-control. I think this happens pretty often and that it comes with a big opportunity cost in the best cases, and that it’s actively harmful in the worst cases—when (e.g.) it reinforces a silly Overton window, or when it shuts down some congress member’s budding thoughts about the key problems, or when it orients them towards silly issues. I also think it spends down future credibility; I think it risks exasperating them when you try to come back next year and say that we’re on track to all die. I also think that the lack of earnestness is fishy in a noticeable way (per the link in the OP).
[edited for clarity and to fix typos, with apologies about breaking the emoji-reaction highlights]
Ok. I agree with many particular points here, and there are others that I think are wrong, and others where I’m unsure.
For what it’s worth, I think SB-1047 would have been good for AI takeover risk on the merits, even though (as you note) it isn’t close to all we’d want from AI regulation.
Yeah, to reiterate, idk why you think:
My guess is that the main reason they broadcast uncertainty is because they’re worried that their position is unpalatable, rather than because of their internal sense of uncertainty.
FWIW I broadcast the former rather than the latter because from the 25% perspective there are many possible worlds which the “stop” coalition ends up making much worse, and therefore I can’t honestly broadcast “this is ridiculous and should stop” without being more specific about what I’d want from the stop coalition.
A (loose) analogy: leftists in Iran who confidently argued “the Shah’s regime is ridiculous and should stop”. It turned out that there was so much variance in how it stopped that this argument wasn’t actually a good one to confidently broadcast, despite in some sense being correct.
Maybe it’s hard to communicate nuance, but it seems like there’s a crazy thing going on where many people in the AI x-risk community think something like “Well obviously I wish it would stop, and the current situation does seem crazy and unacceptable by any normal standards of risk management. But there’s a lot of nuance in what I actually think we should do, and I don’t want to advocate for a harmful stop.”
And these people end up communicating to external people something like “Stopping is a naive strategy, and continuing (maybe with some safeguards etc) is my preferred strategy for now.”
This seems to miss out on the really important part where they would actually want to stop if we could, but it seems hard and difficult/nuanced to get right.
Yeah, I agree that it’s easy to err in that direction, and I’ve sometimes done so. Going forward I’m trying to more consistently say the “obviously I wish people just wouldn’t do this” part.
Though note that even claims like “unacceptable by any normal standards of risk management” feel off to me. We’re talking about the future of humanity, there is no normal standard of risk management. This should feel as silly as the US or UK invoking “normal standards of risk management” in debates over whether to join WW2.
To check, do you have particular people in mind for this hypothesis? Seems kinda rude to name them here, but could you maybe send me some guesses privately? I currently don’t find this hypothesis as stated very plausible, or like sure maybe, but I think it’s a relatively small fraction of the effect.
Sorry for butting in, but I can’t help but notice that I agree with both what you and So8res are saying, yet I think you aren’t arguing about the same thing.
You seem to be talking about the dimension of “confidence,” “obviousness,” etc. and arguing that most proponents of AI concern seem to have enough of it, and shouldn’t increase it too much.
So8res seems to be talking about another dimension which is harder to name. “Frank futuristicness” maybe? Though not really.
If you adjust your “frank futuristicness” to an absurdly high setting, you’ll sound a bit crazy. You’ll tell lawmakers “I’m not confident, and this isn’t obvious, but I think that unless we pause AI right now, we risk a 50% chance of building a misaligned superintelligence. It might use nanobots to convert all the matter in the universe into paperclips, and the stars and galaxies will fade one by one.”
But if you adjust your “frank futuristicness” to an absurdly low setting, you’ll end up being ineffective. You’ll say “I am confident, and this is obvious: we should regulate AI companies more because they are less regulated than other companies. For example, the companies which research vaccines have to jump through so many clinical trial hoops, meanwhile AI models are just as untested as vaccines and really, they have just as much potential to harm people. And on top of that we can’t even prove that humanity is safe from AI, so we should be careful. I don’t want to give an example of how exactly humanity isn’t safe from AI because it might sound like sci-fi, so I’ll only talk about it abstractly. My point is, we should follow the precautionary principle and be slow because it doesn’t hurt.”
There is an optimum level of “frank futuristicness” between the absurdly high setting and the absurdly low setting. But most people are far below this optimum level.
Ok; what do you think of Soares’s claim that SB-1047 should have been made stronger and the connections to existential risk should have been made more clearly? That seems probably false to me.
I think that if it were to go ahead, it should have been made stronger and clearer. But this wouldn’t have been politically feasible, and therefore if that were the standard being aimed for it wouldn’t have gone ahead.
This I think would have been better than the outcome that actually happened.
To be honest, I don’t know.
All I know is that a lot of organizations seem shy about discussing AI takeover risk, and the endorsements the book got surprised me regarding how receptive government officials are (considering how little cherry-picking they did).
My very uneducated guess is that Newsom vetoed the bill because he was more of a consequentialist/longtermist than the cause-driven lawmakers who passed the bill, so one can argue the failure mode was a “lack of appeal to consequentialist interests.” One might argue “it was passed by cause-driven lawmakers by a wide margin, but got blocked by the consequentialist.” But the cause-driven vs. consequentialist motives are pure speculation; I know nothing about these people aside from Newsom’s explanation...
From my perspective, FWIW, the endorsements we got would have been surprising even if they had been maximally cherry-picked. You usually just can’t find cherries like those.
Just commenting to say this post helped me build the courage to talk about AI risks around me more confidently and to stop thinking I needed to understand everything about it before saying I am deeply worried. Thanks Soares!
Hi there,
I just made an account to offer my anecdotes as an outside observer. I admit, I’m not well versed in the field of AI; I’m only a college graduate with a focus on biomedicine. But as a person who found his way to this site following the 2023 letter about AI extinction, I’ve nearly constantly read as much information as I could process to understand the state and stakes of the AI Risk conversation. For what it’s worth, I applaud those who have the courage to speak about something that in the grand scheme of things is hard to discuss with people without sounding condescending, apathetic or crazy. For someone like me, who has trouble expressing the blood-chilling dread I feel reading AI-related headlines, I have nothing but the utmost respect for those who may have an influence in guiding this timeline into one that succeeds.
I just feel like the communication from well-informed people gets marred by a sense of incapability. What I mean by that is the average layperson will engage with a meme or a light post about the risks of AI, but then swiftly become disillusioned by the prospect of not being able to influence anything. (An aside: this is essentially the same situation I faced trying to communicate risks of pandemics and safe health practices.) Decades of protests for various causes rarely amount to meaningful changes, and the causes often change and get drowned out by yet another cause to rally behind. Congress is too caught up in culture wars and polarization to be even remotely effective, and the administration seems hellbent on accelerating toward the very cliff we should be avoiding. The bills that could have the smallest of teeth get lobbied into being no more than a slap on the wrist. I worry that AI risk, while being probably the single most important issue of this decade, drowns in the pits of disillusionment because the layperson can’t meaningfully change it. They feel, and I feel, that we are passengers on a ride where the ones in “control” see us as acceptable casualties.
I also feel as if many of the “good” outcomes may come off as extremely uncomfortable and possibly dystopian. People care about the essence of humanity: the art, the music, the emotions both good and bad. The trajectory we are on now feels like the culmination of valuing extreme profit over the essence of humanity (see the copious numbers of people who post about dreaming of running away to nature and escaping a “capitalist nightmare”). People are also selfish at best, and selfless for maybe 20 people on a good day in my experience, so it’s much easier to fret about where the next paycheck is going to go than over how AI will happen. They can accept the inevitability of extinction by AI if we fail to align it, but it falls into the same indifference as knowing the inevitability of death, barring any sci-fi tech.
What can be done about this? Honestly, I don’t know. For those who are fighting the good fight: have heart and have courage, but also have compassion and empathy for your fellow man. More than likely they are just as worried and scared. Try not to give false hope, but use the wins that you do have to fuel momentum. Strike the balance between humility and assertiveness so that people look to you all as the experts you are. I wish you all luck.
It’s important for everyone to know that there are many things they can personally do to help steer the world onto a safer trajectory. I am a volunteer with PauseAI, and while I am very bad at grassroots organizing and lobbying, I still managed to start a local chapter and have some positive impact on my federal representative’s office.
I suggest checking out PauseAI’s Take Action page. I also strongly endorse ControlAI, who are more centralized and have an excellent newsletter.
I appreciate your comment, and the work that the individuals of both organizations are attempting to do now, but I can’t help but feel my point was slightly missed.
It’s not that I wouldn’t support these organizations, or even that the majority of laypersons wouldn’t; it’s more simply that there’s a fatigue that comes with the style of protest and outreach these organizations do. There are only so many dire warnings, so many picket lines, so many cold calls and outreach attempts before your target audience becomes either overwhelmed and spirals into a bout of severe depression and anxiety (as I have many times), or oversaturated and dismissive, since they see it as yet another failed dream of the naive. Unfortunately, years upon years of protest and the subsequent media portrayal of protests have supplied people with a bitter pill of indifference and massaged the thoughts of people who could do things into cynical skepticism.
If one wants the Overton window to swing in a way that could achieve meaningful change, the masses must have both hope and a direction in which to change things, while also striking a delicate balance of being grounded while anticipating greater risks. If you lead with “hey, there’s a reasonable chance that this hyper-intelligent software will have the ability to completely dismantle the planet into atoms on a microscopic timescale, so we need to stop it now and then point humanity toward manifest destiny in the cosmos as the rightful stewards of the stars,” people will blankly stare at you with bewilderment. But if you instead start by addressing things like job risks, deepfakes, concentration of power and totalitarianism, tangible real issues people can see now, they may begin to open that door and then be more susceptible to discussing and acting on existential risk because they have the momentum behind them.
Any movement, grassroots or grand, needs to have the momentum to slog through the doubts and denial, and I feel like just a few minor sparks could lead to a cascade of positive change. You have to push your victories just as hard, if not harder (within reason, so as to avoid false hope), as you beat the drum of doom. But again, I’m not an expert. I’m just a kid who is scared and has spent time dialoging with people with a drive to create a better future. Please keep fighting the good fight; please keep raging against the coming dark.
I spent approximately a year at PauseAI (Global) soul-searching over this, and I’ve come to the conclusion that this strategy is not a good idea. This is something I’ve changed my mind about.
My original view was something like:
“If we convince people to get on board with pausing AI for any reason, they’ll eventually come round to the extinction risk concerns, in the same way that people who become vegan for e.g. environmental reasons usually come around to the moral concerns. This is more efficient than trying to hit people with the extinction risk concerns from the start, since more people will be open to listening to non-extinction concerns.”
I think this is wrong. Recruiting people for non-extinction reasons was harder than I expected. I remember at one point I found a facebook group with 80K people called “Artists Against Generative AI” and got the organizers to share our stuff there and we literally got zero uptake from that. We did a few media campaigns on copyright grounds, and we didn’t get much attention on that. I’m still not sure why this was the case, but we just didn’t make headway. We didn’t get any wins to leverage, we didn’t build any momentum. And even if we did, we would have been pointed in the wrong direction.
I now think something like this:
“Everything we do should have the threat of extinction front-and-centre. We might protest about specific things, but this is ‘wave’ content in the MIRI sense (I don’t know if MIRI is still doing this, I haven’t actually seen them put out any ‘wave’ content but I could easily have missed it) and needs to be fed back into extinction threat concerns. Everything we talk about that isn’t about extinction is in some sense a compromise to The Current Moment, and we should be careful of this.”
Example: we recently protested about DeepMind breaking their commitments from the Seoul summit. Whether or not they keep their commitments is probably not an X-risk lynchpin, but it is something that’s happening now, and it is genuinely bad for them to be defecting in this way. Our signage, our speeches, and our comedic skit all featured extinction risk as a/the major reason to be concerned about whether DeepMind is following their own commitments. This is still a compromise to The Current Moment, a compromise between pointing out 100% clear issues—DeepMind 100% definitely broke their commitments this isn’t debatable, and there is no regulation besides voluntary commitments—and pointing out the actual reason we care whether DeepMind is following their commitments.
In summary:
Hi there!
I apologize for not responding to this very insightful comment sooner; I really appreciate your perspective on my admittedly scatter-brained parent comment. Your comment has definitely caused me to reflect a bit on my own, and updated me slightly away from my original position.
I feel I may have been a bit ignorant to the actual state of PauseAI, as like I said in my original comments and replies it felt like an organization dangerously close to becoming orphaned from people’s thought processes. I’m glad to hear there are some ways around the issue I described. Maybe write a top level post about how this shift in understanding is benefiting your messaging to the general public? It may inform others of novel ways to spread a positive movement.
Whether the bill is about extinction threats is nearly irrelevant; what’s important is what its first- and second-order effects will be. Any sensible politician will know of all kinds of cases where a bill’s (or constitutional amendment’s) full effects were either not known or explicitly disavowed by the bill’s proponents.
If you want to quell China fears here, you need to make the in-PRC push for this book a peer to the push for the American English one. Of course, the PRC takes a dim view of foreigners trying to bolster social movements, so — and I’m trying to not say this flippantly — good luck with that.
(I also think pushing this book’s pile of memes in China is a good idea because they’re probably about as capable of making AGI as we are, but that’s a separate issue.)
How would you recommend pushing this book’s pile of memes in China? My first thought would be trying to organize some conference with non-Chinese and (especially) Chinese experts, the kinds who advise the government, centered around the claims of the book. I don’t know how the CCP would view this though, I’m not an expert on Chinese internal politics.
I have no idea whatsoever, sorry.
AI risk discussions are happening at elite and EU institutional levels, but in my experience they’re not reaching regular European citizens, especially in Spain (where I’m originally from).
This is a major problem because politicians respond to what their constituents care about, and if people aren’t talking about it, they won’t demand action on it.
The crux of the issue is that most Europeans simply aren’t familiar with AI risk arguments, so they’re not discussing or prioritizing these issues among their circles. Without this kind of public awareness, further political action from the EU is unlikely, and existing momentum will likely wane as AI becomes more and more important to the economy.
I’d like to encourage you to translate the book into Spanish, French, German, and Italian; doing so could help bridge this gap.
Spanish would be especially valuable given its global reach, and how under-discussed these risks are among Spanish society. But the same point extends to other EU countries. There’s almost no awareness of AI risks among citizens, and this book could change that completely.
I really like this.
I think AI concern can fail in two ways:
We lose the argument. We confidently state our positions, with no shyness or sugarcoating. It grabs a lot of attention, but it’s negative attention and ridicule. We get a ton of engagement from people who consider us a low-status morbid curiosity. We make a lot of powerful enemies, who relentlessly attack AI concern and convince almost everyone.
We don’t lose any argument, but the argument never happens. We have a ton of impressive endorsements, and anyone who actually reads about the drama learns that our side consists of high-status scientists and geniuses. We have no enemies—the only thing rarer than someone arguing for AI risk is someone arguing against it. And yet… we are ignored. Politicians are simply too busy to think about this. They may think “I guess your logic is correct… but no other politicians seem to be invested in this, and I don’t really want to be the first one.”
Being bolder increases the “losing argument” risk but decreases the “argument never happens” risk. And this is exactly what we want at this point in time. (As long as you don’t take negative actions like traffic-obstruction protests.)
PS: I also think there are two kinds of burden of proof:
Rational burden of proof. The debater who argues we are 100% safe has the burden of proof, while the debater arguing that “building a more intelligent species doesn’t seem very safe” has no burden of proof.
Psychological burden of proof. The debater arguing the position that “everyone seems to agree with” has no burden of proof, while the debater arguing the radical extreme position has the burden of proof.
How the heck do we decide which position is the “radical extreme position”? It depends on many things, e.g. how many expert endorsements support AI concern, and how many experts (e.g. Yann LeCun) reject it. But the balance clearly seems to be in favour of AI concern, yet it’s still AI concern that suffers from the psychological burden of proof.
So maybe the problem is not expert endorsements, but ordinary layman beliefs? Well 55% of Americans surveyed agree that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Only 12% disagree.
So maybe it really is vibes?! You just have to emphasize that this is a strongly supported position, this is what experts think, and if you think “this is insanity,” you’re out of the loop. You’ve got to read up on it, because this great paradigm shift quietly happened while you were paying attention to other things.
Given that the psychological burden of proof might work this way, even risk (1), “we lose the argument,” could actually be reduced if we are more confident.
I haven’t read the book (though I pre-ordered it!), but I don’t think the statement in its title is true. I agree, however, with this post that people should be honest about their beliefs.
It seems like I see this sort of thing (people being evasive or euphemistic when discussing AI risk) a LOT. Even discussions by the AI2027 authors, who specifically describe a blow-by-blow extinction scenario that ends with AI killing 99% of humans with a bioweapon and then cleaning up the survivors with drones, frequently refer to it with sanitized language like “it could move against humans” [1] or “end up taking over” [2]. Guys, you literally wrote a month-by-month extinction scenario with receipts! No need to protect our feelings here!
[1] https://open.substack.com/pub/astralcodexten/p/introducing-ai-2027?utm_source=share&utm_medium=android&r=qmni
[2] https://www.dwarkesh.com/p/scott-daniel
I can attest that for me talking about AI dangers in an ashamed way has rarely if ever prompted a positive response. I’ve noticed and been told that it gives ‘intellectual smartass’ vibes rather than ‘concerned person’ vibes.
A plea for having the courage to morally stigmatize the people working in the AGI industry:
I agree with Nate Soares that we need to show much more courage in publicly sharing our technical judgments about AI risks—based on our understanding of AI, the difficulties of AI alignment, the nature of corporate & geopolitical arms races, the challenges of new technology regulation & treaties, etc.
But we also need to show much more courage in publicly sharing our social and moral judgments about the evils of the real-life, flesh-and-blood people who are driving these AI risks—specifically, the people leading the AI industry, working in it, funding it, lobbying for it, and defending it on social media.
Sharing our technical concerns about these abstract risks isn’t enough. We also have to morally stigmatize the specific groups of people imposing these risks on all of us.
We need the moral courage to label other people evil when they’re doing evil things.
If we don’t do this, we look like hypocrites who don’t really believe that AGI/ASI would be dangerous.
Moral psychology teaches us that moral judgments are typically attached not just to specific actions, or to emergent social forces (e.g. the ‘Moloch’ of runaway competition), or to sad Pareto-inferior outcomes of game-theoretic dilemmas, but to people. We judge people. As moral agents. Yes, even including AI researchers and devs.
If we want to make credible claims that ‘the AGI industry is recklessly imposing extinction risks on all of our kids’, and we’re not willing to take the next step of saying ‘and also, the people working on AGI are reckless and evil and wrong and should be criticized, stigmatized, ostracized, and punished’, then nobody will take us seriously.
As any parent knows, if some Bad Guy threatens your kids, you defend your kids and you denounce the Bad Guy. Your natural instinct is to rally social support to punish them. This is basic social primate parenting, of the sort that’s been protecting kids in social groups for tens of millions of years.
If you don’t bother to rally morally outraged support against those who threaten kids, then the threat wasn’t real. This is how normal people think. And rightfully so.
So why don’t we have the guts to vilify the AGI leaders, devs, investors, and apologists, if we’re so concerned about AGI risk?
Because too many rationalists, EAs, tech enthusiasts, LessWrong people, etc still see those AI guys as ‘in our tribe’, based on sharing certain traits we hold dear—high IQ, high openness, high decoupling, Aspy systematizing, Bay Area Rationalist-adjacent, etc. You might know some of the people working in OpenAI, Anthropic, DeepMind, etc. -- they might be your friends, housemates, neighbors, relatives, old school chums, etc.
But if you take seriously their determination to build AGI/ASI—or even to work in ‘AI safety’ at those companies, doing their performative safety-washing and PR—then they are not the good guys.
We have to denounce them as the Bad Guys. As traitors to our species. And then, later, once they’ve experienced the most intense moral shame they’ve ever felt, and gone through a few months of the worst existential despair they’ve ever felt, and they’ve suffered the worst social ostracism they’ve ever experienced, we need to offer them a path towards redemption—by blowing the whistle on their former employers, telling the public what they’ve seen on the inside of the AI industry, and joining the fight against ASI.
This isn’t ‘playing dirty’ or ‘giving in to our worst social instincts’. On the contrary. Moral stigmatization and ostracism of evil-doers is how social primate groups have enforced cooperation norms for millions of years. It’s what keeps the peace, and supports good social norms, and protects the group. If we’re not willing to use the moral adaptations that evolved specifically to protect our social groups from internal and external threats, then we’re not really taking those threats seriously.
PS I outlined this ‘moral backlash’ strategy for slowing reckless AI development in this EA Forum post.
I’m with you up until here; this isn’t just a technical debate, it’s a moral and social and political conflict with high stakes, and good and bad actions.
To be really nitpicky, I technically agree with this as stated: we should stigmatize groups as such, e.g. “the AGI capabilities research community” is evil.
Oops, this is partially but importantly WRONG. From Braχot 10a:
Not everyone who is doing evil things is evil. Some people are evil. You should hate no more than necessary, but not less than that. You should hate evil, and hate evildoers if necessary, but not if not necessary.
Schmidhuber? Evil. Sutton? Evil. Larry Page? Evil. If, after reflection, you endorse omnicide, you’re evil. Altman? Evil and probably a sociopath.
Up-and-coming research star at an AI lab? Might be evil, might not be. Doing something evil? Yes. Is evil? Maybe, it depends.
Essentializing someone by calling them evil is an escalation of a conflict. You’re closing off lines of communication and gradual change. You’re polarizing things: it’s harder for that one person to make gradual moves in belief space and social space and life-narrative space, and it’s harder for groups to have group negotiations. Sometimes escalation is good and difficult and necessary, but sometimes escalation is really bad! Doing a more complicated subtle thing with more complicated boundaries is more difficult. And more brave, if we’re debating bravery here.
So:
Good:
Good:
Good:
Bad:
Sidenote:
I agree that this is an improper motivation for treating some actions with kid gloves, which will lead to incorrect action; and that this is some of what’s actually happening.
TsviBT—thanks for a thoughtful comment.
I understand your point about labelling industries, actions, and goals as evil, but being cautious about labelling individuals as evil.
But I don’t think it’s compelling.
You wrote ‘You’re closing off lines of communication and gradual change. You’re polarizing things.’
Yes, I am. We’ve had open lines of communication between AI devs and AI safety experts for a decade. We’ve had pleas for gradual change. Mutual respect, and all that. Trying to use normal channels of moral persuasion. Well-intentioned EAs going to work inside the AI companies to try to nudge them in safer directions.
None of that has worked. AI capabilities development is outstripping AI safety developments at an ever-increasing rate. The financial temptations to stay working inside AI companies keep increasing, even as the X risks keep increasing. Timelines are getting shorter.
The right time to ‘polarize things’ is when we still have some moral and social leverage to stop reckless ASI development. The wrong time is after it’s too late.
Altman, Amodei, Hassabis, and Wang are buying people’s souls—paying them hundreds of thousands or millions of dollars a year to work on ASI development, despite most of the workers they supervise knowing that they’re likely increasing extinction risk.
This isn’t just a case of ‘collective evil’ being done by otherwise good people. This is a case of paying people so much that they ignore their ethical qualms about what they’re doing. That makes the evil very individual, and very specific. And I think that’s worth pointing out.
(This rhetoric is not quite my rhetoric, but I want to affirm that I do believe that ~most people working at big AI companies are contributing to the worst atrocity in human history, are doing things that are deontologically prohibited, and are morally responsible for that.)
Ben—so, we’re saying the same things, but you’re using gentler euphemisms.
I say ‘evil’; you say ‘deontologically prohibited’.
Given the urgency of communicating ASI extinction risks to the public, why is this the time for gentle euphemisms?
For one, I think I’m a bit scared of regretting my choices. Like, calling someone evil and then being wrong about it isn’t something where you just get to say “oops, I made a mistake” afterwards: you did meaningfully move to socially ostracize someone, mark them as deeply untrustworthy, and say that good people should remove their power, and you kind of owe them something significant if you get that wrong.
For two, a person who has done evil and a person who is evil are quite different things. I think it’s sadly not always the case that a person’s character is aligned with a particular behavior of theirs. I think it’s not accurate to think of all the people building the doomsday machines as generically evil, as people who will do awful things in lots of different contexts; I think there’s a lot of variation in the people and their psychologies and predispositions, and some are screwing up here (almost unforgivably, to be clear) in ways they wouldn’t screw up in different situations.
I do think many of the historical people most widely considered to be evil now were similarly not awful in full generality, or even across most contexts. For example, Eichmann, the ops lead for the Holocaust, was apparently a good husband and father, and generally took care not to violate local norms in his life or work. Yet personally I feel quite comfortable describing him as evil, despite “evil” being a fuzzy folk term of the sort which tends to imperfectly/lossily describe any given referent.
I’m not quite sure what I make of this, I’ll take this opportunity to think aloud about it.
I often take a perspective where most people are born a kludgey mess, and then if they work hard they can become something principled and consistent and well-defined. But without that, they don’t have much in the way of persistent beliefs or morals such that they can be called ‘good’ or ‘evil’.
I think of an evil person as someone more like Voldemort in HPMOR, who has reflected on his principles and will be persistently a murdering sociopath, than someone who ended up making horrendous decisions but wouldn’t in a different time and place. I think if you put me under a lot of unexpected political forces and forced me to make high-stakes decisions, I could make bad decisions, but not because I’m a fundamentally bad person.
I do think it makes sense to write people off as bad people, in our civilization. There are people who have poor impulse control, who have poor empathy, who are pathological liars, and who aren’t save-able by any of our current means, and will always end up in jail or hurting people around them. I rarely interact with such people so it’s hard for me to keep this in mind, but I do believe such people exist.
But evil seems a bit stronger than that, a bit more exceptional. Perhaps I would consider SBF an evil person; he seems to me someone who knew he was a sociopath from a young age, didn’t care about people, would lie and deceive, and was hyper-competent, and I expect that if you release him into society he will robustly continue to do extreme amounts of damage.
Is that who Eichmann was? I haven’t read the classic book on him, but I thought the point of ‘the banality of evil’ was that he seemed quite boring and like many other people? Is it the case that you could replace Eichmann with like >10% of the population and get similar outcomes? 1%? I am not sure if it is accurate to think of that large a chunk of people as ‘evil’, as being the kind of robustly bad people who should probably be thrown in prison for the protection of civilization. My current (superficial) understanding is that Eichmann enacted an atrocity without being someone who would persistently do so in many societies. He had the capacity for great evil, but this was not something he would reliably seek out.
It is possible that somehow thousands of people like SBF and Voldemort have gotten together to work at AI companies; I don’t currently believe that. To be clear, I think that if we believe there are evil people, then it must surely describe some of the people working at big AI companies that are building doomsday machines, who are very resiliently doing so in the face of knowing that they’re hastening the end of humanity, but I don’t currently think it describes most of the people.
This concludes my thinking aloud; I would be quite interested to read more of how your perspective differs, and why.
(cf. Are Your Enemies Innately Evil? from the Sequences)
Ben—your subtext here seems to be that only lower-class violent criminals are truly ‘evil’, whereas very few middle/upper-class white-collar people are truly evil (with a few notable exceptions such as SBF or Voldemort) -- with the implications that the majority of ASI devs can’t possibly be evil in the ways I’ve argued.
I think that doesn’t fit the psychological and criminological research on the substantial overlap between psychopathy and sociopathy, and between violent and non-violent crime.
It also doesn’t fit the standard EA point that a lot of ‘non-evil’ people can get swept up in doing evil collective acts as parts of collectively evil industries, such as slave-trading, factory farming, Big Tobacco, the private prison system, etc. - but that often, the best way to fight such industries is to use moral stigmatization.
You mis-read me on the first point; I said that (something kind of like) ‘lower-class violent criminals’ are sometimes dysfunctional and bad people, but I was distinguishing that from someone more hyper competent and self-aware like SBF or Voldemort; I said that only the latter are evil. (For instance, they’ve hurt orders of magnitude more people.)
(I’m genuinely not sure what research you’re referring to – I expect you are 100x as familiar with the literature as I am, and FWIW I’d be happy to get a pointer or two of things to read.[1])
The standard EA point is to use moral stigmatization? Even if that’s accurate, I’m afraid I no longer have any trust in EAs to do ethics well. As an example that you will be sympathetic to, lots of them have endorsed working at AI companies over the past decade (but many many other examples have persuaded me of this point).
To be clear, I am supportive of moral stigma being associated with working at AI companies. I’ve shown up to multiple protests outside the companies (and I brought my mum!). If you have any particular actions in mind to encourage me to do (I’m probably not doing as much as I could) I’m interested to hear them. Perhaps you could write a guide to how to act when dealing with people in your social scene who work on building doomsday devices in a way that holds a firm moral line while not being socially self-destructive / not immediately blowing up all of your friendships. I do think more actionable advice would be helpful.
I expect it’s the case that crime rates correlate with impulsivity, low-IQ, and wealth (negatively). Perhaps you’re saying that psychopathy and sociopathy do not correlate with social class? That sounds plausible. (I’m also not sure what you’re referring to with the violent part, my guess is that violent crimes do correlate with social class.)
Eichmann was definitely evil. The popular conception of Eichmann as merely an ordinary guy who was “just doing his job” and was “boring” is partly mischaracterization of Arendt’s work, partly her own mistakes (i.e., her characterizations, which are no longer considered accurate by historians).
An example of the former:
(Hmm… that last line is rather reminiscent of something, no?)
Concerning the latter:
I am one of those people that are supposed to be stigmatized/deterred by this action. I doubt this tactic will be effective. This thread (including the disgusting comparison to Eichmann, who directed the killing of millions in the real world—not in some hypothetical future one) does not motivate me to interact with the people holding such positions. Given that much of my extended family was wiped out by the Holocaust, I find these Nazi comparisons abhorrent, and would not look forward to interacting with people making them, whether or not they decide to boycott me.
BTW this is not some original tactic, PETA is using similar approaches for veganism. I don’t think they are very effective either.
To @So8res—I am surprised and disappointed that this Godwin’s law thread survived a moderation policy that is described as “Reign of Terror”
I’ve often appreciated your contributions here, but given the stakes of existential risk, I do think that if my beliefs about risk from AI are even remotely correct, then it’s hard to escape the conclusion that the people presently working at labs are committing the greatest atrocity that anyone in human history has or will ever commit.
The logic of this does not seem that complicated, and while I disagree with Geoffrey Miller on how he goes about doing things, I have even less sympathy for someone reacting to a bunch of people really thinking extremely seriously and carefully about whether what that person is doing might be extremely bad with “if people making such comparisons decide to ostracize me then I consider it a nice bonus”. You don’t have to agree, but man, I feel like you clearly have the logical pieces to understand why one could believe you are causing extremely great harm, without that implying the insanity of the person believing that.
I respect at least some of the people working at capability labs. One thing that unites all of the ones I do respect is that they treat their role at those labs with the understanding that they are in a position of momentous responsibility, and that them making mistakes could indeed cause historically unprecedented levels of harm. I wish you did the same here.
I edited the original post to make the same point with less sarcasm.
I take risk from AI very seriously which is precisely why I am working in alignment at OpenAI. I am also open to talking with people having different opinions, which is why I try to follow this forum (and also preordered the book). But I do draw the line at people making Nazi comparisons.
FWIW I think radicals often hurt the causes they espouse, whether it is animal rights, climate change, or Palestine. Even if, after decades, the radicals are perceived to have been on “the right side of history”, their impact was often negative and delayed that outcome: David Shor was famously cancelled for making this point in the context of the civil rights movement.
Sorry to hear the conversation was on a difficult topic for you; I imagine that is true for many of the Jewish folks we have around these parts.
FWIW I think we were discussing Eichmann in order to analyze what ‘evil’ is or isn’t, and did not make any direct comparisons between him and anyone.
...oh, now I see that Said’s “Hmm… that last line is rather reminiscent of something, no?” is probably making such a comparison (I couldn’t tell what he meant when I read it initially). I can see why you’d respond negatively to that. While there’s a valid point to be made about how people who just try to gain status/power/career-capital without thinking about ethics can do horrendous things, I do not think that it is healthy for discourse to express that in the passive-aggressive way that Said did.
The comparisons invite themselves, frankly. “Careerism without moral evaluation of the consequences of one’s work” is a perfect encapsulation of the attitudes of many of the people who work in frontier AI labs, and I decline to pretend otherwise.
(And I must also say that I find the “Jewish people must not be compared to Nazis” stance to be rather absurd, especially in this sort of case. I’m Jewish myself, and I think that refusing to learn, from that particular historical example, any lessons whatsoever that could possibly ever apply to our own behavior, is morally irresponsible in the extreme.)
EDIT: Although the primary motivation of my comment about Eichmann was indeed to correct the perception of the historians’ consensus, so if you prefer, I can move the comparison to a separate comment; the rest of the comment stands without that part.
I agree with your middle paragraph.
To be clear, I would approve more of a comment that made the comparison overtly[0], rather than one that made it in a subtle way that was harder to notice or that people missed (I did not realize what you were referring to until I tried to puzzle out why boaz had gotten so upset!). I think it is not healthy for people to only realize later that they were compared to Nazis. And I think it fair for them to consider that an underhanded way to cause them social punishment, since it was done in a way that was hard to directly respond to. I believe it’s healthier for attacks[1] to be open and clear.
[0] To be clear, there may still be good reasons to not throw in such a jab at this point in the conversation, but my main point is that doing it with subtlety makes it worse, not better, because it also feels sneaky.
[1] “Attacks”, a word which here means “statements that declare someone has a deeply rotten character or whose behavior has violated an important norm, in a way that if widely believed will cause people to punish them”.
(I don’t mean to derail this thread with discussion of discussion norms. Perhaps if we build that “move discourse elsewhere button” that can later be applied back to this thread.)
Thank you Ben. I don’t think name calling and comparisons are helpful to a constructive debate, which I am happy to have. Happy 4th!
boazbarak—I don’t understand your implication that my position is ‘radical’.
I have exactly the same view on the magnitude of ASI extinction risk that every leader of a major AI company does—that it’s a significant risk.
The main difference between them and me is that they are willing to push ahead with ASI development despite the significant risk of human extinction, and I think they are utterly evil for doing so, because they’re endangering all of our kids.
In my view, risking extinction for some vague promise of an ASI utopia is the radical position. Protecting us from extinction is a mainstream, commonsense, utterly normal human position.
(From a moderation perspective:
I consider the following question-cluster to be squarely topical: “Suppose one believes it is evil to advance AI capabilities towards superintelligence, on the grounds that such a superintelligence would quite likely kill us all. Suppose further that one fails to unapologetically name this perceived evil as ‘evil’, e.g. out of a sense of social discomfort. Is that a failure of courage, in the sense of this post?”
I consider the following question-cluster to be a tangent: “Suppose person X is contributing to a project that I believe will, in the future, cause great harms. Does person X count as ‘evil’? Even if X agrees with me about which outcomes are good and disagrees about the consequences of the project? Even if the harms of the project have not yet occurred? Even if X would not be robustly harmful in other circumstances? What if X thinks they’re trying to nudge the project in a less-bad direction?”
I consider the following sort of question to be sliding into the controversy attractor: “Are people working at AI companies evil?”
The LW mods told me they’re considering implementing a tool to move discussions to the open thread (so that they may continue without derailing the topical discussions). FYI @habryka: if it existed, I might use it on the tangents, idk. I encourage people to pump against the controversy attractor.)
I agree with you on the categorization of 1 and 2. I think there is a reason why Godwin’s law was created; once threads follow the controversy attractor in this direction, they tend to be unproductive.
I completely agree this discussion should be moved outside your post. But the counterintuitive mechanics of LessWrong mean a derailing discussion may actually increase the visibility and upvotes of your original message (by bumping it in the “recent discussion”).
(It’s probably still bad if it’s high up in the comment section.)
It’s too bad you can only delete comment threads, you can’t move them to the bottom or make them collapsed by default.
The apparent aim of OpenAI (making AGI, even though we don’t know how to do so without killing everyone) is evil.
I agree that a comparison to Eichmann is not optimal.
Instead, if AI turns out to have consequences so bad that they outweigh the good, a better comparison for people working at AI labs would be Thomas Midgley, who insisted that his leaded gasoline couldn’t be dangerous even when presented with counter-evidence, and Edward Teller, who (as far as I can tell) was simply fascinated by the engineering challenges of scaling hydrogen bombs to levels that could incinerate entire continents.
These two people still embody two archetypes of what could reasonably be called “evil”, but arguably fit better with the psychology of people currently working at AI labs.
These two examples also avoid Godwin’s law type attractors.
That’s interesting to hear that many historians believe he was secretly more ideologically motivated than Arendt thought, and also believe that he portrayed a false face during all of the trials, thanks for the info.
Yes, it takes courage to call people out as evil, because you might be wrong, you might unjustly ruin their lives, you might have mistakenly turned them into scapegoat, etc. Moral stigmatization carries these risks. Always has.
And people understand this. Which is why, if we’re not willing to call the AGI industry leaders and devs evil, then people will see us failing to have the courage of our convictions. They will rightly see that we’re not actually confident enough in our judgments about AI X-risk to take the bold step of pointing fingers and saying ‘WRONG!’.
So, we can hedge our social bets, and try to play nice with the AGI industry, and worry about making such mistakes. Or, we can save humanity.
To be clear, I think it would probably be reasonable for some external body like the UN to attempt to prosecute & imprison ~everyone working at big AI companies for their role in racing to build doomsday machines. (Most people in prison are not evil.) I’m a bit unsure if it makes sense to do things like this retroactively rather than to just outlaw it going forward, but I think it sometimes makes sense to prosecute atrocities after the fact even if there wasn’t a law against them at the time. For instance, my understanding is that the Nuremberg trials set precedents for prosecuting people for war crimes, crimes against humanity, and crimes against peace, even though legally these weren’t crimes at the time that they happened.
I just have genuine uncertainty about the character of many of the people in the big AI companies and I don’t believe they’re all fundamentally rotten people! And I think language is something that can easily get bent out of shape when the stakes are high, and I don’t want to lose my ability to speak and be understood. Consequently I find I care about not falsely calling people’s character/nature evil when what I think is happening is that they are committing an atrocity, which is similar but distinct.
My answer to this is “because framing things in terms of evil turns the situation more mindkilly, not really the right gears, and I think this domain needs clarity-of-thought more than it needs a social-conflict orientation”
(I’m not that confident about that, and don’t super object to other people calling them evil. But I think “they are most likely committing a great atrocity” is pretty non-euphemistic and more true)
Openly calling people evil has some element of “deciding who to be in social conflict with”, but insofar as it has some element of “this is simply an accurate description of the world” then FWIW I want to note that this consideration partially cuts in favor of just plainly stating what counts as evil, even whether specific people have met that bar.
If I thought evil was a more useful gear here I’d be more into it. Like, I think “they are probably committing an atrocity” carves reality at the joints.
I think there are maybe 3 people involved with AGI development who seem like they might be best described as “evil” (one of whom is Sam Altman, who I feel comfortable naming because I think we’ve seen evidence of him doing nearterm more mundane evil, rather than making guesses about their character and motivations)
I think it probably isn’t helpful to think of Eichmann as evil, though again it’s fine to say “he committed atrocities” or even “he did evil.”
Yeah that makes sense.
As an aside, I notice that I currently feel much more reticent to name individuals about whose character there is not some sort of legal/civilizational consensus. I think I am a bit worried about contributing to the dehumanization of people who are active players, and a drop in basic standards of decency toward them, even if I were to believe they were evil.
I’m personally against this as matter of principle, and I also don’t think it’ll work.
Moral stigmatizing only works against a captive audience. It doesn’t work against people who can very easily ignore you.
You’re more likely to stop eating meat if a kind understanding vegetarian/vegan talks to you and makes you connect with her story of how she stopped eating meat. You’re more likely to simply ignore a militant one who calls you a murderer.
Moral stigmatizing failed to stop nuclear weapon developers, even though many of them were the same kind of “nerd” as AI researchers.
People see Robert Oppenheimer saying “Now, I am become Death, the destroyer of worlds” as some morally deep stuff. “The scientific community ostracized [Edward] Teller,” not because he was very eager to build bigger bombs (like the hydrogen bomb and his proposed Sundial), but because he made Oppenheimer lose his security clearance by saying bad stuff about him.
Which game do you choose to play? The game of dispassionate discussion, where the truth is on your side? Or the game of Twitter-like motivated reasoning, where your side looks much more low status than the AI lab people, and the status quo is certainly not on your side?
Imagine how badly we’ll lose the argument if people on our side are calling them evil and murderous and they’re talking like a sensible average Joe trying to have a conversation with us.
Moral stigmatization seems to backfire rather than help for militant vegans because signalling hostility is a bad strategy when you’re the underdog going against the mainstream. It’s an extremely big ask for ordinary people to show hostility towards other ordinary people who no one else is hostile towards. It’s even difficult for ordinary people to be associated with a movement which shows such hostility. Most people just want to get on with their lives.
I think you’re underestimating the power of backlashes to aggressive activism. And I say this, despite the fact just a few minutes ago I was arguing to others that they’re overestimating the power of backlashes.
The most promising path to slowing down AI is government regulation, not individuals ceasing to do AI research.
- Think about animal cruelty. Government regulation has succeeded on this many times. Trying to shame people who work in factory farms into stopping, has never worked, and wise activists don’t even consider doing this.
- Think about paying workers more. Raising the minimum wage works. Shaming companies into feeling guilty doesn’t. Even going on strike doesn’t work as well as minimum wage laws.
- Despite the fact half of the employees refusing to work is like 10 times more powerful than non-employees holding a sign saying “you’re evil.”
- Especially a tiny minority of society holding those signs
- Though then again, moral condemnation is a source of government regulation.
Disclaimer: not an expert, just a guy on the internet
Strong disagree, but strong upvote because it’s “big if true.” Thank you for proposing a big crazy idea that you believe will work. I’ve done that a number of times, and I’ve been downvoted into the ground without explanation, instead of given any encouraging “here’s why I don’t think this will work, but thank you.”
Hi Knight, thanks for the thoughtful reply.
I’m curious whether you read the longer piece about moral stigmatization that I linked to at EA Forum? It’s here, and it addresses several of your points.
I have a much more positive view about the effectiveness of moral stigmatization, which I think has been at the heart of almost every successful moral progress movement in history. The anti-slavery movement stigmatized slavery. The anti-vivisection movement stigmatized torturing animals for ‘experiments’. The women’s rights movement stigmatized misogyny. The gay rights movement stigmatized homophobia.
After the world wars, biological and chemical weapons were not just regulated, but morally stigmatized. The anti-landmine campaign stigmatized landmines.
Even in the case of nuclear weapons, the anti-nukes peace movement stigmatized the use and spread of nukes, and was important in nuclear non-proliferation, and IMHO played a role in the heroic individual decisions by Arkhipov and others not to use nukes when they could have.
Regulation and treaties aimed to reduce the development, spread, and use of Bad Thing X, without moral stigmatization of Bad Thing X, doesn’t usually work very well. Formal law and informal social norms must typically reinforce each other.
I see no prospect for effective, strongly enforced regulation of ASI development without moral stigmatization of ASI development. This is because, ultimately, ‘regulation’ relies on the coercive power of the state—which relies on agents of the state (e.g. police, military, SWAT teams, special ops teams) being willing to enforce regulations even against people with very strong incentives not to comply. And these agents of the state simply won’t be willing to use government force against ASI devs violating regulations unless these agents already believe that the regulations are righteous and morally compelling.
That’s a very good point, and these examples really change my intuition from “I can’t see this being a good idea” to “this might make sense, this might not, it’s complicated.” And my earlier disagreement mostly came from my intuition.
I still have disagreements, but just to clarify I now agree your idea deserves more attention that it’s getting.
My remaining disagreement is I think stigmatization only reaches the extreme level of “these people are literally evil and vile,” after the majority of people already agree.
In places in India where the majority of people are already vegetarians, and already feel that eating meat is wrong, the social punishment of meat eaters does seem to deter them.
But in places where most people don’t think eating meat is wrong, prematurely calling meat eaters evil may backfire. This is because you’ve created a “moral-duel” where you force outside observers to either think the meat-eater is the bad guy, or you’re the bad guy (or stupid guy). This “moral-duel” drains the moral standing of both sides.
If you’re near the endgame, and 90% of people already are vegetarians, then this moral-duel will first deplete the meat-eater’s moral standing, and may solidify vegetarianism.
But if you’re at the beginning, when only 1% of people support your movement, you desperately want to invest your support and credibility into further growing your support and credibility, rather than burning it in a moral-duel against the meat-eater majority the way militant vegans did.
Nurturing credibility is especially important for AI Notkilleveryoneism, where the main obstacle is a lack of credibility and “this sounds like science fiction.”
Finally, at least only go after the AI lab CEOs, as they have relatively less moral standing, compared to the rank and file researchers.
E.g. in this quicktake Mikhail Samin appealed to researchers as friends asking them to stop “deferring” to their CEO.
Even for nuclear weapons, biological weapons, chemical weapons, landmines, it was hard to punish scientists researching it. Even for the death penalty, it was hard to punish the firing squad soldiers. It’s easier to stick it to the leaders. In an influential book by early feminist Lady Constance Lytton, she repeatedly described the policemen (who fought the movement) and even prison guards as very good people and focused the blame on the leaders.
PS: I read your post, it was a fascinating read. I agree with the direction of it and I agree the factors you mention are significant, but it might not go quite as far as you describe?
Knight—thanks again for the constructive engagement.
I take your point that if a group is a tiny and obscure minority, and they’re calling the majority view ‘evil’, and trying to stigmatize their behavior, that can backfire.
However, the surveys and polls I’ve seen indicate that the majority of humans already have serious concerns about AI risks, and in some sense are already onboard with ‘AI Notkilleveryoneism’. Many people are under-informed or misinformed in various ways about AI, but convincing the majority of humanity that the AI industry is acting recklessly seems like it’s already pretty close to feasible—if not already accomplished.
I think the real problem here is raising public awareness about how many people are already on team ‘AI Notkilleveryoneism’ rather than team ‘AI accelerationist’. This is a ‘common knowledge’ problem from game theory—the majority needs to know that they’re in the majority, in order to do successful moral stigmatization of the minority (in this case, the AI developers).
Haha you’re right, in another comment I was saying
To be honest, I’m extremely confused. Somehow, AI Notkilleveryoneism… is both a tiny minority and a majority at the same time.
That makes sense, it seems to explain things. The median AI expert also puts the chance of extinction at 5% to 10%, which is huge.
I’m still not in favour of stigmatizing AI developers, especially right now. Whether AI Notkilleveryoneism is a real minority or an imagined minority, if it gets into a moral-duel with AI developers, it will lose status, and it will be harder for it to grow (by convincing people to agree with it, or by convincing people who privately agree to come out of the closet).
People tend to follow “the experts” instead of their very uncertain intuitions about whether something is dangerous. With global warming, the experts were climatologists. With cigarette toxicity, the experts were doctors. But with AI risk, you were saying that,
It sounds like the expertise people look to when deciding “whether AI risk is serious or sci-fi” comes from leading AI scientists, and even AI company CEOs. Very unfortunately, we may depend on our good relations with them… :(
Moral ostracisation of factory farmers is somewhat ineffective because the vast majority of people are implicated in factory farming. They fund it every day and view eating animals as a key part of their identity.
Calling factory farming murder/torture is calling nearly every member of the public a murderer/torturer. (Which may be true but is unlikely to get them to change their habits)
Calling the race to ASI murder is only calling AI researchers and funders murderers. The general public are not morally implicated and don’t view use of AI as a key part of their identity.
The polling shows that they’re not on board with the pace of AI development, think it poses a significant risk of human extinction, and that they don’t trust the CEOs of AI companies to act responsibly.
That’s a very good point, and I didn’t really analyze the comparison.
I guess maybe meat eating isn’t the best comparison.
The closest comparison might be researchers developing some other technology which maybe 2/3 of people see as a net negative, e.g. nuclear weapons, autonomous weapons, methods for extracting fossil fuels, tobacco, etc.
But no campaign even really tried to stigmatize these researchers. Every single campaign against these technologies has targeted the companies, CEOs, or politicians leading them, without really any attacks on the researchers. Attacking them is sort of untested.
This is self-indulgent, impotent fantasy. Everyone agrees that people hurting children is bad. People are split on whether AGI/ASI is an existential threat.[1] There is no “we” beyond “people who agree with you”. “They” are not going to have anything like the reaction you’re imagining. Your strategy of screaming and screaming and screaming and screaming and screaming and screaming and screaming and screaming is not an effective way of changing anyone’s mind.
Anyone responding “but it IS an existential threat!!” is missing the point.
Richard—I think you’re just factually wrong that ‘people are split on whether AGI/ASI is an existential threat’.
Thousands of people signed the 2023 CAIS statement on AI risk, including almost every leading AI scientist, AI company CEO, AI researcher, AI safety expert, etc.
There are a few exceptions, such as Yann LeCun. And there are a few AI CEOs, such as Sam Altman, who had previously acknowledged the existential risks, but now downplay them.
But if all the leading figures in the industry—including Altman, Amodei, Hassabis, etc—have publicly and repeatedly acknowledged the existential risks, why would you claim ‘people are split’?
You just mentioned LeCun and “a few AI CEOs, such as Sam Altman” as exceptions, so it isn’t by any means “all the leading figures”. I would also name Mark Zuckerberg, who has started “Superintelligence Labs” with the aim of “personal superintelligence for everyone”, with nary a mention of how if anyone builds it, everyone dies. Presumably all the talent he’s bought are on board with that.
I also see various figures (no names to hand) pooh-poohing the very idea of ASI at all, or of ASI as existential threat. They may be driven by the bias of dismissing the possibility of anything so disastrous as to make them have to Do Something and miss lunch, but right or wrong, that’s what they say.
And however many there are on each side, I stand by my judgement of the futility of screaming shame at the other side, and of the self-gratifying fantasy about how “they” will react.
I think this is a pretty unhelpful frame. Most people working at an AI lab are somewhere between “person of unremarkable moral character who tells themselves a vague story about how they’re doing good things” and “deeply principled person trying their best to improve the world as best they can”. I think working at an AI lab requires less failure of moral character than, say, working at a tobacco company, for all that the former can have much worse effects on the world.
There are a few people I think it is fair to describe as actively morally bad, and willfully violating deontology—it seems likely to me that this is true of Sam Altman, for instance—but I think “evil” is just not a very helpful word here, will not usefully model the actions of AI lab employees, and will come across as obviously disingenuous to anyone who hears such rhetoric if they actually interact with any of the people you’re denigrating. If you had to be evil to end the world, the world would be a lot safer!
I think it’s fine and good to concentrate moral opprobrium at specific actions people take that are unprincipled or clear violations of deontology—companies going back on commitments, people taking on roles or supporting positions that violate principles they’ve previously expressed, people making cowardly statements that don’t accurately reflect their beliefs for the sake of currying favor. I think it’s also fine and good to try and convince people that what they’re doing is harmful, and that they should quit their jobs or turn whistleblower or otherwise change course. But the mere choice of job title is usually not a deontology violation for these people, because they don’t think it has the harms to the world you think it does! (I think at this point it is probably somewhat of a deontological violation to work in most roles at OpenAI or Meta AI even under typical x-risk-skeptical worldviews, but only one that indicates ethical mediocrity rather than ethical bankruptcy.)
(For context, I work on capabilities at Anthropic, because I think that reduces existential risk on net; I think there’s around a 25% chance that this is a horrible mistake and immensely harmful for the world. I think it’s probably quite bad for the world to work on capabilities at other AI labs.)
I don’t think this step is locally valid? Or at least, in many situations, I don’t think ignorance of the consequences of your actions absolves you of responsibility for them.
As an example, if you work hard to help elect a politician who you believe was principled and good, and then when they get into office they’re a craven sellout who causes thousands of people to die, you bear some responsibility for it and for cleaning up your mess. As another example, if you work hard at a company and then it turns out the company is a scam and you’ve stolen money from all your customers, you bear some responsibility to clean up the mess and help the people whose lives your work ruined.
Relatedly, it is often the case that the right point to apply liability is when someone takes an action with a lot of downside, regardless of intent. Here are some legal examples a shoggoth gave me of holding people accountable even if they didn’t know the harm they were causing.
A company can be liable for harm caused by its products even if it followed all safety procedures.
Employers are held responsible for harms caused by employees acting within the scope of their job.
Sellers may be liable for false statements that cause harm, even if they made them in good faith.
These examples are a bit different. Anyhow, I think that if you work at a company that builds a doomsday machine, you bear some responsibility for that even if you didn’t know.
Yeah, sorry—I agree that was a bit sloppy of me. I think it is very reasonable to accuse people working at major AI labs of something like negligence / willful ignorance, and I agree that can be a pretty serious moral failing (indeed I think it’s plausibly the primary moral failing of many AI lab employees). My objection is more to the way the parent comment is connoting “evil” just from one’s employer leading to bad outcomes as if those outcomes are the known intent of such employees.
Drake—this seems like special pleading from an AI industry insider.
You wrote ‘I think working at an AI lab requires less failure of moral character than, say, working at a tobacco company, for all that the former can have much worse effects on the world.’
That doesn’t make sense to me. Tobacco kills about 8 million people a year globally. ASI could kill about 8 billion. The main reason that AI lab workers think that their moral character is better than that of tobacco industry workers is that the tobacco industry has already been morally stigmatized over the last several decades—whereas the AI industry has not yet been morally stigmatized in proportion to its likely harms.
Of course, ordinary workers in any harm-imposing industry can always make the argument that they’re good (or at least ethically mediocre) people, that they’re just following orders, trying to feed their families, weren’t aware of the harms, etc.
But that argument does not apply to smart people working in the AI industry—who have mostly already been exposed to the many arguments that AGI/ASI is a uniquely dangerous technology. And their own CEOs have already acknowledged these risks. And yet people continue to work in this industry.
Maybe a few workers at a few AI companies might be having a net positive impact in reducing AI X-risk. Maybe you’re one of the lucky few. Maybe.
The future is much much bigger than 8 billion people. Causing the extinction of humanity is much worse than killing 8 billion people. This really matters a lot for arriving at the right moral conclusions here.
Could you say a bit more about why you view the “extinction >>> 8B” as so important?
I’d have assumed that at your P(extinction), even treating extinction as just 8B deaths still vastly outweighs the possible lives saved from AI medical progress?
I don’t think it’s remotely as obvious then! If you don’t care about future people, then your key priority is to achieve immortality for the current generation, for which I do think building AGI is probably your best bet.
If it were to take 50+ years to build AGI, that would imply most people on earth have died of aging, and so you should have probably just rushed towards AGI if you think that would have been less than 50% likely to cause extinction.
People who hold this position are arguing for things like “we should only slow down AI development if for each year of slowing down we would be reducing risk of human extinction by more than 1%”, which is a policy that if acted on consistently would more likely than not cause humanity’s extinction within 100 years (as you would be accepting a minimum of a 1% chance of death each year in exchange for faster AI development).
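Spelling out the arithmetic behind that parenthetical (a rough sketch, assuming the 1% annual extinction risk is constant and independent across years):

$$P(\text{extinction within 100 years}) \;=\; 1 - 0.99^{100} \;\approx\; 0.63 \;>\; 0.5$$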
Here are ChatGPT’s actuarial tables about how long the current population is expected to survive:
By the logic of the future not being bigger than 8 billion people, you should lock in a policy that has a 50% chance of causing human extinction, if it allows current people alive to extend their lifespan by more than ~35 years. I am more doomy than that about AI, in that I assign much more than 50% probability that deploying superintelligence would kill everyone, but it’s definitely a claim that requires a lot more thinking through than the usual “the risk is at least 10% or so”.
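To sketch where a figure like ~35 years comes from (a back-of-the-envelope version; the assumption that the average remaining life expectancy of currently-alive people is roughly 35–40 years stands in for the actuarial table above): if you value only the life-years of people alive today, a coin-flip gamble on extinction is worth taking, in expected life-years, only when the lifespan extension $X$ at least matches the average remaining lifespan $R$ that the losing branch forfeits:

$$0.5 \cdot 0 + 0.5\,(R + X) \;\ge\; R \quad\Longleftrightarrow\quad X \;\ge\; R \;\approx\; 35\text{–}40\ \text{years}$$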
Thanks for explaining that, really appreciate it! One thing I notice I’d been assuming: that “8B-only” people would have a policy like “care about the 8B people who are living today, but also the people in, say, 20 years who’ve been born in the intervening time period.” But that’s basically just a policy of caring about future people! Because there’s not really a difference between “future people at the point that they’ve actually been born” and “future people generally”.
I have different intuitions about “causing someone not to be born” versus “waiting for someone to be born, and then killing them”. So I do think that if someone sets in motion today events that reliably end in the human race dying out in 2035, the moral cost of this might be any of
(1) “the people alive in both 2025 and 2035”
(2) “everyone alive in 2035”
(3) “everyone alive in 2035, plus (perhaps with some discounting) all the kids they would have had, and the kids they would have had...”
according to different sets of intuitions. And actually I guess (1) would be rarest, so even though both (2) and (3) involve “caring about future people” in some sense, I do think they’re important to distinguish. (Caring about “future-present” versus “future-future” people?)
If your goal is to maximize the expected fraction of currently alive humans who live for over 1000 years, you shouldn’t in fact ongoingly make gambles that make it more likely than not that everyone dies, unless it turns out that it’s really hard to achieve this without immense risk. Perhaps that is your view: the only (realistic) way to get risk below ~50% is to delay for over 30 years. But this is by no means a consensus perspective among those who are very worried about AI risk.
Separately, I don’t expect that we have many tradeoffs between elimination of human control over the future and the probability of currently alive people living for much longer other than AI, so after we eat that, there aren’t further tradeoffs to make. I think you agree with this, but your wording makes it seem as though you think there are ongoing hard tradeoffs that can’t be avoided.
I think that “we should only slow down AI development if for each year of slowing down we would be reducing risk of human extinction by more than 1%” is not a sufficient crux for the (expensive) actions which I most want at current margins, at least if you have my empirical views. I think it is very unlikely (~7%?) that in practice we reach near the level of response (in terms of spending/delaying for misalignment risk reduction) that would be rational given this “1% / year” view and my empirical views, so my empirical views suffice to imply very different actions.
For instance, delaying for ~10 years prior to building wildly superhuman AI (while using controlled AIs at or somewhat below the level of top human experts) seems like it probably makes sense on my empirical views even under this moral perspective, especially if you can use the controlled AIs to substantially reduce or delay ongoing deaths, which seems plausible. Things like massively investing in safety/alignment work also easily make sense. There are policies that substantially reduce the risk which merely require massive effort (and which don’t particularly delay powerful AI) that we could be applying.
I do think that this policy wouldn’t be on board with the sort of long pause that (e.g.) MIRI often discusses, and it does materially alter what look like the best policies (though ultimately I don’t expect us to get close to these best policies anyway).
habryka—‘If you don’t care about future people’—but why would any sane person not care at all about future people?
You offer a bunch of speculative math about longevity vs extinction risk.
OK, why not run some actual analysis on which is more likely to deliver progress on longevity: direct biomedical research on longevity, or indirect AI research on AGI in the hope that it somehow, speculatively, solves longevity?
The AI industry is currently spending something on the order of $200 billion a year on research. Biomedical research on longevity, by contrast, currently receives far less than $10 billion a year.
If we spent the $200 billion a year on longevity, instead of on AI, do you seriously think that we’d do worse on solving longevity? That’s what I would advocate. And it would involve virtually no extinction risk.
You are reading things into my comments that I didn’t say. I of course don’t agree with, or consider reasonable, the position of “not caring about future people”; that’s the whole context of this subthread.
My guess is that if one did adopt the position that no future people matter (which, again, I do not think is a reasonable position), then the case for slowing down AI looks a lot worse. Not so much worse that slowing down becomes obviously bad, and my guess is that even under that worldview it would be dumb to rush toward developing AGI the way we currently are, but it makes the case a lot weaker. There is much less to lose if you do not care about the future.
My guess is for the purpose of just solving longevity, AGI investment would indeed strongly outperform general biomedical investment. Humanity just isn’t very good at turning money into medical progress on demand like this.
It seems virtuous and good to be clear about which assumptions are load-bearing for my recommended actions. If I didn’t care about the future, I would definitely be advocating for a different mix of policies: it would likely still involve marginal AI slowdown, but my guess is I would push for it less forcefully, and a bunch of slowdown-related actions would become net bad.
I agree with you that a typical instance of working at an AI lab has worse consequences in expectation than working at a tobacco company, and I think that for a person who shares all your epistemic beliefs to work in a typical role at an AI lab would indeed be a worse failure of moral character than to work at a tobacco company.
I also agree that in many cases people at AI labs have been exposed at least once to arguments which, if they had better epistemics and dedicated more time to thinking about the consequences of their work, could have convinced them that it was bad for the world for them to do such work. And I do think the failure to engage with such arguments and seriously consider them, in situations like these, is a stain on someone’s character! But I think it’s the sort of ethical failure which a majority of humans will make by default, rather than something indicative of remarkably bad morality.
I just don’t think this sort of utilitarian calculus makes sense to apply when considering the actions of people who don’t share the object-level beliefs at hand! I think people who worked to promulgate communism in the late 19th century were not unusually evil, for instance.
This also seems locally invalid. Most people in fact don’t make this ethical failure because they don’t work at AI labs, nor do they dedicate their lives to work which has nearly as much power or influence on others as this.
It does seem consistent (and in agreement with commonsense morality) to say that if you are smart enough to locate the levers of power in the world, and you pursue them, then you have a moral responsibility to make sure you use them right if you do get your hands on them; otherwise we will call you evil and grossly irresponsible.
Oh cool, if we’re deciding it’s now virtuous to ostracize people we don’t like and declare them evil, I have a list of enemies I’d like to go after too. This is a great weapon, and fun to use! (Why did we ever stop using it?) Who else can we persecute? There are several much weaker and more-hated groups we could start with to warm up.
Criminal negligence leading to catastrophic consequences is already ostracized and persecuted, because, well, it’s a crime.
Sure. But if an AI company grows an ASI that extinguishes humanity, who is left to sue them? Who is left to prosecute them?
The threat of legal action for criminal negligence is not an effective deterrent if there is no criminal justice system left, because there is no human species left.
Some do, others don’t. A lot of people are smart enough to spot cherry-picking.
I think the very fact that some do is already evidence that this is not a completely unfounded concern, which is what this is driving at.
The number of respected professional doctors who worry about the impact of witches casting curses on public health is approximately zero. The number of climate scientists who worry about the impact of CO2 emissions on the future of life and civilization on Earth is approximately 100%. The number of world-class AI scientists and researchers who worry about an AI apocalypse is not as high, but it’s also far from negligible. If there’s a mistake in their thinking, it’s the kind of mistake that even a high-level expert can make, and it’s not particularly more likely than the other side being the one who’s making a mistake.
Yes—but when some people say “I think there is danger here” and others say “I think there is no danger here,” most people (reasonably!) resolve that to “huh, there could be some danger here”… and the possibility of danger is a danger.
Curated. I have been at times more cautious in communicating my object level views than I now wish I had been. I appreciate this post as a flag for courage: something others might see, and which might counter some of the (according to me) prevailing messages of caution. Those messages, at least in my case, significantly contributed to my caution, and I wish there had been something like this post around for me to read before I had to decide how cautious to be.
The argument this post presents for the conclusion that many people should be braver in communicating about AI x-risk by their own lights, is only moderately convincing. It relies heavily on how the book’s blurbs were sampled, and it seems likely that a lot of optimization went into getting strong snippets rather than a representative sample. I find it hard to update much on this without knowing the details of how the blurbs were collected, even though you address this specific concern. Still, it’s not totally unconvincing.
I’d like to see more empirical research into what kinds of rhetoric work to achieve which aims when communicating with different audiences about AI x-risk. This seems like the sort of thing humanity has already specced into studying, and I’d love to see more writeups applying that existing competence to these questions.
I also would have liked to see more of Nate’s personal story: how he came to hold his current views. My impression is that he didn’t always so confidently believe people should more hold the courage of their convictions when talking about AI x-risk. A record of how his mind changed over time, and what observations/thoughts/events caused that change, could be informative for others in an importantly different way from how this post or empirical work on the question might be. I’d love to see a post from Nate on that in the future.
“We cold-emailed a bunch of famous people...”
“Matt Stone”, co-creator of South Park? Have you tried him?
He’s demonstrated interest in AI and software. He’s brought up the topic in the show.
South Park has a large reach. And the creators have demonstrated a willingness to change their views as they acquire new information. (long ago, South Park satirized Climate Change and Al Gore… but then years later they made a whole “apology episode” that presented Climate Change very seriously… and also apologized to Al Gore)
Seriously, give him a try!
Correct & timely. There do exist margins where honesty and effectiveness trade off against each other, but today—in 2025, that is—this is no longer one of them. Your SB 1047 friend is quite right to suggest that things were different in 2023, though. The amount of resistance we got behind the scenes in trying to get words like “extinction” and even “AGI” (!!) published in our report (which was mostly written in 2023) was something to behold: back then you could only push the envelope so far before it became counterproductive. No longer. The best metaphor I’ve seen for what’s happening right now in government circles is a kettle of boiling water: pockets of people who get it are coalescing and finding each other; entire offices are now AGI-pilled, where in 2023 you’d be lucky to find a single individual among a phalanx of skeptics.
“The time is ripe” indeed.
Credit where credit is due, incidentally: the biggest single inflection point for this phenomenon was clearly Situational Awareness. Almost zero reporting in the mainstream news; yet by the end of 2024, everyone in the relevant spaces had read & absorbed it.
Thank you for writing this statement on communication strategy and also for writing this book. Even without knowing the specific content in detail, I consider such a book to be very important.
Some time ago, it seemed to me that relevant parts of the AI risk community were ignoring the need for many other people to understand their concerns, instead thinking they could “simply” create a superintelligent and quasi-omnipotent AI that would then save the world before someone else invents paperclip maximizers. This seemed to presuppose a specific worldview that I didn’t think was very likely (one in which political action is unnecessary, while technical AI safety still has a reasonably good chance of success). (I asked the forum whether there were good intro texts for specific target groups to convince them of the relevance of AI risk but the only answer I received made me search somewhere else.) However, there is good outreach and a lot of policy work being done, and the discussion of communication strategies and policy strategies seems extremely necessary.
One of your arguments is that you have a “whole spiel about how it’s possible to speak on these issues with a voice of authority”, referring to the Nobel laureates etc who warn against AI risk, and “if someone is dismissive, you can be like “What do you think you know that the Nobel laureates and the lab heads and the most cited researchers don’t? Where do you get your confidence?”” With respect to the Californian law proposal, you write: “If people really believed that everyone was gonna die from this stuff, why would they be putting forth a bill that asks for annual reporting requirements? Why, that’d practically be fishy. People can often tell when you’re being fishy.” Sometimes, however, it seems suspicious when people appeal to authorities when asked to explain their ideas. Referring to Nobel laureates can be an introduction to your argument or you can refer to them later on, but to be convincing, you need to be able to actually explain the issue. Of course, you can use the authority argument in a supportive way, but that will not be enough, also because policymakers and everybody interested in policy debates receive contradictory claims about AI all the time.
Acting and talking “as if it’s an obvious serious threat” may help signal your seriousness. But an even stronger way of signaling that you are convinced of an issue is gluing yourself to a street or starting a hunger strike, and though it’s hard to say what a counterfactual world would look like, those actions do not seem to have meaningfully increased support for climate policy. It is hard to say how minds change (people write whole books about it), but the effectiveness of a signal seems to depend strongly on context and on how well people have been prepared for what you are going to say. (I assume that is why EY wrote the Sequences.)
If your discussion partner is already convinced of the relevance of the topic, then of course you should not act like “Sorry if I am even bothering you about such nonsense”. By contrast, if the audience perceives your asks as too radical relative to their prior, they may be deterred; you may then not even have their attention to explain. In particular, it seems to me that talking about certain radical ways to stop AI risks, which has happened in the past, might be actively dangerous and off-putting. Of course, demanding extremely radical action shows the audience that you are very convinced of what you say. Yet they may also think you are a crank, because serious people would not do that. You at least need to have enough time to make your point about the Nobel laureates.
Yes, the Overton window may have shifted and be shifting, but sometimes the Overton window shifts back again. In 2019, Greta Thunberg’s “How dare you” speech was possible (and had impact); nowadays it is not. This is possibly why “Most elected officials declined to comment” on your book and only “gave private praise”.
“If people really believed that everyone was gonna die from this stuff, why would they be putting forth a bill that asks for annual reporting requirements?” This is another parallel to climate activists. If you are really serious about climate change, wouldn’t you demand stopping most carbon emissions instead of agreeing that subsidies are paid to solar power? Maybe. But you can demand one thing and also support the other one. Some people or political groups will not agree that issue X is important and they will not agree to radical action against X, but still be okay with some weak action against X; that seems like normal politics.
I don’t think people in general react well to societal existential risks, regardless of how well or how courageously the message is framed. These are abstract concerns. The fact that we are talking about AI (an abstract thing in itself) makes it even worse.
I’m also a very big opponent of arguing from authority (I really don’t care how many Nobel laureates hold an opinion; it is the content of their argument I care about, not how many authorities are saying it). That is simply because I cannot determine the motives of these authorities, and hence cannot evaluate their opinions, whereas I can evaluate logic and facts.
Usually it is better to help people understand the risks in terms of stories, in particular stories they can relate to, which is why people still think of Terminator when thinking of AI extinction risks.
There is a real (and large) extinction risk, sure. Then again, the ape picking up the club in 2001: A Space Odyssey could just as well be accused of going down a path that very likely would result in extinction. But when extinction risk is acceptable is a more interesting question, and one most people are much more ready to answer.
Your take is consistent with political messaging advice that people like water and they like drinks, but they don’t like watered-down drinks. Swing voters react to what’s said most often, not to the average of things that get said around them.
I didn’t understand the point you were trying to make here.
Scott Alexander and Daniel Kokotajlo’s article rationally defending “why it’s OK to talk about misaligned AI”
aka
“painting dark scenarios may increase the chance of them coming true, but the benefits outweigh this possibility”
the original blog post:
https://blog.ai-futures.org/p/against-misalignment-as-self-fulfilling
the video I made about that article:
I would like to volunteer to translate the book (for free) into Croatian for local publishing, because I find myself hard-pressed for alternative ways to contribute to X-risk mitigation. I have a background in machine learning, am acquainted with an owner of a publishing house and closely connected with a community of experienced translators/editors (despite not being one myself).
Where could I contact you, should you be interested in cooperation and if you do not already have a translation for Croatian queued?
Thanks!
P.S. the only thing that will stop me from translating it in my free time is if you already have a hired translator and they are already working on it.
Why Mark Ruffalo? Will there be an audiobook? Edit: Yes; it can be preordered now.
I appreciate the clarity of this post—and the courage it calls for. But I think it’s worth noting: posting under a long-used pseudonym like So8res while calling others cowardly for not speaking plainly creates a strange asymmetry. Not because pseudonyms are wrong (they often allow for more honest speech), but because the charge of cowardice implies a kind of public-facing risk that the author isn’t visibly taking either.
More personally—I’ve been slow to speak publicly not out of fear, but because my experience hasn’t matched the same contours of existential dread. Not because I think the risk isn’t real, but because I’m working from a different premise: that the relationship we build with these systems matters. That authorship, protocol, and presence have weight. That how we show up—and what we’re aligned with—shapes the thing that emerges.
Some of us are tracking convergence across accountability structures, interaction design, and long-form presence—not just threat surface. It’s possible that what looks like hesitation is actually a different kind of alignment.
I don’t think this post is trying to hide Nate’s identity; he’s just using his longstanding LessWrong account. Evidence: his name’s on the book cover!
“So8res” is just a stylised “Soares” which is his surname. It’s not really a pseudonym in any sense.
Full tweet for anyone curious:
That doesn’t spark any memories (and people who know me rarely describe my conversational style as “soft and ever-so-polite”). My best guess is nevertheless that this tweet is based on a real event (albeit filtered through some misunderstandings, e.g. perhaps my tendency to talk with a tone of confidence was misinterpreted as a status game; or perhaps I made some hamfisted attempt to signal “I don’t actually like talking about work on dates” and accidentally signaled “I think you’re dumb if you don’t already believe these conclusions I’m reciting in response to your questions”).
To be quite clear: I endorse everyone thinking through the AI danger arguments on their own, no matter what anyone else says to them and no matter what tone they say it in.
All that said, I don’t quite see how any of this relates to the topic at hand, and so I’ll go ahead and delete this comment thread in the morning unless I’m compelled by argument not to.
It relates to the topic because it’s one piece of anecdotal evidence about the empirical results of your messaging strategy (much as the post mentions a number of other pieces of anecdotal evidence): negative polarization is a possible outcome, not just support or lack-of-support.
Um, yes, confidence and status are related. You’re familiar with emotive conjugation, right? “I talk with a tone of confidence; he sounds dogmatic; you play status games.”
I think you should leave the comments.
“Here is an example of Nate’s passion for AI Safety not working” seems like a reasonably relevant comment, albeit entirely anecdotal and low effort.
Your comment is almost guaranteed to “ratio” theirs. It seems unlikely that the thread will be massively derailed if you don’t delete.
Plus deleting the comment looks bad and will add to the story. Your comment feels like it is already close to the optimal response.
Just commenting narrowly on how it relates to the topic at hand: I read it as anecdotal evidence about how things might go if you speak with someone and you “share your concerns as if they’re obvious and sensible”, which is that people might perceive you as thinking they’re dumb for not understanding something so obvious, which can backfire if it’s in fact not obvious to them.
The women I’ve spoken to about you have ~uniformly reported you being substantially more polite to them than the men I’ve spoken to have (and several of these women pointed this discrepancy out on their own). One trans man even said that he felt you were quite rude to him, which he took as validation of his transition being complete.
So any men reading this and discrediting the tweet on the basis of “Nate isn’t ‘ever-so-polite’” should think twice.