Promoting enmity and bad vibes around AI safety

I’ve observed some people promoting enmity in the course of their efforts to raise awareness about AI risk. To be frank, I think those activities are increasing AI risk, including but not limited to extinction risk. However, that’s a stronger claim than I intend to argue here. Rather, I’ll just be presenting a simple and harmful causal pathway, along with some strategies for mitigating it:

PromotingEnmity → Conflict → Catastrophe (PE→C→C)

(Enmity is not the same as conflict, which can sometimes be constructive. Parties in conflict can be quite focused on finding a mutually beneficial solution, even if that solution is difficult to find. By contrast, enemies do not generally pursue positive trade relations with each other. So, enmity is particularly relevant to watch out for when pursuing a positive future.)

Promoting enmity

Suppose groups X and Y are in a tense and dangerous relationship for some reason. If I say “Obviously X Leader and Y Leader hate and want to destroy each other”, I’m promoting the hypothesis that they’re enemies, and if they believe me, I might also be making it a bit more likely that they’ll become or remain enemies.

In short, promoting enmity means bringing the hypothesis of enmity to attention in ways that make actual enmity more likely. It’s a kind of hyperstition.

The enmity doesn’t have to be toward or from the speaker, so it’s not necessarily like “hate speech” in that way. But as with hate speech, promoting enmity between groups is particularly consequential, and often avoidable. Even when you confidently believe enmity is present, and even if you fastidiously avoid lying about it, you can still make choices that avoid promoting it, by deciding how much, how often, and where to bring it up.

Examples

Here are some examples of promoting enmity, labeled by increasing intensity:

  1. not promoting enmity: Alice is asked privately by a member of Group Z whether the leaders of Groups X and Y hate each other. She responds, “I’m not sure” or “I’d rather that be a question for them than for me.”

  2. minimally promoting enmity: Alice once tells a colleague unconnected to X and Y, “I think X Leader and Y Leader basically hate each other.”

  3. weakly promoting enmity: Alice tells a few colleagues connected to X and Y, “I think X Leader and Y Leader basically hate each other.”

  4. moderately promoting enmity: In a group meeting involving X and Y, Alice says “Well, obviously X and Y want to destroy each other if they can.”

  5. strongly promoting enmity: In a high-profile social media post, Alice says “X Leader, make no mistake, Y Leader hates you and wants to destroy you.”

Even if people are already convinced the enmity between X and Y is present, such that Alice’s promotion of it doesn’t have much marginal impact, I’m still calling the fifth level strong, because it’s strong relative to Alice’s other options for how emphatically to assert the enmity.

Is anyone actually promoting enmity like this around AI?

I think a bunch of people are doing this to some degree. Hundreds maybe? Activism in particular seems prone to promoting enmity, because dramatic stories about conflict between enemies attract attention, and are thus quite sticky as means of “raising awareness”. Many of my observations here are from private or semi-private conversations with AI safety activists, which it would not be polite to call out publicly.

That said, to clarify that I’m not simply imagining this pattern, here is a public tweet in which Eliezer Yudkowsky tells Secretary of War Pete Hegseth that AI company leaders would

“discard you like used toilet paper [if they could]”

https://x.com/allTheYud/status/2027560852048458120

Personally, I don’t think the company leaders would do that. But irrespective of that, it’s interesting that Eliezer met with very little criticism for promoting enmity here. I’m not sure why. There were posts disagreeing with him, but none of the top-ranked replies suggested this was plausibly a bad thing for Eliezer to say even if he believes it; no one wrote, “Whoah there, are you sure it’s helpful to promote enmity between military leaders and AI developers like this?”.

In such situations, I think it’s important to consider the sorts of equilibria that speech encourages. This shouldn’t be the only consideration when choosing what to say, but it is a consideration.

How can promoting enmity increase AI risk?

Simply put, when groups of humans and/or machines have a high prior that they need to destroy each other in order to achieve their goals, they are more likely to do that than if they have a high prior on being able to find mutually beneficial arrangements. And, there are things you can do to increase or decrease that prior.
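To make the mechanism concrete, here is a toy Bayesian update, with numbers that are entirely made up for illustration. Suppose Group X starts out assigning probability 0.2 to the hypothesis H that Group Y is hostile, and treats a prominent public statement S (“Y hates you”) as evidence that is three times as likely if Y is hostile than if not (P(S | H) = 0.6 vs. P(S | ¬H) = 0.2). Then:

$$P(H \mid S) = \frac{P(S \mid H)\,P(H)}{P(S \mid H)\,P(H) + P(S \mid \neg H)\,P(\neg H)} = \frac{0.6 \times 0.2}{0.6 \times 0.2 + 0.2 \times 0.8} = \frac{0.12}{0.28} \approx 0.43$$

A single statement carrying a 3:1 likelihood ratio more than doubles X’s probability of enmity, and repeated statements compound. The numbers are arbitrary, but the direction of the effect is the point: emphatic public assertions of enmity are exactly the kind of evidence that moves this prior upward.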

Can you moderate the promotion of enmity without escalating social violence?

Yep, I’m pretty sure that’s doable. Here are some example responses:

  • “I don’t think it’s helpful for the world when you promote enmity around AI like this; it pushes for bad equilibria between people and groups.”

  • “What you’re saying here seems more like it’s promoting enmity than trying to resolve or address conflict.”

  • “I think there are better ways to address and resolve conflicts between people than what you’ve said here, which seems more escalatory than helpful.”

Moderation vs tone-policing

One way moderation can backfire is if you personally escalate negative hyperstition or threats beyond what is already present in the conversation. “Tone policing” is a useful label for this.

On the other hand, if you try to gently moderate negative hyperstition, like promoting enmity, you might still be accused of tone-policing. In that case, you can at least offer the following pushback:

“Tone policing” seems exaggerated here. Actual policing involves a threat of escalatory violence like gunfire to inhibit behaviors that are often less violent than gunfire. I’m not threatening or even predicting escalatory social punishment for the promotion of enmity here. I’d agree I’m “tone moderating” or “tone dampening”, but not tone policing.

Closing thoughts

In simple terms, promoting enmity creates a bad vibe around AI, where groups of humans and/or AIs are more likely to hate each other and employ their capabilities to destroy each other and/or the world. And, there may be things we can do to moderate or de-escalate such bad vibes.

What’s a “bad vibe”? Here I just mean a heightened Bayesian posterior that other parties are acting in bad faith, i.e., are not open to peaceful coexistence or mutually beneficial relations. In a purely utility-theoretic example, if Alice becomes convinced that Bob’s utility function is the negative of hers, she will not have much hope of finding Pareto-positive outcomes with Bob.
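To spell out that last example (my own formalization, not essential to the argument): if Alice believes $u_B = -u_A$, then for any two outcomes $x$ and $y$,

$$u_A(y) > u_A(x) \iff u_B(y) < u_B(x),$$

so no outcome can be better than another for both parties. Under that belief, the search for Pareto improvements is guaranteed to come up empty, and purely adversarial moves are the only ones worth considering. This is why the posterior matters: it determines whether searching for mutually beneficial arrangements even looks worthwhile.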

It’s hard to judge how much any given statement will promote enmity around AI, and whether the causal pathway «PromotingEnmity → Conflict → Catastrophe» is outweighed by other beneficial causal pathways from open discourse. But sometimes you can get the good without the bad. So, my goal in writing this post has been to draw a bit more attention to the potentially harmful effects of promoting enmity, and to some ways to avoid or mitigate those effects.