Promoting enmity and bad vibes around AI safety

I’ve observed some people promoting enmity in the course of their efforts to raise awareness about AI risk. To be frank, I think those activities are increasing AI risk, including but not limited to extinction risk. However, that’s a stronger claim than I intend to argue here. Rather, I’ll just be presenting a simple and harmful causal pathway, along with some strategies for mitigating it:

PromotingEnmity → Conflict → Catastrophe (PE→C→C)

(Enmity is not the same as conflict, which can sometimes be constructive. Parties in conflict can be quite focused on finding a mutually beneficial solution, even if that solution is difficult to find. By contrast, enemies do not generally pursue positive trade relations with each other. So, enmity is particularly relevant to watch out for when pursuing a positive future.)

Promoting enmity

Suppose groups X and Y are in a tense and dangerous relationship for some reason. If I say “Obviously X Leader and Y Leader hate and want to destroy each other”, I’m promoting the hypothesis that they’re enemies, and if they believe me, I might also be making it a bit more likely that they’ll become or remain enemies.

In short, promoting enmity means bringing the hypothesis of enmity to attention in ways that make actual enmity more likely. It’s a kind of hyperstition.

The enmity doesn’t have to be toward or from the speaker, so it’s not necessarily like “hate speech” in that way. But as with hate speech, promoting enmity between groups is particularly consequential, and often avoidable. Even when you confidently believe enmity is present, and even if you fastidiously avoid lying about it, you can still make choices that avoid promoting it, by deciding how much, how often, and where to bring it up.

Examples

Here are some examples of promoting enmity, labeled by increasing intensity:

  1. not promoting enmity: Alice is asked privately by a member of Group Z whether the leaders of Groups X and Y hate each other. She responds, “I’m not sure” or “I’d rather that be a question for them than for me.”

  2. minimally promoting enmity: Alice once tells a colleague unconnected to X and Y, “I think X Leader and Y Leader basically hate each other.”

  3. weakly promoting enmity: Alice tells a few colleagues connected to X and Y, “I think X Leader and Y Leader basically hate each other.”

  4. moderately promoting enmity: In a group meeting involving X and Y, Alice says “Well, obviously X and Y want to destroy each other if they can.”

  5. strongly promoting enmity: In a high-profile social media post, Alice says “X Leader, make no mistake, Y Leader hates you and wants to destroy you.”

Even if people are already convinced the enmity between X and Y is present, such that Alice’s promotion of it doesn’t have much marginal impact, I’m still calling the fifth level strong, because it’s strong relative to Alice’s other options for how emphatically to assert the enmity.

Is anyone actually promoting enmity like this around AI?

I think a bunch of people are doing this to some degree. Hundreds maybe? Activism in particular seems prone to promoting enmity, because dramatic stories about conflict between enemies attract attention, and are thus quite sticky as means of “raising awareness”. Many of my observations here are from private or semi-private conversations with AI safety activists, which it would not be polite to call out publicly.

That said, to clarify that I’m not simply imagining this pattern, here is a public tweet in which Eliezer Yudkowsky tells Secretary of War Pete Hegseth that AI company leaders would

“discard you like used toilet paper [if they could]”

https://x.com/allTheYud/status/2027560852048458120

Personally, I don’t think the company leaders would do that. But irrespective of that, it’s interesting that Eliezer met with very little criticism for promoting enmity here. I’m not sure why. There were posts disagreeing with him, but none of the top-ranked replies suggested this was plausibly a bad thing for Eliezer to say even if he believes it; no one wrote, “Whoah there, are you sure it’s helpful to promote enmity between military leaders and AI developers like this?”.

In such situations, I think it’s important to consider the sorts of equilibria that speech encourages. This shouldn’t be the only consideration when choosing what to say, but it is a consideration.

How can promoting enmity increase AI risk?

Simply put, when groups of humans and/or machines have a high prior that they need to destroy each other in order to achieve their goals, they are more likely to do that than if they have a high prior on being able to find mutually beneficial arrangements. And, there are things you can do to increase or decrease that prior.
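To make the mechanism concrete, here is a toy Bayesian update, with numbers that are entirely made up for illustration. Suppose Group X starts out assigning probability 0.2 to the hypothesis H that Group Y is hostile, and treats a prominent public statement S (“Y hates you”) as evidence that is three times as likely if Y is hostile than if not (P(S | H) = 0.6 vs. P(S | ¬H) = 0.2). Then:

$$P(H \mid S) = \frac{P(S \mid H)\,P(H)}{P(S \mid H)\,P(H) + P(S \mid \neg H)\,P(\neg H)} = \frac{0.6 \times 0.2}{0.6 \times 0.2 + 0.2 \times 0.8} = \frac{0.12}{0.28} \approx 0.43$$

A single statement carrying a 3:1 likelihood ratio more than doubles X’s probability of enmity, and repeated statements compound. The numbers are arbitrary, but the direction of the effect is the point: emphatic public assertions of enmity are exactly the kind of evidence that moves this prior upward.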

Can you moderate the promotion of enmity without escalating social violence?

Yep, I’m pretty sure that’s doable. Here are some example responses:

  • “I don’t think it’s helpful for the world when you promote enmity around AI like this; it pushes for bad equilibria between people and groups.”

  • “What you’re saying here seems more like it’s promoting enmity than trying to resolve or address conflict.”

  • “I think there are better ways to address and resolve conflicts between people than what you’ve said here, which seems more escalatory than helpful.”

Moderation vs tone-policing

One way moderation can backfire is if you personally escalate negative hyperstition or threats beyond what is already present in the conversation. “Tone policing” is a useful label for this.

On the other hand, if you try to gently moderate negative hyperstition, like promoting enmity, you might still be accused of tone-policing. In that case, you can at least offer the following pushback:

“Tone policing” seems exaggerated here. Actual policing involves a threat of escalatory violence like gunfire to inhibit behaviors that are often less violent than gunfire. I’m not threatening or even predicting escalatory social punishment for the promotion of enmity here. I’d agree I’m “tone moderating” or “tone dampening”, but not tone policing.

Closing thoughts

In simple terms, promoting enmity creates a bad vibe around AI, where groups of humans and/or AIs are more likely to hate each other and employ their capabilities to destroy each other and/or the world. And, there may be things we can do to moderate or de-escalate such bad vibes.

What’s a “bad vibe”? Here I just mean a heightened Bayesian posterior that other parties are acting in bad faith, i.e., are not open to peaceful coexistence or mutually beneficial relations. In a purely utility-theoretic example, if Alice becomes convinced that Bob’s utility function is the negative of hers, she will not have much hope of finding Pareto-positive outcomes with Bob.
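To spell out that last example (my own formalization, not essential to the argument): if Alice believes $u_B = -u_A$, then for any two outcomes $x$ and $y$,

$$u_A(y) > u_A(x) \iff u_B(y) < u_B(x),$$

so no outcome can be better than another for both parties. Under that belief, the search for Pareto improvements is guaranteed to come up empty, and purely adversarial moves are the only ones worth considering. This is why the posterior matters: it determines whether searching for mutually beneficial arrangements even looks worthwhile.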

It’s hard to judge how much any given statement will promote enmity around AI, and whether the causal pathway «PromotingEnmity → Conflict → Catastrophe» is outweighed by other beneficial causal pathways from open discourse. But sometimes you can get the good without the bad. So, my goal in writing this post has been to draw a bit more attention to the potentially harmful effects of promoting enmity, and to some ways to avoid or mitigate those effects.