The Social Alignment Problem

TLDR: I think public outreach is a very hopeful path to victory. More importantly, extended large-scale conversation around questions such as whether public outreach is a hopeful path to victory would be very likely to decrease p(doom).

You’re a genius mechanical engineer in prison. You stumble across a huge bomb rigged to blow in a random supply closet. You shout for two guards passing by, but they laugh you off. You decide to try to defuse it yourself.

This is arguably a reasonable response, given that this is your exact skill set. This is what you were trained for. But after a few hours of fiddling around with the bomb, you start to realize that it’s much more complicated than you thought. You have no idea when it’s going to go off, but you start despairing that you can defuse it on your own. You sink to the floor with your face in your hands. You can’t figure it out. Nobody will listen to you.

Real Talking To The Public has never been tried

Much like the general public has done with the subject of longevity, I think many people in our circle have adopted an assumption of hopelessness toward public outreach (social alignment) before a relevant amount of effort has been expended. In truth, there are many reasons to expect this strategy to be both realistic and very positively impactful. A world in which the cause of AI safety is as trendy as the cause of climate change, and in which society is as knowledgeable about questions of alignment as it is about vaccine efficacy (meaning not even that knowledgeable), is one where sane legislation designed to slow capabilities and invest in alignment becomes probable, and where capabilities research is stigmatized and labs find access to talent and resources harder to come by.

I’ve finally started to see individual actors taking steps towards this goal, but I’ve seen a shockingly small amount of coordinated discussion about it. When the topic is raised, there are four common objections: They Won’t Listen, Don’t Cry Wolf, Don’t Annoy the Labs, and Don’t Create More Disaster Monkeys.

They won’t listen/​They won’t understand

I cannot overstate how utterly false this is at this point.

It’s understandable that this has been our default belief. I think debating e/​accs on Twitter has broken our brains. The experience of explaining again and again why something smarter than you that doesn’t care about you is dangerous, and being met with these arguments, is a soul-crushing experience. It made sense to expect that if it’s this hard to explain to a fellow computer enthusiast, then there’s no hope of reaching the average person. For a long time I avoided talking about it with my non-tech friends (let’s call them “civilians”) for that reason. However, when I finally did, it felt like the breath of life. My hopelessness broke, because they instantly vigorously agreed, even finishing some of my arguments for me. Every single AI safety enthusiast I’ve spoken with who has engaged with civilians has had the exact same experience. I think it would be very healthy for anyone who is still pessimistic about convincing people to just try talking to one non-tech person in their life about this. It’s an instant shot of hope.

The truth is, if we were to decide that getting the public on our side is our goal, I think we would have one of the easiest jobs any activists (excuse me, social alignment researchers) have ever had.

Far from being closed to the idea, civilians in general literally already get it. It turns out, Terminator and The Matrix have been in their minds this whole time. We assumed they'd been inoculated against serious AI risk concern—turns out, they walked out of the theaters thinking "wow, that'll probably happen someday". They've been thinking that the entire time we've been agonizing about nobody understanding us. And now, ChatGPT has taken that "someday" and made it feel real.

At this point AI optimists are like the Black Knight from Monty Python. You can slice apart as many of their arguments as you want but they can’t be killed – however, you can just go around them. We’re spending all our time and effort debating them and getting nowhere, when we could just go around them to the hosts of civilians perfectly willing to listen.

The belief is already there. They just haven't internalized it, like a casual Christian who sins even though their official internal belief is that they're risking being tortured literally forever. They just need the alief.

A month ago, there had only been a handful of attempts at social alignment from us. Rob Miles has been producing accessible, high-quality content for half a decade. A petition was floated to shut down Bing, which we downvoted into oblivion. There was the Bankless podcast. There was the 6-month open letter, and then the Time opinion piece and several podcast appearances. This wasn't much effort as PR pushes go, and yet it set off a very appreciable news cycle that hasn't ended yet (although there were unforced errors in messaging that more coordination could likely have avoided).

Additionally, it seems to me that the incentives of almost all relevant players already align with being open to the message of slowing progress (beyond the free-bingo-square incentive of not wanting to die).

  • Governments are eternally paranoid about any threats to their power. They have a monopoly on violence, and it shouldn't take a five-star general to realize that a person, company, or state armed with a superhumanly intelligent adviser is one of the only realistic threats they face. It's an obvious national security risk. They're also motivated to follow the will of the people.

  • Huge numbers of civilians are currently in extreme danger of having their jobs abstracted away to language model X, let alone if capabilities continue progressing as they have been. This wave of automation will be unique because instead of hitting low-income workers, it will hit the ones with the most money to contribute to political campaigns. There will be a short delay, but the looming threat alone should get people riled up in a very rare way, not to speak of when it actually starts happening in earnest.

  • Legacy companies without an AI lead are standing on the precipice of being disrupted out of existence. The climate change cause fought against trillions of dollars, because it was trying to change a status quo that at the time comprised all the world's most valuable companies. Here, we're more accurately described as working to prevent the status quo from changing, which means there's more likely to be lobby-ready money on our side than theirs. There will be plenty of money on the other side too, but I expect the situation to be an inversion of climate change.

(Tangent: I think it's worth mentioning here that stigmatization also seems very relevant to the problem of Chinese AI enthusiasm. China has invested substantial resources into mitigating climate change risk in order to improve its global reputation. A future where AI capabilities research carries a heavy moral stigma globally and China decides to disinvest as a result isn't entirely unrealistic. China has the additional incentive here that American companies are clearly ahead, so a global pause would benefit China, just as it would benefit smaller companies wanting a chance to catch up. China would then be incentivized not to undermine an American pause.)

Don’t cry wolf/​Preserve dry powder

The question of whether now is the time to seriously go public is a valid one. But the question assumes that at some point in the future it will be the correct time. This almost mirrors the AI risk debate itself: even if the crucial moment is in the future, it doesn't make sense to wait until then to start preparing. A public-facing campaign can take months to plan and hone, so it makes sense to start preparing one now, even if we decide that now isn't the correct moment to launch it.

We need to avoid angering the labs

Practically speaking, I've seen no evidence that the very few safety measures labs have taken have been for our benefit. Possibly, to some small extent, they've been for PR points because of public concern we've raised, but certainly not out of any loyalty or affection for us. The expected value of regulating them, or of imposing bottlenecks on access to talent and resources by stigmatizing the field of capabilities research, seems much larger than the expected benefit of hoping that they'll hold back because we've been polite.

Don’t create more idiot disaster monkeys

It’s true that we’re mostly in this situation because certain people heard about the arguments for risk and either came up with terrible solutions to them or smelled a potent fount of personal power. A very valid concern I’ve heard raised is that something similar could happen with governments, which would be an even worse situation than the one we’re in.

It seems unlikely that AI capabilities can advance much further without governments and other parties taking notice of their potential. If we had a choice between them realizing that potential without hearing about the risks, and realizing it by way of hearing about the risks, the latter seems preferable. The more the public is convinced of the risk, the more incentivized governments are to act as though they are, too. Additionally, there doesn't seem to be an alternative. Unaligned superintelligence approaches by default unless something changes.


Without concerted effort from us, there are two possible outcomes. Either the current news cycle fizzles out like the last ones did, or AI risk goes truly mainstream but we lose all control over the dialogue. If it fizzles out, there's always a chance to start another one after the next generation of AI and another doom-dice roll, assuming we won't just raise the same objections then. But even then, much of our dry powder will be gone and our time much shorter. It's hard to say how bad losing control over the dialogue could be; I don't know how asinine the debate around this could get. But if we believe that our thinking about this topic tends to be more correct than the average person's, retaining control over the dialogue should have a positive expected value.

Realistically, the latter failure appears much, much more likely. I'm fairly certain that this movement is in the process of taking off with or without us. There are a few groups already forming that are largely unaffiliated with EA/rationalism but are very enthusiastic. They've mostly heard of the problem through us, but they're inviting people who haven't, who will invite more people who haven't. I've started to see individuals scared out of all reason, sounding more and more unhinged, because they have no guidance and nowhere to get it, at least until they find these groups. A very realistic possible future includes a large AI safety movement that we have no influence over, doing things we would never have sanctioned for goals we disagree with. Losing the ability to influence something once it gets sufficiently more powerful than you: why does that sound familiar?

My Bigger Point: We Lack Coordination

You probably disagree with many things I've said, which brings me to my main point: questions like these haven't been discussed enough for there to be much prior material to reference, let alone consensuses reached. I could be wrong about a lot of what I suggested; maybe going public is the wrong move, or maybe now isn't the right time; but I wouldn't know it, because there is no extended conversation around real-world strategy. The point has been raised a couple of times before that actions taken by individuals in our circle have been very uncoordinated. Every time this is raised, some people agree and a handful of comment chains are written, but then the conversation fizzles out and nothing results.

One very annoying practical consequence of this lack of coordination is that I never have any idea what prominent figures like Eliezer are thinking. It would have been extremely useful, for example, to know how his meeting with Sam Altman had gone, or whether he considers the famous tweet to be as indicative of personal cruelty as it seems, but I had to watch a podcast for the former and still don't know the latter. It would have been useful for his TIME article to have been proofread by many people. It would currently be extremely useful to know what dialogue, if any, he's having with Elon Musk (probably none, but if there is any, this changes the gameboard). I'm not wishing I could personally ask these questions; I'm wishing there were a public record of somebody asking him, after deeming these important datapoints for strategy. In general there seems to be no good way to cooperate with AI safety leadership.

I don’t like saying the phrase “we should”, but it is my strong belief that a universe in which a sizable portion of our dialogue and efforts is dedicated to ongoing, coordinated real-world strategizing is ceteris paribus much safer. It seems clear that this will be the case at some point. Even most outreach-skeptics say only that now is too soon. But starting now can do nothing but maximize time available.

To avoid passing the buck and simply hoping this time is different, I've set up the subreddit r/AISafetyStrategy to serve as a dedicated extended conversation about strategy for now, funded it with $1000 for operations, and am building a dedicated forum to replace it. I realize unilateral action like this is considered a little gauche on here. To be clear, I think these actions are very suboptimal – I would much prefer something with equivalent function to be set up with the approval and input of everyone here, and I hope something is created that supersedes my thing. Even simply adding a "strategy" tag to LessWrong would probably be better. But until something better exists, feel free to join and contribute your strategy questions and ideas.