Which side of the AI safety community are you on?
In recent years, I’ve found that people who self-identify as members of the AI safety community have increasingly split into two camps:
Camp A) “Race to superintelligence safely”: People in this group typically argue that “superintelligence is inevitable because of X”, and that it’s therefore better that their in-group (their company or country) build it first. X is typically some combination of “Capitalism”, “Moloch”, “lack of regulation” and “China”.
Camp B) “Don’t race to superintelligence”: People in this group typically argue that “racing to superintelligence is bad because of Y”. Here Y is typically some combination of “uncontrollable”, “1984”, “disempowerment” and “extinction”.
Whereas the 2023 extinction statement was widely signed by both Camp B and Camp A (including Dario Amodei, Demis Hassabis and Sam Altman), the 2025 superintelligence statement conveniently separates the two groups – for example, I personally offered all US frontier AI CEOs the opportunity to sign, and none chose to do so. However, it would be an oversimplification to claim that frontier AI corporate funding predicts camp membership – for example, someone from one of the top companies recently told me that he’d sign the 2025 statement were it not for fear of how it would impact him professionally.
The distinction between Camps A and B is also interesting because it correlates with policy recommendations: Camp A tends to support corporate self-regulation and voluntary commitments without strong and legally binding safety standards akin to those in force for pharmaceuticals, aircraft, restaurants and most other industries. In contrast, Camp B tends to support such binding standards, akin to those of the FDA (which can be viewed as a strict ban on releasing medicines that haven’t yet undergone clinical trials and been safety-approved by independent experts). Combined with market forces, this would naturally lead to powerful yet controllable new AI tools – for doing science, curing diseases, increasing productivity and even pursuing dominance (economic and military) if that’s desired – but not to full superintelligence until it can be devised to meet the agreed-upon safety standards, and it remains controversial whether this is even possible.
In my experience, most people (including top decision-makers) are currently unaware of the distinction between A and B and have an oversimplified view: You’re either for AI or against it. I’m often asked: “Do you want to accelerate or decelerate? Are you a boomer or a doomer?” To facilitate a meaningful and constructive societal conversation about AI policy, I believe that it will be hugely helpful to increase public awareness of the differing visions of Camps A and B. Creating such awareness was a key goal of the 2025 superintelligence statement. So if you’ve read this far, I’d strongly encourage you to read it and, if you agree with it, sign it and share it. If you work for a company and worry about blowback from signing, please email me at mtegmark@gmail.com and say “I’ll sign this if N others from my company do”, where N=5, 10 or whatever number you’re comfortable with.
Finally, please let me provide an important clarification about the 2025 statement. Many have asked me why it doesn’t define its terms as carefully as a law would require. Our idea is that detailed questions about how to word laws and safety standards should be tackled later, once the political will has formed to ban unsafe/unwanted superintelligence. This is analogous to how detailed wording of laws against child pornography (who counts as a child, what counts as pornography, etc.) got worked out by experts and legislators only after there was broad agreement that we needed some sort of ban on child pornography.
I think the conversation does need to advance in some way, and I think this post is roughly carving at some real joints; it’s important that people are tracking the distinction.
But
a) I’m generally worried about reifying the groups more into existence (as opposed to trying to steer towards a world where people can have more nuanced views). This is tricky, there are tradeoffs and I’m not sure how to handle this. But...
b) this post’s title and framing in particular are super leaning into the polarization, and I wish it did something different.
(like, even specifically resolving the lack-of-nuance this post complains about, requires distinguishing between “never build ASI” and “don’t build ASI until it can be done safely”, which isn’t covered in the Two Sides)
The main split is about whether racing in the current regime is desirable, so both “never build ASI” and “don’t build ASI until it can be done safely” fall within the scope of camp B. Call these two subcamps B1 and B2. I think B1 and B2 give the same prescriptions within the actionable timeframe.
Some people likely think that building ASI once it can be done safely would be better than never building it at all.
Those people might give different prescriptions than the “never build ASI” people, like not endorsing actions that would tank the probability of ASI ever getting built. (Although in practice I think they probably mostly make the same prescriptions at the moment.)
I agree that some people have this preference ordering, but I don’t know of any difference in specific actionable recommendations that would be given by the “don’t until safely” and “don’t ever” camps.
In practice, bans can be lifted, so “never” is never going to become an unassailable law of the universe. And right now, it seems misguided to quibble over “Pause for 5, 10, or 20 years” versus “Stop for good”, given the urgency of the extinction threat we are currently facing. If we’re going to survive the next decade with any degree of certainty, we need an alliance between B1 and B2, and I’m happy for one to exist.
On this point specifically, those two groups are currently allied, though they don’t always recognize it. If sufficiently-safe alignment is found to be impossible or humanity decides to never build ASI, there would stop being any difference between the two groups.
This is well-encapsulated by the differences between Stop AI and PauseAI. At least from PauseAI’s perspective, both orgs are currently on exactly the same team.
Pause AI is clearly a central member of Camp B? And Holly signed the superintelligence petition.
Yes, my comment was meant to address the “never build ASI” and “don’t build ASI until it can be done safely” distinction, which Raemon was pointing out does not map onto Camp A and Camp B. All of ControlAI, PauseAI, and Stop AI are firmly in Camp B, but have different opinions about what to do once a moratorium is achieved.
One thing I meant to point toward was that unless we first coordinate to get that moratorium, the rest is a moot point.
I don’t like polarization as such, but I also don’t like all of my loved ones being killed. I see this post and the open statement as dissolving a conflationary alliance that groups people who want to (at least temporarily) prevent the creation of superintelligence with people who don’t want to do that. Those two groups of people are trying to do very different things that I expect will have very different outcomes.
I don’t think the people in Camp A are immoral people just for holding that position[1], but I do think it is necessary to communicate: “If we do thing A, we will die. You must stop trying to do thing A, because that will kill everyone. Thing B will not kill everyone. These are not the same thing.”
In general, to actually get the things that you want in the world, sometimes you have to fight very hard for them, even against other people. Sometimes you have to optimize for convincing people. Sometimes you have to shame people. The norms of discourse that are comfortable for me and elevate truth-seeking and that make LessWrong a wonderful place are not always the same patterns as those that are most likely to cause us and our families to still be alive in the near future.
Though I have encountered some people in the AI Safety community who are happy to unnecessarily subject others to extreme risks without their consent after a naive utilitarian calculus on their behalf, which I do consider grossly immoral.
I personally would not sign this statement because I disagree with it, but I encourage any OpenAI employee who wants to sign to do so. I do not believe they will suffer any harmful professional consequences. If you are at OpenAI and want to talk about this, feel free to Slack me. You can also ask colleagues who signed the petition supporting SB1047 whether they felt any pushback. As far as I know, no one did.
I agree that there is a need for thoughtful regulations for AI. The reason I personally would not sign this statement is that it is vague, hard to operationalize, and attempts to use it as a basis for laws will (in my opinion) lead to bad results.
There is no agreed upon definition of “superintelligence” let alone a definition of what it means to work on developing it as separate from developing AI in general. A “prohibition” is likely to lead to a number of bad outcomes. I believe that for AI to go well, transparency will be key. Companies or nations developing AI in secret is terrible for safety, and I believe this will be the likely outcome of any such prohibition.
My own opinions notwithstanding, other people are entitled to their own, and no one at OpenAI should feel intimidated from signing this statement.
As I said below, I think people are ignoring many different approaches compatible with the statement, and so they are confusing the statement with a call for international laws or enforcement (as you said, “attempts to use it as a basis for laws”), which is not mentioned. I suggested some alternatives in that comment:
“We didn’t need laws to get the 1975 Asilomar moratorium on recombinant DNA research, or the email anti-abuse (SPF/DKIM/DMARC) voluntary technical standards, or the COSPAR guidelines that were embraced globally for planetary protection in space exploration, or press norms like not naming sexual assault victims—just strong consensus and moral suasion. Perhaps that’s not enough here, but it’s a discussion that should take place, which first requires a clear statement about what the overall goals should be.”
This is good!
My guess is that their hesitance is also linked to potential future climates, though, and not just the current climate, so I don’t expect additional signees to come forward in response to your assurances.
I’d split things this way:
Group A) “Given that stopping the AI race seems nearly impossible, I focus on ensuring humanity builds safe superintelligence”
Group B) “Given that building superintelligence safely under current race dynamics seems nearly impossible, I focus on stopping the AI race”
Group C) “Given deep uncertainty about whether we can align superintelligence under race conditions or stop the race itself, I work to ensure both strategies receive enough resources.”
C is fake, it’s part of A, and A is fake, it’s washing something which we don’t yet understand and should not pretend to understand.
You make a valid point. Here’s another framing that makes the tradeoff explicit:
Group A) “Alignment research is worth doing even though it might provide cover for racing”
Group B) “The cover problem is too severe. We should focus on race-stopping work instead”
I think we’re just trying to do different things here… I’m trying to describe empirical clusters of people / orgs, you’re trying to describe positions, maybe? And I’m taking your descriptions as pointers to clusters of people, of the form “the cluster of people who say XYZ”. I think my interpretation is appropriate here because there is so much importance-weighted abject insincerity in publicly stated positions regarding AGI X-risk that it just doesn’t make much sense to focus on the stated positions as positions.
Like, the actual people at The Curve or whatever are less “I will do alignment, and will be against racing, and alas, this may provide some cover” and more “I will do fake alignment with no sense that I should be able to present any plausible connection between my work and making safe AGI, and I will directly support racing”. All the people who actually do the stated thing are generally understood to be irrelevant weirdos. The people who say that are being insincere, and in fact support racing.
I was trying to map out disagreements between people who are seriously concerned about AI risk.
Agreed that this represents only a fraction of the people who talk about AI risk, and that there are a lot of people who will use some of these arguments as false justifications for their support of racing.
Um, no, you responded to the OP with what sure seems like a proposed alternative split. The OP’s split is about empirical camps of people, not abstract positions.
I think you are making an actual mistake in your thinking, due to a significant gap in your thinking and not just a random thing, and with bad consequences, and I’m trying to draw your attention to it.
...but it’s not fake, it’s just confused according to your expectations about the future—and yes, some people may say it dishonestly, but we should still be careful not to deny that people can think things you disagree with, just because they conflict with your map of the territory.
That said, I don’t see as much value in dichotomizing the groups as others seem to.
What? Surely “it’s fake” is a fine way to say “most people who would say they are in C are not actually working that way and are deceptively presenting as C”? It’s fake.
If you said “mostly bullshit” or “almost always disingenuous” I wouldn’t argue, though I would still question whether it’s actually a majority of people in group C, which I’m doubtful of but very unsure about—but saying it is fake would usually mean it is not a real thing anyone believes, rather than meaning that the view is unusual or confused or wrong.
Closely related to: You Don’t Exist, Duncan.
I guess we could say “mostly fake”, but also there’s important senses in which “mostly fake” implies “fake simpliciter”. E.g. a twinkie made of “mostly poison” is just “a poisonous twinkie”. Often people do, and should, summarize things and then make decisions based on the summaries, e.g. “is it poison, or no” --> “can I eat it, or no”. My guess is that the conditions under which it would make sense for you to treat someone as genuinely holding position C, e.g. for purposes of allocating funding to them, are currently met by approximately no one. I could plausibly be wrong about that, I’m not so confident. But that is the assertion I’m trying to make, which is summarized imprecisely as “C is fake”, and I stand by my making that assertion in this context. (Analogy: It’s possible for me to be wrong that 2+2=4, but when I say 2+2=4, what I’m asserting / guessing is that 2+2=4 always, everywhere, exactly. https://www.lesswrong.com/s/FrqfoG3LJeCZs96Ym/p/ooypcn7qFzsMcy53R )
There’s a huge difference between the types of cases, though. A 90% poisonous twinkie is certainly fine to call poisonous[1], but a group that is 90% male isn’t reasonable to call male. You said “if most people who would say they are in C are not actually working that way and are deceptively presenting as C,” and that seems far more like the latter than the former, because “fake” implies the entire thing is fake[2].
Though so is a 1% poisonous twinkie; perhaps a better example would be a meal that is 90% protein, which could be called a “protein meal” without implying there is no non-protein substance present.
There is a sense where this isn’t true; if 5% of an image of a person is modified, I’d agree that the image is fake—but this is because the claim of fakeness is about the entirety of the image, as a unit. In contrast, if there were 20 people in a composite image, and 12 of them were AI-fakes and 8 were actual people, I wouldn’t say the picture is “of fake people,” I’d need to say it’s a mixture of fake and real people. Which seems like the relevant comparison if, as you said in another comment, you are describing “empirical clusters of people”!
The OP is about two “camps” of people. Do you understand what camps are? Hopefully you can see that this indeed does induce the analog of “because the claim of fakeness is about the entirety of the image”. They gain and direct funding, consensus, hiring, propaganda, vibes, parties, organizations, etc., approximately as a unit. Camp A is a 90% poison twinkie. The fact that you are trying to not process this is a problem.
I like the first clause in the 2025 statement. If that were the whole thing, I would happily sign it. However, having lived in California for decades, I’m pretty skeptical that direct democracy is a good way of making decisions, and would not endorse making a critical decision based on polls or a vote. (See also: Brexit.)
I did sign the 2023 statement.
The two necessary pieces are that it won’t doom us and that people want it to happen. Obviously we shouldn’t build it if it might doom us and I additionally think that we shouldn’t build it if people say that that would be bad according to their values. There are all kinds of valid reasons why people might want ASI to exist or not exist once existential risk is removed from the equation.
I don’t think this implies direct democracy, just some mechanism by which humanity at large has some collective say in whether ASI happens or not. “But does humanity even want this” has to at least be a consideration in some form.
I agree that the statement doesn’t require direct democracy but that seems like the most likely way to answer the question “do people want this”.
Here’s a brief list of things that were unpopular and broadly opposed that I nonetheless think were clearly good:
smallpox vaccine
seatbelts, and then seatbelt laws
cars
unleaded gasoline
microwaves (the oven, not the radiation)
Generally I feel like people sometimes oppose things that seem disruptive and can be swayed by demagogues. There’s a reason that representative democracy works better than direct democracy. (Though it has obvious issues as well.)
As another whole class of examples, I think people instinctively dislike free speech, immigration, and free markets. We have those things because elites took a strong stance based on better understanding of the world.
I support democratic input, and especially understanding people’s fears and being responsive to them. But I don’t support only doing things that people want to happen. If we had followed that rule for the past few centuries, I think the world would be massively worse off.
I lean more towards the Camp A side, but I do understand and think there’s a lot of benefit to the Camp B side. Hopefully I can, as a more Camp A person, help explain to Camp B dwellers why we don’t reflexively sign onto these kinds of statements.
I think that Camp B has a bad habit of failing to model the Camp A rationale, based on the conversations I see in Twitter discussions between pause AI advocates and more “Camp A” people. Yudkowsky is a paradigmatic example of the Camp B mindset, and I think it’s worth noting that a lot of people in the public readership of his book found the pragmatic recommendations therein to be extremely unhelpful. Basically, they (and I) see Yudkowsky’s plan as calling for a mass-mobilization popular effort against AI. But his plans, both in IABIED and his other writings, fail to grapple at all with the existing political situation in the United States, or with the geopolitical difficulties involved in political enforcement of an AI ban.
Remaining in this frame of “we make our case for [X course of action] so persuasively that the world just follows our advice” does not make for a compelling political theory on any level of analysis. Looking at the nuclear analogy used in IABIED, if Yudkowsky had advocated for a pragmatic containment protocol like the New START nuclear weapons deal or the Iran-US JCPOA deal, then we (the readers) could see that the Yudkowsky/Camp B side had thought deeply about the complexity of using political power to achieve actions in the full messiness of the real world. But Yudkowsky leaves the details of how a multinational treaty far more widely scoped than any existing multinational agreement would be worked out as an exercise for the reader! When Russia is actively involved in a major war with a European country and China is preparing for a semi-imminent invasion of an American ally, the (intentionally?) vague calls for a multinational AI ban ring hollow. Why is there so little Rat brainpower devoted to the pragmatics of how AI safety could be advanced within the global and national political contexts?*
There are a few other gripes that I (speaking in my capacity as a Camp A denizen) have with the Camp B doctrine. Beyond inefficacy/unenforceability, the idea that the development of a superintelligence is a “one-shot” action without the ability to fruitfully learn from near-ASI non-ASI models seems deeply implausible. Also, various staples of the Camp B platform — orthogonality and goal divergence out of distribution, notably — seem pretty questionable, or at least undersupported by existing empirical and theoretical work by the MIRI/PauseAI/Camp B faction.
*I was actually approached by an SBF representative in early 2022, who told me that SBF was planning on buying enough American congressional votes via candidate PAC donations that EA/AI safetyists could dictate US federal policy. This was by far the most serious AI safety effort I’ve personally witnessed come out of the EA community, and one of only a few that connected the AI safety agenda to the “facts on the ground” of the American political system.
As someone who was there, I think the portrayal of the 2020-2022 era efforts to influence policy is a strawman, but I agree that it was the first serious attempt by the community to engage politically—and an effort which preceded SBF in lots of different ways—so it’s tragic (and infuriating) that SBF poisoned the well by backing it and having it collapse. And most of the reason there was relatively little done by the existential risk community on pragmatic political action in 2022-2024 was directly because of that collapse!
But there are times when it does work!
adding only one bit of sidedness seems insufficient to describe the space of opinions—further description is still needed. however, adding even a single bit of incorrect sidedness, or one where one of the sides is a side we’d hope people don’t take, seems like it could have predictably bad consequences. hopefully the split is worth it. I do see this direction in the opinion space; I’m just not sure this split is enough a part of the original generating process of the opinions to be worth it.
I was in the “let’s build it before someone evil does” camp; I’ve left that particular viewpoint behind since realizing how hard aligning it is. my thoughts on how to align it still tend to be aimed at trying to make camp A not kill us all, because I suspect camp B will fail, and our last line of defense will be that maybe we figure out safety in time for camp A to not be delusional; I do see some maybe-not-hopeless paths, but I’m pretty pessimistic that we get enough of our ducks in a row in time to hit the home run below par before the death star fires. but to the degree camp B has any shot at success, I participate in it too.
It was empirically infeasible (for the general AGI x-risk technical milieu) to explain this to you faster than you trying it for yourself, and one might have reasonably expected you to have been generally culturally predisposed to be open to having this explained to you. If this information takes so much energy and time to be gained, that doesn’t bode well for the epistemic soundness of whatever stance is currently being taken by the funder-attended vibe-consensus. How would you explain this to your past self much faster?
I agree this distinction is very important, thank you for highlighting it. I’m in camp B and just signed the statement.
I strongly support the idea that we need consensus building before looking at specific paths forward—especially since the goal is clearly far more widely shared than the agreement about what strategy should be pursued.
For example, contra Dean Ball’s unfair strawman, this isn’t a back-door to insist on centralized AI development, or even necessarily a position that requires binding international law! We didn’t need laws to get the 1975 Asilomar moratorium on recombinant DNA research, or the email anti-abuse (SPF/DKIM/DMARC) voluntary technical standards, or the COSPAR guidelines that were embraced globally for planetary protection in space exploration, or press norms like not naming sexual assault victims—just strong consensus and moral suasion. Perhaps that’s not enough here, but it’s a discussion that should take place, which first requires a clear statement about what the overall goals should be.
This is also why I think the point about lab employees, and making safe pathways for them to speak out, is especially critical; current discussions about whistleblower protections don’t go far enough, and while group commitments (“if N others from my company”) are valuable, private speech on such topics should be even more clearly protected. One reason for the inability to get consensus among lab employees is that there isn’t currently common knowledge within labs about how many people think the goal is the wrong one, and the labs’ incentives to attract investment are opposed to those that would give employees options for voice or loyalty instead of exit—which explains why, in general, only former employees have spoken out.
Camp A for all the reasons you listed. I think the only safe path forward is one of earnest intent for mutual cooperation rather than control. Not holding my breath though.
I would have labelled Camp A as the control camp (control of the winner over everyone else) and Camp B as the mutual cooperation camp (since ending the race is the fruit of cooperation between nations). If we decide to keep racing for superintelligence, what does the finish line look like, in your view?
I think @rife is talking either about mutual cooperation between safety advocates and capabilities researchers, or mutual cooperation between humans and AIs.
I’m mostly “camp D[oomer]: we can’t stop, and we can’t control it, and this is bad, and so the future will be bad”; but the small portion of my hope that things go well relies on cooperation being solved better than it ever has been before. otherwise, I think faster and faster brain-sized-meme evolution is not going to go well for anything we care about. achieving successful, stable, defensible inter-meme cooperation seems … like a tall order, to put it mildly. the poem on my user page is about this, as are my pinned comments. to the degree camp A[ccelerate safety] can possibly make camp D wrong, I’m on board with helping figure out how to solve alignment in time, and I think camp B[an it]/B[uy time] are useful and important.
but like, let’s get into the weeds and have a long discussion in the comments here, if you’re open to it: how do we achieve a defensibly-cooperative future that doesn’t end up wiping out most humans and AIs? how do we prevent an AI that is willing to play defect-against-all, that doesn’t want anything like what current humans and AI want, from beating all of us combined? which research paths are you excited about to make cooperation durable?
I’ll point to a similarly pessimistic but divergent view on how to manage the likely bad transition to an AI future that I co-authored recently:
I think both of these camps are seeing real things. I think:
We should try to stop the race, but be clearsighted about the forces we’re up against and devise plans that have a chance of working despite them.
The race only ends with the winner (or the AI they develop) gaining total power over all of humanity. No thanks. B is the only option, difficult as it may be to achieve.
For me the linked site with the statement doesn’t load. And this was also the case when I first tried to access it yesterday. Seems less than ideal.
Some of the implementation choices might be alienating—the day after I signed, I saw in the announcement email “Let’s take our future back from Big Tech.” Maybe a lot of people who work at large tech companies and are on the fence don’t like that brand of populism.
Shared on the EA Forum, with some commentary on the state of the EA Community (I guess the LessWrong rationality community is somewhat similar?)
I am in the part of Camp A that wants us to keep pushing for superintelligence, but with a larger overall share of funds/resources invested in safety.
Don’t compare it to the FDA; compare it to the IAEA.