Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)


The movement to reduce AI x-risk is overly purist. It keeps splintering into sects, each formed to preserve its own platonic level of purity, and this is actively (and greatly) harming the cause.

How the Safety Sects Manifest

  • People suggest not publishing AI research

  • More recently, Jan and his team leaving OpenAI

  • Less recently, Paul Christiano leaving OpenAI to form ARC, whose evals arm became METR[1]

  • Even less recently, Anthropic splitting off from OpenAI

  • A suggestion to blacklist anyone who decided to give $30 million (a paltry sum of money for a startup) to OpenAI.

I think these were all legitimate responses to a perceived increase in risk, but they ultimately did, or will do, more harm than good. Disclaimer: I am least sure that the formation of Anthropic increases p(doom), but I speculate that, post-AGI, it will be seen as such.

The Safetyists Played Their Hand Too Early

To a fundamentalist, it’s unethical to ignore the concerns that prompted those actions, but the world is a messy and unpredictable place. It isn’t possible to get anything done without cooperating with some actors who may be deceitful or even harmful. As an example, most corporations are filled with people who don’t care about the mission and would jump ship for a higher-paying job. Despite this apparent mess of conflicting incentives, most corporations are very good at making a lot of money. Maybe it isn’t possible to align incentives around non-monetary goals, but I doubt it. (Paying an employee more hurts the company’s profits, yet the money gets made anyway.)

The ideal response to each of these examples is to wait until we’re far closer to AGI to ring the alarm bells. If prediction markets are right, we still have ~8 years until we have something that meets their relatively weak definition of AGI. There is no momentum to be gained by being 8 years early; instead, the doom claims lose credibility the same way the “Earth will be underwater” predictions of the 1970s fell flat.[2] The same premature alarm was raised over GPT-2.

I get that race dynamics are a factor in those decisions, but hardware is probably the key limiting factor, and it already follows an exponential curve.[3] If there is no global race to build AGI, it’s far more likely that Google builds a bunch of bigger datacenters to train its content and ads algorithms, and then someone at DeepMind stumbles onto AGI with little international scrutiny. Google leadership realizes this will make them a lot of money, then races to use it without the world being prepared at all. (Or Meta does this exact thing: the datacenters it built to compete with TikTok are now training llama3-400b.)

The Various Safety Sects Will Continue To Lose Relevance

If Jan and Ilya don’t end up joining DeepMind, or if AGI does not arrive within 1-2 years, I will count it as a net increase in p(doom) that they couldn’t compromise on their safety beliefs in order to actually have an impact. I predict Anthropic will lose relevance. They will likely never have access to the amount of compute DeepMind or OpenAI will. They are valued at ~1/5th of what OpenAI is valued at, so I’m guessing whatever amount OpenAI raised is significantly more than what Anthropic has raised.[4] It is looking increasingly clear that the “safer” group of players has nowhere near as much compute as the “unsafe” group, and the “unsafe” group will likely reach AGI first. Will Anthropic hold themselves to their charter clause? No one knows, but I highly doubt it. I think the founders’ egos will rationalize not needing to trigger the clause until it’s too late.

Safetyism Can’t Exist Without Strong Backers

Sidenote: there was recently this comment. I think this viewpoint is a good example of what I’m arguing against. It will be impossible to do anything without money, compute, or clout. So, to sum this post up: if your alignment plan doesn’t involve OpenAI, DeepMind, or Anthropic solving it, it won’t work.

  1. ^

    I claim that, in hindsight, METR will turn out to have had little to no positive impact on catastrophic risk from AI. In fact, Paul’s appointment at NIST was allegedly met with a “revolt” from some employees, which, if true, is very sad. I doubt this would have happened if he were still associated with OpenAI in some capacity. Clout matters.

  2. ^

    This is a highly charitable comparison, since the claimed negative impacts of climate change actually were happening at the time: there was plenty of in-your-face evidence, such as smog from coal plants.

  3. ^

    Ignore Nvidia presenting reduced precision as a “gain”.

  4. ^

    OpenAI’s recent raise was not disclosed; however, I assume they will get lower rates for the Stargate datacenter.

    PS:
    Overall, I think this is a charitable interpretation of these sect splits. A more negative interpretation of Anthropic is that safetyism was a rationalization for wanting to create their own company and enrich themselves. And Jan and Ilya’s departures could mainly be due to a loss of internal influence after a failed coup that was really driven by a desire not to productize.