[RETRACTED] It’s time for EA leadership to pull the short-timelines fire alarm.

[EDIT 4/10/2022: This post was rash and ill-conceived; it had no clearly defined goals and did not meet even the vaguely defined ones. I apologize to everyone on here; you should probably update accordingly about my opinions in the future. In retrospect, I was trying to express exasperation at the recent news I mention later, which I do think has decreased timelines broadly across the ML world.

While I stand by my claims on roughly-human AGI probability, I no longer stand by my statement that “we should pull the fire-alarm”. That is unlikely to lead to the calculated, concerted effort we need to maximize our odds of successful coordination. Nor is it at all clear, given the timeline mechanism I described here, that AGI built in this way would be able to quickly FOOM, which is the primary point of concern for such a fire alarm.

I’ve left the rest of the post here as a record.]


Based on the past week’s worth of papers, it seems very possible (>30%) that we are now in the crunch-time section of a short-timelines world, and that we have 3-7 years until Moore’s law and organizational prioritization put these systems at extremely dangerous levels of capability.[1]

The papers I’m thinking about:

It seems altogether possible that it would not take long, given these advances and a moderate amount of iteration, to create an agentic system capable of long-term decision-making.

If you want to think of this as the public miscalibrated Bayesian-updating of one person, you should feel free to do that. If this was a conclusion you reached independently, though, I want to make sure we coordinate.

For those who haven’t grappled with what actual advanced AI would mean, especially if many different organizations can achieve it:

  • No one knows how to build an AI system that accomplishes goals and that also won’t prevent you from turning it off. It’s an unsolved research problem. Researchers have been trying for decades, but none of them think they’ve succeeded yet.

  • Unfortunately, for most conceivable goals you could give an AI system, the best way to achieve that goal (taken literally, which is the only thing computers know how to do) is to make sure it can’t be turned off. Otherwise, it might be turned off, and then (its version of) the goal is much less likely to happen.

  • If the AI has any way of accessing the internet, it will copy itself to as many places as it can, and then continue doing whatever it thinks it’s supposed to be doing. At this point, it becomes quite likely that we cannot limit its impact, which is likely to involve much more mayhem, possibly including making itself smarter and making sure that humans aren’t capable of creating other AIs that could turn it off. There’s no off button for the internet.

  • Most AI researchers do not believe in ~AGI, and so have not considered the technical details of reward-specification for human-level AI models. Thus, it is as of today very likely that someone, somewhere, will build and deploy such a system anyway. Getting every AI expert in the world, and those they work with, to think through this is the single most important thing we can do.

  • It is functionally impossible to build a complex system without ever getting to iterate (which we can’t do without an off-switch) and have it just work the first time by luck. Every human invention ever has required trial and error to perfect (e.g. planes, computer software). If we have no off-switch, and the system just keeps getting smarter, and we made anything other than the perfect reward function (which, again, no one knows how to do), the global consequences are irreversible.

  • Do not make it easier for more people to build such systems. Do not build them yourself. If you think you know why this argument is wrong, please please please post it here or elsewhere. Many people have spent their lives trying to find the gap in this logic; if you raise a point that hasn’t previously been refuted, I will personally pay you $1,000.

If this freaks you out, I’m really sorry. I wish we didn’t have to be here. You have permission to listen to everyone else, and not take this that seriously yet. If you’re asking yourself “what can I do”, there are people who’ve spent decades coming up with plans, and we should listen to them.

From my vantage point, the only real answer at this point seems to be mass public advocacy within the expert community to try to get compute usage restrictions in place, starting with the AI experts who will inevitably be updating on this information. No one wants anyone else to accidentally deploy an un-airgapped agentic system with no reliable off-switch, even if they think they themselves wouldn’t make that mistake.

Who, in practice, pulls the EA-world fire alarm? Is it Holden Karnofsky? If so, who does he rely on for evidence, and/or what’s stopping those AI alignment-familiar experts from pulling the fire alarm?

The EA community getting on board and collectively switching to short-timelines-AI-public-advocacy efforts seems pretty critical in this situation, to provide talent for mass advocacy among AI experts and their adjacent social/professional networks. The faster and more emphatically this happens, the better our chance of propagating the signal to ~all major AI labs (including those in the US, UK, and China).

Who do we need to convince within EA/EA leadership of this? For those of you reading this, do you rate it as less than 30% that we are currently within a fast takeoff, and if not, are you basing that on the experts you’d trust having considered the past week’s evidence?

(Crying wolf isn’t really a thing here; the societal impact of these capabilities is undeniable and you will not lose credibility even if 3 years from now these systems haven’t yet FOOMed, because the big changes will be obvious and you’ll have predicted that right.)

EDIT: if anyone adjacent to such a person wants to discuss why the evidence seems very strong, and what needs to be done within the next few weeks/months, please do DM me.

  1. Whether this should also be considered “fast takeoff”, in the sense of recursive self-improvement, is less clear. However, with human improvement alone it seems quite possible we will get to extremely dangerous systems, with no clear deployment limitations. [This was previously the title of the post; I used the term incorrectly.]