Let’s build a fire alarm for AGI

Epistemic status: This takes as a premise the post “There is no fire alarm for AGI” by Eliezer Yudkowsky. It will make little sense to you if you are unfamiliar with that essay or if you disagree with it.

Fear of embarrassment has been empirically shown to stop people from reacting to serious threats. A fire alarm creates common knowledge of the possible presence of a serious danger and provides an excuse to react that saves us from embarrassment. Eliezer said in 2017 that there is no such thing for artificial general intelligence. This seems to continue to be true. Let’s stop accepting that state of affairs, and do something about it—let’s build such a fire alarm.

This fire alarm has to have three traits:

  1. It needs to detect the imminence of AGI. We won’t know how well its detection mechanism worked until it’s too late, so it will have to be designed on the best assumptions we have.

  2. It needs to be loud: very public, very media-friendly, with as many established and friction-free channels to multipliers as possible.

  3. It needs to be extremely easy to understand: a boolean or an integer or something similarly primitive. Details can be offered for the tiny but important minority that wants them, but we need to understand that almost all who receive the alarm won’t care about details.

The Doomsday Clock of the Bulletin of the Atomic Scientists provides a useful template. It takes many complicated details about nuclear weapons development and proliferation, international relations, and arms control treaties, and simplifies them into a single number: the minutes to midnight. You’ve heard of it, and that’s the point. This alarm is 76 years old and still succeeds at getting news coverage.

There is already ARC Evals, which does something similar: it develops tools for AI labs to check whether their models are becoming dangerous. This is not the fire alarm proposed here, because it is not directed at the general public, it is not loud enough to create common knowledge, and it therefore does not provide an excuse to worry openly. Nevertheless, this is important work that can be built on.

It seems obvious to me that the format should be an expected time: the year, maybe the month, when to expect… what exactly? What constitutes the moment the aliens are landing? If your answer is longer than a single sentence, it is wrong.
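To make the “single primitive value” requirement concrete, here is a minimal sketch in Python of what the entire public-facing signal could reduce to. It is purely illustrative: the class name, fields, example year, and URL are placeholders of mine, not part of any existing tool or proposal. The point is that everything beyond one date gets pushed behind a link for the minority who want details.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AgiFireAlarm:
    """Hypothetical public-facing alarm signal: one expected date, nothing more.

    Methodology, evidence, and confidence intervals live behind details_url
    for the small minority who want them.
    """
    expected_year: int                      # placeholder value below is arbitrary
    expected_month: Optional[int] = None    # 1-12, only if that precision is defensible
    details_url: str = "https://example.org/methodology"  # placeholder

    def headline(self) -> str:
        """Render the single sentence the general public actually sees."""
        if self.expected_month is not None:
            return f"Expected arrival of AGI: {self.expected_year}-{self.expected_month:02d}"
        return f"Expected arrival of AGI: {self.expected_year}"

# Example usage: the whole alarm collapses to one short string.
print(AgiFireAlarm(expected_year=2031).headline())
```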

We’re not great at publicity. But recent developments have obviously raised a lot of concern, including in the wider public; there is a lot of confusion and therefore a lot of demand for guidance. The situation is so favorable that the required level of publicity skill is clearly lower than it has ever been.

At the same time, there is at present no discernible competition in the lane of extremely simple communication about AGI risk, because the subject is dominated by people who read and write very complicated things for a living. That’s not going to last. The vacuum of extremely simple communication about AGI risk will be filled eventually, by who knows whom and with what agenda. We’d better be there first.

This is not a project, or even a finished draft of a project, that I am simply asking people to donate their efforts to. I’d love to contribute, but I don’t have the resources to coordinate such a thing. I hope that in this community, volunteers can find this idea, and each other, and set up an impromptu project, and do what nerds do best: build important new tech. Ideally, LessWrong and/or the Alignment Research Center can be to this what the Bulletin of the Atomic Scientists is to the Doomsday Clock.

I expect the bottleneck will be access to media-savvy and influential people, but surely some of us can go and find them.

Over the last few months, the number of concerned people who might be willing to help constrain AGI risk has grown massively. But they aren’t (usually) machine learning developers. Building this alarm, and amplifying it as much as possible, gives them an opportunity to help. It is high time for us to offer that.

What do you think? Are you in?
