A warm-up for the AI governance project

Here is one possible world we could be living in.

Imagine that the majority of the world’s population knows that unaligned AGI is coming. This majority includes most of the world’s heads of government, all respectable scientists, most journalists, all your friends, your close family, your distant aunt from Canada and your neighbour next door. The topic would sneak into casual conversations over beer, and you could overhear people on the street discussing their fears of being turned into paperclips.

Furthermore, imagine that the AI alignment project turned out to be easy. And not just easy in theory, but easy in the most ordinary sense of the word, like taking the trash out or microwaving pizza, and then even simpler. Easy to the point that you could explain the solution to a five-year-old, and they would understand the spirit of it, if not the details, perfectly well. Let’s, for the sake of argument, imagine that the solution to the alignment problem was for us to repeat, collectively, one quadrillion times, into the mirror, “AGI, AGI, please be safe and turn out fine”, while patting ourselves three times on the forehead.
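
To get a feel for just how easy this is meant to be, here is a back-of-the-envelope sketch. Only the quadrillion comes from the scenario above; the population, pace, and timeline figures are my own assumptions:

```python
# Back-of-the-envelope: how much chanting per person does the
# "magic spell" solution actually demand?
# Assumed (not from the scenario): ~8 billion participants,
# ~5 seconds per repetition-plus-pats, spread over 10 years.

TOTAL_REPETITIONS = 1_000_000_000_000_000  # one quadrillion, from the scenario
POPULATION = 8_000_000_000                 # assumed
SECONDS_PER_REP = 5                        # assumed
YEARS = 10                                 # assumed timeline

per_person = TOTAL_REPETITIONS / POPULATION
minutes_per_day = per_person * SECONDS_PER_REP / (YEARS * 365 * 60)

print(f"{per_person:,.0f} repetitions per person")            # 125,000
print(f"~{minutes_per_day:.0f} minutes per day for {YEARS} years")  # ~3
```

Under those assumptions it works out to about three minutes of chanting per person per day, which is rather the point of the thought experiment.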

If this all sounds unrealistic, possibly too optimistic even, let’s dream bigger! Picture a website with a big red counter saying “AGI safety X% done” that anyone in the world could look at. Kind of like the Doomsday Clock, but with real measurements backing it up, and with the fact that it is real being common knowledge.

The counter would show the number of times people all over the world had said the phrase (and patted themselves while doing it). It would be frequently featured in the news; governments around the world would refer to it in their communications; the pope would sometimes mention it at the end of his homilies; some cities would even build giant interactive billboards displaying its current value in real time.

The counter would not only have a numerical value, it would have a range of sub-meters, each showing how people in every country around the world are doing, down to the granularity of a single city, or finer. There would be no taboo around discussing this with people; on the contrary, it would be actively status-raising to show off your tells-and-pats. People could, and would, shame each other for not doing it enough.

Imagine that the counter you pictured before came with a pretty clear timeline until AGI. The timeline would be perfectly sized: not so short as to render any action meaningless, nor so long that we would discount it entirely.

Implementing the solution would not carry any major drawbacks. Everyone would be equally able to say the words. Forget about us “sacrificing our music and our non-numerical names” in the process, or challenging our ethical norms, or going against anyone’s worldview. The solution would not only be good under the circumstances; it would, as if by happy accident, make the world a better place in a broad sense. Maybe it would turn out that saying the words lessens the chance of throat cancer, and the physical activity of patting yourself improves blood circulation.

And if this all sounds too easy, one last thing comes to mind. It would not only be people who could say the magic spell! We could build robots to help us with the task. Constructing them to say the words properly, and pat, would not even be that hard; in fact, there would be hundreds of companies building and pitching them on the open market.

In this world, would pursuing AI-safety-focused policy be easy?


The world described above is, of course, roughly the one we already inhabit with respect to global warming: common knowledge, hard measurements, a known solution, and companies queuing up to sell it. And still, policy grinds along. That analogy is, on a visceral level, what makes me really sceptical about AI governance. I can picture a few arguments against it:

  • Maybe the problem has to be really bad on a personal level, not just in a statistical, geographically or temporally removed way. But then there is the Fable of the Dragon-Tyrant, which makes the problem personal (although less tractable), and still people do not work on it very hard.

  • Maybe the decision makers do not have skin in the game: floods causing famines in Pakistan are easier to wave away than the prospect of AGI personally dismembering their own children?

  • Maybe the structure of the AI problem is much less distributed than emissions are, and thus more amenable to the kind of targeted interventions governments are actually capable of?

  • Maybe I am being naive about the (relative) ease of the global warming problem? The technical solution is not that straightforward, there is some reasonable opposition to the project, and there are genuine squabbles about cost-benefit calculations?

I agree that these are reasonable points, and I certainly wrote the first part more for the fun of imagining it than for factual accuracy. Still, I am not completely convinced by any of the arguments against, and I feel that they somehow fail to refute the central premise. What am I missing?

(Incidentally, I think one big unintended benefit of working hard on global warming is that it becomes a test ground, a trial run, for what AI governance might require. It might even be the case that pursuing it, instead of taking on the hard case first, is the better strategy? But then, of course, there is the question of timelines...)