Against Muddling Through

If Anyone Builds It, Everyone Dies has been attracting reviews and commentary like a big sticky ball rolling over a newsstand. Last week, I set out to respond to a few of those reactions. Long story short, many of my responses turned out to have prerequisites, and now I’m writing a sequence. Oops.

Centrally, this sequence lays out my concerns with modern AI research in general, and in particular with plans that rely on AIs doing most of the alignment work.

It seems to me that allowing AI research to proceed along its current path predictably gets us all killed. I hope I’m wrong, but I’ve done my best to explain why I’m convinced of this.

Many of the arguments I make were already made by others (and I try to cite those arguments explicitly when I know of them). Consequently, there will be some rehashing of points previously made, now with more Joe. I choose to spell out these points both to facilitate discussion and to lay my own thoughts bare for analysis.

My main goal is to break a layered argument into a bunch of bite-sized chunks, to make it easier to find individual cruxes. I’d like it to be as easy as possible to say, “I agree with the main thrust of this post, but not that one,” and maybe argue the details in the comments to one specific post.

To facilitate this, each post will contain an Afterword, in which I attempt to identify some possible cruxes for myself and others. I won’t try to pass any single person’s intellectual Turing test, but I can at least highlight what feels load-bearing to me and note a few areas where I expect disagreement.

I encourage readers to pick what feels like their most cruxy disagreement with a headline claim.

In order, the arguments are:

…and they culminate in a prediction that we won’t get docile, brilliant AIs before we solve alignment.

1. This part is half an argument and half an admission that I don’t understand what people could possibly mean by “intent alignment” that isn’t just a euphemism for solving alignment wholesale.

2. This may be the post on which people will crux the hardest, and one I’m very interested to see discussion about. It comes with a slight caveat; see the post for details.
