Against Muddling Through

If Anyone Builds It, Everyone Dies has been attracting reviews and commentary like a big sticky ball rolling over a newsstand. Last week, I set out to respond to a few of those reactions. Long story short, many of my responses turned out to have prerequisites, and now I’m writing a sequence. Oops.

Centrally, this sequence lays out my concerns with modern AI research in general, and in particular with plans that rely on AIs doing most of the alignment work.

It seems to me that allowing AI research to proceed along its current path predictably gets us all killed. I hope I’m wrong, but I’ve done my best to explain why I’m convinced of this.

Many of the arguments I make were already made by others (and I try to cite those arguments explicitly when I know of them). Consequently, there will be some rehashing of points previously made, now with more Joe. I choose to spell out these points both to facilitate discussion and to lay my own thoughts bare for analysis.

My main goal is to break a layered argument into a bunch of bite-sized chunks, to make it easier to find individual cruxes. I’d like it to be as easy as possible to say, “I agree with the main thrust of this post, but not that one,” and maybe argue the details in the comments to one specific post.

To facilitate this, each post will contain an Afterword, in which I attempt to identify some possible cruxes for myself and others. I won’t try to pass any single person’s intellectual Turing test, but I can at least highlight what feels load-bearing to me and note a few areas where I expect disagreement.

I encourage readers to pick what feels like their most cruxy disagreement with a headline claim.

In order, the arguments are:

…and they culminate in a prediction that we won’t get docile, brilliant AIs before we solve alignment.

1. This part is half an argument and half an admission that I don’t understand what people could possibly mean by “intent alignment” that isn’t just a euphemism for solving alignment wholesale.

2. This may be the post on which people will crux the hardest, and one I’m very interested to see discussion about. It comes with a slight caveat; see the post for details.
