A case for capabilities work on AI as net positive

Edit: Changed the title.

Or, why I no longer agree with the standard LW position on AI.

In a sense, this is an unusual post compared to what LW usually publishes on AI.

Much of this rests on posts that changed my worldview on AI risk, linked below:

The deceptive alignment skepticism sequence, especially the second post in the sequence:


Evidence of the natural abstractions hypothesis in action:



Summary: The big update I made was that deceptive alignment is far more unlikely than I thought. Given that deceptive alignment was a large part of my model of how AI risk would happen (about 30-60% of my probability mass was on that failure mode), this takes a big enough bite out of the probability of extinction to make increasing AI capabilities positive in expected value. Combine this with the evidence that at least some form of the natural abstractions hypothesis is being borne out empirically, and I now think the probability of AI risk has steeply declined to only 0.1-10%, and that remaining probability mass is plausibly reducible to very low numbers by going to the stars and speeding up technological progress.

In other words, I now assign a significant probability, on the order of 50-70%, to alignment being solved by default.

EDIT: While I explained, in response to Shiminux, why I increased my confidence in alignment by default, I now believe I was overconfident in the precise probabilities for alignment by default.

What implications does this have, if this rosy picture is correct?

The biggest implication is that technological progress looks vastly positive, compared to what most LWers and the general public think.

This also implies a purpose shift for LessWrong. For arguably 20 years, the site has focused on AI risk, and that focus exploded once LLMs and real AI capabilities were released.

What it shifts to matters, but assuming this rosy model of alignment is correct, I'd argue a significant part of the field of AI alignment can and should repurpose itself toward something else.

As for LessWrong, I'd say we should probably focus more on progress studies, in the vein of Jason Crawford's work, and on inadequate equilibria and how to change them.

I welcome criticism and discussion of this post, given its huge implications for LW.