The case for Doing Something Else (if Alignment is doomed)

(Related to What an Actually Pessimistic Containment Strategy Looks Like)

It seems to me like there are several approaches with an outside chance of preventing doom from AGI. Here are four:

  1. Convince a significant chunk of the field to work on safety rather than capability

  2. Solve the technical alignment problem

  3. Rethink fundamental ethical assumptions and search for a simple specification of value

  4. Establish international cooperation toward Comprehensive AI Services, i.e., build many narrow AI systems instead of something general

Furthermore, these approaches seem quite different, to the point that some have virtually no overlap in a Venn diagram. #1 is entirely a social problem, #2 a technical and philosophical problem, #3 primarily a philosophical problem, and #4 in equal parts social and technical.

Now suppose someone comes to you and says, “Hi. I’m working on AI safety, which I think is the biggest problem in the world. There are several very different approaches for doing this. I’m extremely confident (99%+) that the approach I’ve worked on the most and know the best will fail. Therefore, my policy recommendation is that we all keep working on that approach and ignore the rest.”

I’m not saying the above describes Eliezer, only that the ways in which it doesn’t are not obvious. Presumably Eliezer thinks the other approaches are even more doomed (or at least doomed enough that they’re not worth talking about), but it’s unclear why that is, or why we can be confident in it given how little effort has been expended on them so far.

Take this comment as an example:

How about if you solve a ban on gain-of-function research [before trying the policy approach], and then move on to much harder problems like AGI? A victory on this relatively easy case would result in a lot of valuable gained experience, or, alternatively, allow foolish optimists to have their dangerous optimism broken over shorter time horizons.

This reply makes sense if you are already convinced that policy is a dead end and just want to avoid wasting resources on that approach. But if policy can work, it sounds like a bad plan, since we probably don’t have time to solve the easier problem first, especially not if one person has to do it without the combined effort of the community. (Also, couldn’t we equally point to unsolved subproblems in alignment, or alignment for easier cases, and demand that they be solved before we dare tackle the hard problem?)

What bothers me the most is that discussion of alternatives has not even been part of the conversation. Any private person is, of course, free to work on whatever they want (and many other researchers are less pessimistic about alignment), but I’m specifically questioning MIRI’s strategy, which is quite influential in the community. No matter how pessimistic you are about the other approaches, surely there has to be some probability of alignment succeeding below which it’s worth looking at alternatives. Are we 99% sure that value isn’t simple and that the policy problem is unsolvable even for a shift to narrow systems? 99.99%? What is the point at which it begins to make sense to advocate for work on something else (which is perhaps not even on the list)? It’s possible that MIRI should stick to alignment regardless because of comparative advantage, but then the messaging could have been “this alignment thing doesn’t seem to work; we’ll keep at it, but the rest of you should do something else”, and, well, it wasn’t.