Takeoff speeds have a huge effect on what it means to work on AI x-risk
The slow takeoff hypothesis predicts that AGI emerges in a world where powerful but non-AGI AI is already a really big deal. Whether AI is a big deal right before the emergence of AGI determines many super basic things about what we should think our current job is. I hadn’t fully appreciated the size of this effect until a few days ago.
In particular, in a fast takeoff world, AI takeover risk never looks much more obvious than it does now, and so x-risk-motivated people should assume that they will be responsible for most of the alignment research that happens. In contrast, in a slow takeoff world, many aspects of the AI alignment problem will already have shown up as alignment problems in non-AGI, non-x-risk-causing systems. In that world, there will be lots of industrial work on various aspects of the alignment problem, and so EAs now should think of themselves as trying to look ahead, figure out which margins of the alignment problem aren’t going to be taken care of by default, and work out how to help there.
In the fast takeoff world, we’re much more like a normal research field: we want some technical problem to eventually get solved, so we try to solve it. But in the slow takeoff world, we’re basically in a weird collaboration across time with the more numerous, non-longtermist AI researchers who will be in charge of aligning their powerful AI systems, but who we fear won’t be cautious enough in some ways or won’t plan ahead in others. Doing technical research in the fast takeoff world basically just requires answering technical questions, while in the slow takeoff world your choice of research projects depends heavily on your sociological predictions about which things will be obvious to whom, and when.
I think that these two perspectives are extremely different, and I think I’ve historically sometimes had trouble communicating with people who held the slow takeoff perspective because I didn’t realize we disagreed on basic questions about how to conceptualize the problem. (These miscommunications persisted even after I was mostly persuaded of slow takeoffs, because I hadn’t realized the extent to which I was implicitly assuming fast takeoffs in my picture of how AGI was going to happen.)
As an example of this, I think I was quite confused about what genre of work various prosaic alignment researchers think they’re doing when they talk about alignment schemes. To quote a recent Alignment Forum shortform post of mine:
Something I think I’ve been historically wrong about:
A bunch of the prosaic alignment ideas (eg adversarial training, IDA, debate) now feel to me like things that people will obviously do the simple versions of by default. Like, when we’re training systems to answer questions, of course we’ll use our current versions of systems to help us evaluate, why would we not do that? We’ll be used to using these systems to answer questions that we have, and so it will be totally obvious that we should use them to help us evaluate our new system.
Similarly with debate—adversarial setups are pretty obvious and easy.
In this frame, the contributions from Paul and Geoffrey feel more like “they tried to systematically think through the natural limits of the things people will do” than “they thought of an approach that non-alignment-obsessed people would never have thought of or used”.
It’s still not obvious whether people will actually use these techniques to their limits, but it would be surprising if they weren’t used at all.
I think the slightly exaggerated slogan for this update of mine is “IDA is futurism, not a proposal”.
My current favorite example of the thinking-on-the-margin version of alignment research strategy is in this comment by Paul Christiano.