Manhattan project for aligned AI

One scenario I imagine might play out, conditional on an existential catastrophe not occurring, is a Manhattan project for aligned AGI. I don’t want to argue that this is particularly likely or desirable. The point of this post is to sketch the scenario and briefly discuss some implications for what is needed from current research.

Imagine the following scenario: Top AI scientists come to take the existential risk of AGI seriously only late, and there hasn’t yet been a significant change in the effort put into AI safety relative to our current trajectory. At some point, AI scientists and relevant decision-makers recognize that AGI will be developed soon by one AI lab or another (within a few months or years), and that without explicit effort there is a large probability of catastrophic results. A project is started to develop AGI:

  • It has a budget of tens or hundreds of billions of dollars.

  • Dozens of top AI scientists are part of the project, along with many more assistants. People you might recognize or know from top papers and AI labs are on the team.

  • A fairly constrained set of concepts, theories, and tools is available, giving a broad roadmap for building aligned AGI.

  • There is a consensus understanding among management and the research team that without this project, AGI will plausibly be developed relatively soon, and that without an explicit understanding of how to build the system safely, it will pose an existential risk.

It seems useful to backchain from this scenario to see what is needed, assuming that this kind of alignment Manhattan project is indeed what should happen.

Firstly, my view is that if this Manhattan project were to start in intellectual conditions similar to today’s, not many top AI scientists would be significantly motivated to work on the problem, and it would not be taken seriously. Even very large sums of money would not suffice, since there wouldn’t be enough common understanding of what the problem is for the project to work.

Secondly, it seems to me that there isn’t enough of a roadmap for building aligned AGI for such a project to succeed in a short time-frame of months to years. I expect some people to disagree with this, but given current rates of progress in our understanding of AI safety, and my model of how parallelizable conceptual progress is in practice, I am skeptical that the problem can be solved in a few years even by a group of 40 highly motivated and well-financed top AI scientists. It is plausible that this will look different closer to the finish line, but I doubt it.

On this model, there are basically two kinds of work I have in mind that contribute to good outcomes. This is not a significant change relative to my prior view, but it does constrain the motivation behind such work to some degree:

  • Research that makes the case for AGI x-risk clearer and constrains our picture of how the problem arises, in order to make it easier to eventually convince top AI scientists that working on such an alignment Manhattan project is reasonable, and to make sure there is a team that is on the same page about what the problem is.

  • Research that constrains the roadmap for building aligned AGI. I’m thinking mostly of conceptual/theoretical/empirical work that helps us converge on an approach that can then be developed, refined, and scaled by a large effort over a short time period.

I suspect this mostly shouldn’t change my general picture of what needs to be done, but it does shift my emphasis somewhat.