When you plan according to your AI timelines, should you put more weight on the median future, or on the median future | eventual AI alignment success (i.e., the median conditional on alignment eventually going well)? ⚖️

This is a question I’m puzzling over. My current answer: for decisions about AI alignment strategy, I put more planning weight on the median futures where we survive. This makes my effective timelines longer for some planning purposes, but it doesn’t remove the urgency.

I think that in most worlds where we manage to build aligned AGI systems, we do so in large part because we bought more time to solve the alignment problem, probably via one of two mechanisms:

  • 🤝 coordination to hold off building AGI

  • 🦾 solving AI alignment in a limited way, and using a capabilities-limited AGI to negotiate for more time to develop aligned AGI

I think we are likely to buy >5 years of time via one or both of these routes in >80% of worlds where we successfully build aligned AGI.

I have a less confident estimate of my AI timelines for the median future | eventual AGI alignment success. 20 years? 60 years? I haven’t thought about it enough to give a good estimate, but I think at least 10 years. Though I don’t think all time bought for additional AI alignment work is equally useful.[1] 🧐
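To make the conditioning concrete, here’s a tiny Monte Carlo sketch (Python). Every number in it is an illustrative assumption I made up for this example, not a real estimate; it just shows the mechanical point that if alignment-success worlds are mostly worlds where extra time was bought, the median timeline conditional on success comes out longer than the unconditional median.

```python
import random

# Toy Monte Carlo illustration. ALL parameters here are made-up assumptions
# for the sake of the example, not anyone's actual estimates.
random.seed(0)

N = 100_000
worlds = []
for _ in range(N):
    # "Default" timeline with no time bought: lognormal with median ~15 years,
    # roughly 70% probability of falling below 20 years.
    default_years = random.lognormvariate(2.7, 0.57)

    # Was extra time bought (coordination and/or a capabilities-limited AGI)?
    bought_time = random.random() < 0.3                 # assumed
    extra_years = random.uniform(5, 30) if bought_time else 0.0

    # Alignment success is assumed to be much more likely when time was bought.
    p_success = 0.6 if bought_time else 0.1             # assumed
    success = random.random() < p_success

    worlds.append((default_years + extra_years, success))

def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]

all_years = [t for t, _ in worlds]
success_years = [t for t, ok in worlds if ok]

print(f"median timeline, all worlds:          {median(all_years):.1f} years")
print(f"median timeline | alignment success:  {median(success_years):.1f} years")
```

With these made-up numbers the success-conditioned median lands quite a bit later than the unconditional one; the size of the gap depends entirely on the assumed parameters, so I treat this only as a shape-of-the-argument illustration.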

Implications for me:

  • 🗺 For strategy purposes, I should plan according to a distribution of worlds around my median timeline | eventual AI alignment success. For me, that’s like ~30 years (it’s always 30 years 💀), though I’ve only done a very cursory estimate of this and plan to think about it more.

  • ⌛️ There’s still a lot of urgency! My timelines are NOT distributed around ~30 years! If we get that time, it’s mostly because we BOUGHT it. So there’s urgency to work towards coordination on buying time and/or figuring out how to build aligned-and-corrigible-enough AGI to buy more time and shepherd alignment research.

Personal considerations:

  • 🪣 I do have a bucket list, and for the purposes of “things I’d really like to do before I die”, I go with my estimated lifespan in my median world

Terms and assumptions:

  • 🤖 By AGI I mean general intelligence with significantly greater control / optimization power than human civilization

  • 🦾 By capabilities-limited AGI, I mean a general intelligence with significantly greater capabilities than humans in some domains, but corrigible enough not to self-improve / seize power to accomplish arbitrary goals

  • 😅 My timelines are 70% chance of AGI in < 20 years

  • 💀 I’m assuming AGI + no alignment = human extinction

  • 🏆 Solving the alignment problem = building aligned AGI

I’d love to hear how other people are answering this question for themselves, and any thoughts / feedback on how I’m thinking about it. 🦜

This post is also on the EA Forum

  1.

    In the median world where we eventually solve alignment, I expect a greater proportion of the bought time to come from solving AI alignment in a limited way and using that to buy time, rather than from human coordination efforts. However, I also think my own efforts matter less (though potentially still matter) in the use-AI-to-buy-time world. That makes it hard to know how to weight the two types of additional time, so I’m not distinguishing much between them right now.