Aspiring AI safety researchers should ~argmax over AGI timelines

Epistemic status: This model is mostly based on a few hours of dedicated thought, and the post was written in 30 min. Nevertheless, I think this model is probably worth considering.

Many people seem to be entering the AI safety ecosystem, acquiring a belief in short timelines and high P(doom), and immediately dropping everything to work on AI safety agendas that might pay off in short-timeline worlds. However, many of these people might not have a sufficient “toolbox” or research experience to have much marginal impact in short-timeline worlds.

Rather than tell people what they should do on the object level, I sometimes tell them:

  1. Write out your credences for AGI being realized in 2027, 2032, and 2042;

  2. Write out the plan you would pursue if you had 100% credence in each of 2027, 2032, and 2042;

  3. Write out your marginal impact in lowering P(doom) via each of those three plans;

  4. Work towards the plan that is the argmax of your marginal impact, weighted by your credence in the respective AGI timelines (a worked sketch with made-up numbers follows this list).
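
To make step 4 concrete, here is a minimal sketch in Python. The credences, plan names, and impact numbers are entirely made up for illustration; substitute your own.

```python
# Credences over the three AGI-timeline scenarios. Assumption: these are
# treated as exhaustive buckets that sum to ~1; renormalize if yours don't.
credences = {"2027": 0.2, "2032": 0.5, "2042": 0.3}

# Hypothetical marginal impact on lowering P(doom) for each plan in each
# scenario (arbitrary units; only relative sizes matter).
impact = {
    "join a short-timelines agenda now": {"2027": 0.8, "2032": 0.4, "2042": 0.1},
    "do a PhD, then research":           {"2027": 0.1, "2032": 0.6, "2042": 0.9},
    "build engineering/ops skills":      {"2027": 0.3, "2032": 0.5, "2042": 0.5},
}

# Credence-weighted expected marginal impact of each plan (steps 1-3 combined).
expected = {
    plan: sum(credences[year] * impact[plan][year] for year in credences)
    for plan in impact
}

best_plan = max(expected, key=expected.get)  # the argmax in step 4
print(expected)
print("argmax plan:", best_plan)
```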

Some further considerations

  • If you are risk-averse over your marginal impact, you should maybe avoid a true argmax approach and instead choose a plan that pays out some marginal impact in all three timeline scenarios. For example, some shovel-ready, short-timeline AI safety research agendas may prepare you for long-timeline AI safety research better than others. Consider blending elements of your plans across the three timeline scenarios (the “~” in “~argmax”); a sketch of one such blended approach appears after this list. Perhaps you also have side constraints on your minimum acceptable impact in the world where AGI is realized in 2027?

  • Your immediate plans might be similar in some scenarios. If so, congratulations, you have an easier decision! However, I suspect most aspiring AI safety researchers without research experience should have different plans for different AGI timeline scenarios. For example, getting a Ph.D. in a top lab probably makes most people much better at some aspects of research, while working in emerging tech probably makes most people much better at software engineering and operations.

  • You should be wary of altering your timeline credences in an attempt to rationalize your preferred plan or highest-probability timeline scenario. However, don’t be afraid to update your credences over AGI timelines or your expected marginal impact in those worlds! Revisit your plans often and expect them to change (though hopefully not in predictable ways, as that would make you a bad Bayesian).

  • Consider how the entire field of AI talent might change if everyone followed the argmax approach I laid out here. Are there any ways the field might then do something you think is predictably wrong? Does this change your plan?

  • If you want to develop more fine-grained estimates over timelines (e.g., 2023, 2024, etc.) and your marginal impact in those worlds, feel free to do so. I prefer to keep the number of options manageable.

  • Your marginal impact might also vary with the process by which AGI is created in different timeline worlds. For example, if AGI arrives in 2023, I imagine that even the best mechanistic interpretability researcher might not have as high an impact as they would if AGI arrived some years later, by which point interpretability may have had time to scale.
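
For the risk-averse “~argmax” mentioned in the first bullet above, here is one minimal sketch reusing the made-up numbers from the earlier example. The log-shaped utility and the 2027 side-constraint threshold are illustrative assumptions, one possible way to encode risk aversion and side constraints rather than a recommendation.

```python
import math

credences = {"2027": 0.2, "2032": 0.5, "2042": 0.3}
impact = {
    "join a short-timelines agenda now": {"2027": 0.8, "2032": 0.4, "2042": 0.1},
    "do a PhD, then research":           {"2027": 0.1, "2032": 0.6, "2042": 0.9},
    "build engineering/ops skills":      {"2027": 0.3, "2032": 0.5, "2042": 0.5},
}

# Example side constraint: require some minimum impact in the 2027 world.
MIN_2027_IMPACT = 0.2


def risk_averse_score(plan: str) -> float:
    """Credence-weighted concave (log-shaped) utility of impact, which
    penalizes plans that do badly in any one scenario."""
    return sum(c * math.log1p(impact[plan][y]) for y, c in credences.items())


# Filter by the side constraint, then take the ~argmax under risk aversion.
eligible = [p for p in impact if impact[p]["2027"] >= MIN_2027_IMPACT]
best = max(eligible, key=risk_averse_score)
print("~argmax plan under risk aversion:", best)
```

With these made-up numbers, the pure argmax plan from the earlier sketch fails the 2027 side constraint, and the more robust, blended plan wins instead, which is the qualitative point of the “~”.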