Why I expect successful (narrow) alignment

Summary

I believe that advanced AI systems will likely be aligned with the goals of their human operators, at least in a narrow sense. I'll give three main reasons for this:

  1. The transition to AI may happen in a way that does not give rise to the alignment problem as it's usually conceived.

  2. While work on the alignment problem appears neglected at this point, large amounts of resources will likely be devoted to tackling it if and when it becomes apparent that alignment is a serious problem.

  3. Even if the previous two points do not hold, we have already come up with a couple of smart approaches that seem fairly likely to lead to successful alignment.

This argument lends some support to work on non-technical interventions like moral circle expansion or improving AI-related policy, as well as work on special aspects of AI safety like decision theory or worst-case AI safety measures.