RFC: Philosophical Conservatism in AI Alignment Research

I’ve been operating under the influence of an idea I call philosophical conservatism when thinking about AI alignment. I am in the process of summarizing the specific stances I take and why I take them, because I believe others would better serve the project of alignment research by doing the same. In the meantime, I’d like to request comments on the general line of thinking to see what others think. I’ve formatted the outline of the general idea and the reasons for it as a numbered list so you can easily comment on each statement independently.

  1. AI alignment is a problem with bimodal outcomes, i.e. most of the probability distribution is clustered around success and failure, with very little area under the curve between these outcomes.

  2. Thus, all else equal, we would rather be extra cautious and miss some paths to success than be insufficiently cautious and hit a path to failure.

  3. One response to this is what Yudkowsky calls security mindset, alluding to Schneier’s concept of the same name.

  4. Another is what I call philosophical conservatism. The two ideas are related and address overlapping concerns, but in different ways.

  5. Philosophical conservatism says you should make the fewest philosophical assumptions necessary to address AI alignment, that each assumption should be maximally parsimonious, and that, when there is nontrivial uncertainty over whether a similar, more convenient assumption holds, you should adopt the assumption that would be least convenient for addressing alignment if it were true.

  6. This is a strategy that reduces the chance of false positives in alignment research, but it may make the problem harder, more costly, and less competitive to solve.

  7. For example, we should assume there is no discoverably correct ethics or metaethics the AI can learn: although the problem would be easier if such a thing existed, there is nontrivial uncertainty about this, so the assumption that makes alignment projects less likely to fail is that ethics and metaethics are not solvable.

  8. Current alignment research programs do not seem to operate with philosophical conservatism, because they either leave philosophical issues relevant to alignment unaddressed, make unclear implicit philosophical assumptions, or admit to hoping that helpful assumptions will prove true and ease the work.

  9. The alignment project is better served by those working on it adopting philosophical conservatism, because it reduces the risk of false positives and of spending time on research directions that are more likely than others to fail if their philosophical assumptions do not hold.