Very different, very adequate outcomes

Let $u$ be the utility function that—somehow—expresses your preferences[1]. Let $h$ be the utility function that expresses your hedonistic pleasure.

Now imagine an AI is programmed to maximise $qu + (1-q)h$. If we vary $q$ in the range of $0.1$ to $0.9$, then we will get very different outcomes. At $q=0.1$, we will generally be hedonically satisfied, and our preferences will be followed if they don't cause us to be unhappy. At $q=0.9$, we will accomplish any preference that doesn't cause us huge amounts of misery.
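The effect of varying the weight can be sketched with a toy model. The outcomes below and their (preference, hedonic) scores are invented purely for illustration; the only real structure is the weighted sum $qu + (1-q)h$:

```python
# Toy model: an agent picks whichever outcome maximises q*u + (1-q)*h.
# Outcomes map to invented (preference score u, hedonic score h) pairs.
outcomes = {
    "follow preferences, some discomfort": (0.9, 0.4),
    "comfortable compromise":              (0.6, 0.75),
    "pure bliss, preferences ignored":     (0.1, 1.0),
}

def best_outcome(q):
    """Return the outcome maximising the weighted utility q*u + (1-q)*h."""
    return max(outcomes, key=lambda o: q * outcomes[o][0] + (1 - q) * outcomes[o][1])

for q in (0.1, 0.5, 0.9):
    print(q, "->", best_outcome(q))
```

With these made-up numbers, low $q$ selects the blissful outcome, high $q$ the preference-following one, and middling $q$ the compromise: different choices of $q$ genuinely pick different futures.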

It’s clear that, extrapolated over the whole future of the universe, these could lead to very different outcomes[2]. But—and this is the crucial point—none of these outcomes are really that bad. None of them are the disasters that could happen if we picked a random utility $v$. So, for all their differences, they reside in the same nebulous category of “yeah, that’s an ok outcome.” Of course, we would have preferences as to where $q$ lies exactly, but few of us would risk the survival of the universe to yank $q$ around within that range.

What happens when we push $q$ towards the edges? Pushing $q$ towards $0$ seems a clear disaster: we’re happy, but none of our preferences are respected; we basically don’t matter as agents interacting with the universe any more. Pushing $q$ towards $1$ might be a disaster: we could end up always miserable, even as our preferences are fully followed. The only thing protecting us from that fate is the fact that our preferences include hedonistic pleasure; but this might not be the case in all circumstances. So moving to the edges is risky in the way that moving around in the middle is not.
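The edge cases can be made concrete with another toy sketch. Again, the outcomes and scores are invented; the point is only what pure hedonism ($q=0$) and pure preference-satisfaction ($q=1$) each select:

```python
# Edge-case sketch: at q = 0 the agent optimises hedonic score alone,
# at q = 1 preference score alone. Scores are invented for illustration.
outcomes = {
    "preferences fully followed, miserable": (1.0, 0.0),
    "preferences mostly followed, content":  (0.95, 0.8),
    "wireheaded bliss":                      (0.0, 1.0),
}

def best_outcome(q):
    """Return the outcome maximising q*u + (1-q)*h."""
    return max(outcomes, key=lambda o: q * outcomes[o][0] + (1 - q) * outcomes[o][1])

print(best_outcome(0.0))  # hedonism only: our agency no longer matters
print(best_outcome(1.0))  # preferences only: misery is no objection
```

Note that at $q=1$ the miserable outcome wins even though a content outcome scores almost as well on preferences: a pure preference-maximiser has no reason to pay even a tiny preference cost for our happiness.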

In my research agenda, I talk about adequate outcomes, given a choice of parameters, or acceptable approximations. I mean these terms in the sense of the example above: the outcomes may vary tremendously from one another, given the parameters or the approximation. Nevertheless, all the outcomes avoid disasters and are clearly better than maximising a random utility function.

  1. This being a somewhat naive form of preference utilitarianism, along the lines of “if the human chooses it, then it’s ok”. In particular, you can end up in equilibria where you are miserable, but unwilling to choose not to be (see, for example, some forms of depression). ↩︎

  2. This fails to be true if preference and hedonism can be maximised independently; e.g. if we could take an effective happy pill and still follow all our preferences. I’ll focus on the situation where there are true tradeoffs between preference and hedonism. ↩︎