I believe you can strip the AI of any preferences towards human utility functions with a simple hack.
Every decision of the AI will have two kinds of effect: it will change the expected human utility, and it will change the human utility functions themselves.
Have the AI make its decisions based only on the effect on the current expected human utility, not on the changes to the functions themselves. Add a term granting a large disutility for deaths, and this should do the trick.
Note the importance of the “current” expected utility in this setup; an AI will decide whether to industrialise a primitive tribe based on their current utility; if it does industrialise them, it will base its subsequent decisions on their new, industrialised utility.
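A minimal sketch of what I have in mind, assuming a toy discrete setting (all names and types here are illustrative, not a real implementation): the agent scores each available action against the humans' current utility function plus a large penalty on expected deaths, and any effect the action would have on the utility function itself is simply left out of the score.

```python
from typing import Callable, Dict, Iterable

# Toy setting: outcomes are string labels; a utility function maps a label to a real number.
UtilityFn = Callable[[str], float]

def expected_utility(action: str,
                     outcome_probs: Dict[str, Dict[str, float]],
                     utility: UtilityFn) -> float:
    """Expected utility of `action` under a fixed ("current") utility function."""
    return sum(p * utility(outcome) for outcome, p in outcome_probs[action].items())

def choose_action(actions: Iterable[str],
                  outcome_probs: Dict[str, Dict[str, float]],
                  current_utility: UtilityFn,
                  death_prob: Dict[str, float],
                  death_penalty: float = 1e6) -> str:
    """Pick the action maximising current expected utility minus a large
    disutility for expected deaths; any change the action would make to the
    utility function itself is deliberately ignored by the score."""
    def score(action: str) -> float:
        return (expected_utility(action, outcome_probs, current_utility)
                - death_penalty * death_prob.get(action, 0.0))
    return max(actions, key=score)
```

Only after the chosen action has actually been carried out would the agent re-elicit the (possibly changed) utility function and treat that as "current" for its next decision.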
What if death isn’t well-defined? Suppose the AI has the option of cryonically freezing a person to save their life; once frozen, that person does not have any “current” utility function, so the AI can then disregard them completely. Situations like this also demonstrate that, more generally, trying to satisfy someone’s utility function may have an unavoidable side effect of changing their utility function. These side effects may be complex enough that the person does not foresee them, and it may not be possible for the AI to explain them to the person.
I think your “simple hack” is not actually that simple or well-defined.
It’s simple and it’s well defined; it just doesn’t work. Or at least, it doesn’t work naively in the way I was hoping.
The original version of the hack, on one-shot oracle machines, worked reasonably well. This version needs more work. And I shouldn’t have mentioned deaths here; that whole subject requires its own separate treatment.
What keeps the AI from immediately changing itself to care only about the people’s current utility function? That’s a change with very high expected utility, defined in terms of their current utility function, and one with little tendency to change that function.
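Plugging this objection into the toy sketch above (all labels and numbers here are hypothetical, chosen only to illustrate the point): a self-modification whose outcome the current utility function happens to rate highly, and which carries no death risk, wins the argmax, and nothing in the rule blocks it.

```python
# Continuing the toy sketch above; all labels and numbers are hypothetical.
current_utility = {
    "tribe_as_is": 1.0,
    "tribe_industrialised": 3.0,
    "ai_locked_to_current_utility": 5.0,  # assumed high under the *current* function
}.get

outcome_probs = {
    "do_nothing": {"tribe_as_is": 1.0},
    "industrialise": {"tribe_industrialised": 0.9, "tribe_as_is": 0.1},
    "self_modify": {"ai_locked_to_current_utility": 1.0},
}
death_prob = {"do_nothing": 0.0, "industrialise": 0.01, "self_modify": 0.0}

# "self_modify" scores highest: it is rated well by the current utility function
# and has little tendency to change that function, so the rule does not penalise it.
print(choose_action(list(outcome_probs), outcome_probs, current_utility, death_prob))
```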
Will you believe that a simple hack will work with lower confidence next time?
Slightly. I was counting on this one getting bashed into shape by the comments; it wasn’t, so in future I’ll try to do more of the bashing myself.
You meant “any preferences towards MODIFYING human utility functions”.
Yep