Manfred comments on Value Loading

Manfred 23 Oct 2012 7:13 UTC
4 points
Could you define the “Cake or Death problem” and give an example of a decision-making system that falls prey to it?

First nitpick: Since the sum over i (where i just indexes the utility functions) of u_i(w)·p(C(u_i)|w) depends only on w, it’s really just a complicatedly-written utility function. I think you want u_i(w)·p(C(u_i)|w, e), which would allow the agent to gain some sort of evidence about its utility function. Also, since C(u_i) is presumably supposed to represent a fixed logical thingamabob, to be super-precise we could talk about some logical uncertainty measure over whether the utility function is correct, M(u_i, w, e), rather than a probability, but I think we don’t have to care about that.
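
To make the first nitpick concrete, here is a minimal Python sketch. Everything in it is a made-up toy: the two worlds, the two candidate utility functions u_1 and u_2, the evidence labels e_a and e_b, and all the numbers are assumptions for illustration, not anything from the original post. It only shows that without e the weighted sum is a fixed function of w, while conditioning on e lets the value assigned to the same world move.

```python
# Toy setup (all names and numbers are hypothetical, for illustration only).
U = {
    "u_1": {"w1": 1.0, "w2": 0.0},
    "u_2": {"w1": 0.0, "w2": 1.0},
}

# p(C(u_i) | w): probability that u_i is correct, depending only on the world.
P_CORRECT_GIVEN_W = {
    "w1": {"u_1": 0.7, "u_2": 0.3},
    "w2": {"u_1": 0.7, "u_2": 0.3},
}

# p(C(u_i) | w, e): the same thing, but also conditioned on evidence e.
# For simplicity this toy table ignores w and varies only with e.
P_CORRECT_GIVEN_E = {
    "e_a": {"u_1": 0.9, "u_2": 0.1},
    "e_b": {"u_1": 0.1, "u_2": 0.9},
}


def collapsed_utility(w):
    """Sum over i of u_i(w) * p(C(u_i) | w): a plain function of w."""
    return sum(U[i][w] * P_CORRECT_GIVEN_W[w][i] for i in U)


def evidence_weighted_utility(w, e):
    """Sum over i of u_i(w) * p(C(u_i) | w, e): changes as evidence comes in."""
    return sum(U[i][w] * P_CORRECT_GIVEN_E[e][i] for i in U)


if __name__ == "__main__":
    # Without e, each world has one fixed value no matter what is observed.
    print(collapsed_utility("w1"), collapsed_utility("w2"))
    # With e, the value of the same world depends on the evidence.
    print(evidence_weighted_utility("w1", "e_a"),
          evidence_weighted_utility("w1", "e_b"))
```

In this toy, collapsed_utility could be replaced by a lookup table over worlds with no loss, which is the sense in which it is “just a utility function”.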

Second nitpick: To see what happens, let’s assume our agent has figured out its utility function. It now picks the action with the largest sum over w of p(w|e, a)·u(w), where “w” is a world describing present, past and future, and u(w) is its one true utility function. This happens to look a lot like an evidential decision theory (EDT) agent, which runs into known problems. For example, if there were a disease that had low utility but made you unable to punch yourself in the face, an EDT agent would want to punch itself in the face, because doing so raises the probability, conditional on its action, that it doesn’t have the disease.
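
Here is a rough Python sketch of the punching example, under stated assumptions: the joint distribution over worlds, the utility numbers (-10 for the disease, -1 for a punch), and the function names are all invented for illustration. The agent scores each action by the sum over w of p(w|a)·u(w), treating its own action purely as evidence about which world it is in.

```python
# Hypothetical joint distribution over worlds w = (has_disease, punched).
# Diseased agents cannot punch, so (True, True) never occurs.
JOINT = {
    (True, False): 0.50,
    (False, True): 0.05,
    (False, False): 0.45,
}


def u(world):
    """Toy utility: the disease is very bad, a punch in the face is mildly bad."""
    has_disease, punched = world
    return (-10.0 if has_disease else 0.0) + (-1.0 if punched else 0.0)


def edt_value(punched):
    """Sum over w of p(w | a) * u(w), conditioning on the action as evidence."""
    consistent = {w: p for w, p in JOINT.items() if w[1] == punched}
    total = sum(consistent.values())
    return sum(p / total * u(w) for w, p in consistent.items())


def edt_choice():
    """Return True if the agent prefers to punch itself, else False."""
    return max([True, False], key=edt_value)


if __name__ == "__main__":
    print("value of punching:     ", edt_value(True))
    print("value of not punching: ", edt_value(False))
    print("agent punches itself:  ", edt_choice())
```

With these numbers, conditioning on a punch rules out the disease entirely, so the agent prefers the punch even though punching does nothing to change whether it was sick in the first place.
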
- Stuart_Armstrong 23 Oct 2012 11:22 UTC
  4 points
  I’ll post the “cake or death” problem in a post soon.
  - David_Gerard 23 Oct 2012 12:47 UTC
    4 points
    This one?
    
    (Remember: always give your esoteric philosophical conundra good names.)
    - Manfred 23 Oct 2012 20:23 UTC
      0 points
      Oh, okay, thanks. So, shallowly speaking, you just needed to multiply the utilities of the strategies “don’t ask and pick cake” and “don’t ask and pick death” by 0.5.
    - Stuart_Armstrong 23 Oct 2012 13:52 UTC
      0 points
      Yep! :-)