Every value loading agent I’ve considered (that passes the naive cake-or-death problem, at least) can be considered equivalent to a UDT agent.
I’m just not sure that’s a useful way of thinking about it, because the properties we want—“conservation of moral evidence” and “don’t manipulate your own moral changes”—are not natural UDT properties, but depend on a particular way of conceptualising a value loading agent. For instance, the kid who doesn’t ask whether eating cookies is bad has a sound formulation as a UDT agent, but this doesn’t seem to capture what we want.
EDIT: this may be relevant http://lesswrong.com/r/discussion/lw/kdx/conservation_of_expected_moral_evidence_clarified/
It seems to me that there are natural ways to implement value loading as UDT agents, with the properties you’re looking for. For example, if the agent values eating cookies in universes where its creator wants it to eat cookies, and values not eating cookies in universes where its creator doesn’t want it to eat cookies (glossing over how to define “creator wants” for now), then I don’t see any problems with the agent manipulating its own moral changes or avoiding asking whether eating cookies is bad. So I’m not seeing the motivation for coming up with another decision theory framework here...
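One way to see why such an agent has no incentive to avoid asking is a quick expected-utility calculation. Here is a minimal toy sketch (my own illustration, not from the comment; the world labels, prior, and payoffs are assumed for concreteness): the agent's utility depends on which universe it is in, so learning the creator's wishes can only help it.

```python
# Toy model: a value-loading agent whose utility is indexed by
# which universe it is in (creator wants cookies eaten, or not).
# Equal prior over the two universes -- an arbitrary assumption.
WORLDS = {"wants_cookies": 0.5, "no_cookies": 0.5}

def utility(action, world):
    # The agent values eating cookies only in universes where its
    # creator wants that, and values abstaining otherwise.
    wanted = "eat" if world == "wants_cookies" else "abstain"
    return 1.0 if action == wanted else 0.0

def eu_without_asking():
    # Act on the prior alone: pick the single best fixed action.
    return max(
        sum(p * utility(a, w) for w, p in WORLDS.items())
        for a in ("eat", "abstain")
    )

def eu_with_asking():
    # First learn which universe it is in, then act optimally in each.
    return sum(
        p * max(utility(a, w) for a in ("eat", "abstain"))
        for w, p in WORLDS.items()
    )

print(eu_without_asking())  # 0.5
print(eu_with_asking())     # 1.0
```

Under this setup, asking strictly dominates (1.0 vs 0.5 expected utility), so the agent neither avoids the question nor gains anything by manipulating the answer—its values already cover both possible outcomes.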