There is one more fact that is ignored in IRL: that humans often have contradictory values. For example, if I want a cake very much but also have a strong inclination toward dieting, I will do nothing. So I have two values which exactly cancel each other out and have no net effect on behaviour. Observing behaviour alone gives no clue that either of them exists. More complex examples are possible, where contradictory values create inconsistent behaviour, which is very typical of biological humans.
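To make the cancellation concrete, here is a minimal sketch; the reward numbers and the Boltzmann-rational choice model are my own illustrative assumptions, not anything IRL prescribes. Two different value decompositions produce exactly the same observable behaviour, so an observer watching behaviour alone cannot tell them apart:

```python
import numpy as np

actions = ["eat_cake", "skip_cake"]

# Agent A: strong craving (+5) exactly offset by a dieting penalty (-5).
reward_conflicted = np.array([5.0 - 5.0, 0.0])   # net reward per action

# Agent B: genuinely indifferent to cake.
reward_indifferent = np.array([0.0, 0.0])

def boltzmann_policy(rewards, beta=1.0):
    """Softmax (Boltzmann-rational) action distribution over rewards."""
    exp_r = np.exp(beta * rewards)
    return exp_r / exp_r.sum()

# Both reward structures induce the identical 50/50 policy, so behaviour
# alone cannot distinguish the conflicted agent from the indifferent one.
print(boltzmann_policy(reward_conflicted))   # [0.5 0.5]
print(boltzmann_policy(reward_indifferent))  # [0.5 0.5]
```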
I’m not sure IRL actually ignores this, although in such a case the value learning agent may never converge on a consistent policy.