I think that this might not end up being a problem if the value learning agent can communicate with Alice (e.g. in the context of CIRL). If they don’t get any info from moral philosophers, then the agent should probably maximise something like the expectation of Alice’s utility function, for the same reason that Alice does. If they do get info, the agent can just give Alice that info, see what she does, and act accordingly. I think the real problem arises in the realistic case where Alice isn’t handling moral uncertainty perfectly, so the value learning agent shouldn’t actually maximise the weighted sum of the utility functions she’s uncertain over.
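To make “maximise something like the expectation of her utility function” concrete, here is one way the standard expected-choiceworthiness picture is often written (just a sketch; the credences $p_i$ and per-theory utility functions $U_i$ are illustrative notation, not anything from the original post):

$$a^* \in \arg\max_{a} \sum_i p_i \, U_i(a),$$

where $p_i$ is Alice’s credence in moral theory $i$ and $U_i$ is that theory’s utility function. The worry in the last sentence above is then that an imperfect Alice’s behaviour won’t actually track this weighted sum, so a value learning agent inferring her values from behaviour shouldn’t just optimise it.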