If Alice received further information (from a moral philosopher, maybe), she’d start maximizing a specific one of those utility functions instead.
This is the key fact about Alice’s behavior, which distinguishes it from Bob’s behavior, so the question is whether an AI can learn that fact.
Of course the AI could learn it if it ever observed Alice in a situation where she learned anything about morality.
Or, indeed, in any situation that has any mutual information with how Alice would respond to moral facts. (For a sufficiently smart reasoner that includes essentially everything: e.g. watching Alice eat breakfast gives you lots of general information about her brain, which in turn lets you make better predictions about how she would behave in other cases.)
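As a toy illustration (my own sketch, not part of the original argument): the observer keeps a posterior over a latent trait, "would Alice switch to maximizing a specific utility function given moral information, or not." Any observation whose likelihood differs at all between the two hypotheses, however mundane, shifts that posterior. The observation names and likelihood values below are made up purely for illustration.

```python
import numpy as np

# Posterior over a latent binary trait:
# index 0 = "Alice-like" (would update on moral facts), index 1 = "Bob-like" (would not).
prior = np.array([0.5, 0.5])

# Hypothetical likelihoods P(observation | trait). The values are invented; the point
# is only that they differ slightly, so even a mundane observation is weakly informative.
likelihoods = {
    "eats_breakfast_quickly": np.array([0.60, 0.55]),   # weakly informative
    "defers_to_expert_advice": np.array([0.80, 0.30]),  # strongly informative
}

def update(prior, obs):
    """One step of Bayes' rule: posterior is proportional to likelihood times prior."""
    unnorm = likelihoods[obs] * prior
    return unnorm / unnorm.sum()

posterior = prior
for obs in ["eats_breakfast_quickly", "defers_to_expert_advice"]:
    posterior = update(posterior, obs)
    print(obs, "->", posterior)
```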
And of course the AI would tend to create situations where Alice learned moral facts, since that’s a very natural response to uncertainty about how she’d respond to moral facts.
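A rough sketch of why that is the natural response, again assuming hypothetical experiment names and likelihoods: among candidate situations the AI could put Alice in, it would prefer the one with the highest expected information gain about the latent trait, and "show her a moral argument" separates the hypotheses far more sharply than passive observation does.

```python
from math import log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def expected_info_gain(prior, outcome_likelihoods):
    """Expected reduction in entropy over the trait after observing the outcome.
    `outcome_likelihoods` maps each outcome to P(outcome | trait) for both traits."""
    h_prior = entropy(prior)
    gain = 0.0
    for lik in outcome_likelihoods.values():
        p_outcome = sum(l * p for l, p in zip(lik, prior))
        posterior = [l * p / p_outcome for l, p in zip(lik, prior)]
        gain += p_outcome * (h_prior - entropy(posterior))
    return gain

prior = [0.5, 0.5]  # P(Alice-like), P(Bob-like)

# Hypothetical experiments with per-trait outcome likelihoods (invented numbers).
experiments = {
    "present_moral_argument": {"updates": [0.9, 0.1], "ignores": [0.1, 0.9]},
    "watch_breakfast": {"quick": [0.6, 0.55], "slow": [0.4, 0.45]},
}

for name, outcomes in experiments.items():
    print(name, "expected info gain:", round(expected_info_gain(prior, outcomes), 3))
```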
So overall it seems like you’d have to restrict the behavior of the IRL agent quite far before this becomes a problem.