AdamGleave comments on Following human norms

AdamGleave 1 Feb 2019 21:00 UTC
LW: 15 AF: 5
0
AF
I feel like there are three facets to “norms” v.s. values, which are bundled together in this post but which could in principle be decoupled. The first is representing what not to do versus what to do. This is reminiscent of the distinction between positive and negative rights, and indeed most societal norms (e.g. human rights) are negative, but not all (e.g. helping an injured person in the street is a positive right). If the goal is to prevent catastrophe, learning the ‘negative’ rights is probably more important, but it seems to me that most techniques developed could learn both kinds of norms.
Second, there is the aspect of norms being an incomplete representation of behaviour: they impose some constraints, but there is not a single “norm-optimal” policy (contrast with explicit reward maximization). This seems like the most salient thing from an AI standpoint, and as you point out this is an underexplored area.
Finally, there is the issue of norms being properties of groups of agents. One perspective on this is that humans are realising their values through constructing norms: e.g. if I want to drive safely, it is good to have a norm to drive on the left or right side of the road, even though I may not care which norm we establish. Learning norms directly therefore seems beneficial to neatly integrate into human society (it would be awkward if e.g. robots drive on the left and humans drive on the right). If we think the process of going from values to norms is both difficult and important for multi-agent cooperation, learning norms also lets us sidestep a potentially thorny problem.
What links here?
- [AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety by Rohin Shah (6 Nov 2019 18:10 UTC; 26 points)
- Rohin Shah 2 Feb 2019 18:52 UTC
  LW: 2 AF: 1
  0
  AF Parent
  Yeah, agreed with all of that, thanks for the comment. You could definitely try to figure out each of these things individually, eg. learning constraints that can be used with Constrained Policy Optimization is along the “what not to do” axis, and a lot of the multiagent RL work is looking at how we can get some norms to show up with decentralized training. But I feel a lot more optimistic about research that is trying to do all three things at once, because I think the three aspects do interact with each other. At least, the first two feel very tightly linked, though they probably can be separated from the multiagent setting.