Yeah, agreed with all of that, thanks for the comment. You could definitely try to figure out each of these things individually. For example, learning constraints that can be used with Constrained Policy Optimization falls along the “what not to do” axis, and a lot of the multiagent RL work looks at how we can get some norms to show up with decentralized training. But I feel a lot more optimistic about research that tries to do all three things at once, because I think the three aspects interact with each other. At least, the first two feel very tightly linked, though they can probably be separated from the multiagent setting.