There are lots of reasons that a robot might be unable to learn the correct policy despite the action space permitting it:
- Not enough model capacity
- Not enough training data
- Training got stuck in a local optimum
- You've learned from robot play data, but you've never seen anything like the human policy before
- Etc., etc.
Not all of these are compatible with "and so the robot does the thing that the human does 5% of the time". But it seems like there can (and probably will) be factors that differ between the human and the robot (even if the human uses teleoperation), and in that setting imitating factored cognition provides the wrong incentives, while optimizing factored evaluation provides the right incentives.
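As a toy illustration of that last point (the action names and numbers below are entirely made up, and the "evaluator" is idealized): if the robot's hardware differs from the human's, the action the human demonstrates may rarely work when the robot executes it. An imitation-style objective still prefers the demonstrated action, while an evaluation-style objective prefers whatever action actually succeeds for the robot.

```python
# Hypothetical toy comparison of an imitation objective vs. an evaluation
# objective when the robot's capabilities differ from the demonstrator's.

ACTIONS = ["grip_like_human", "grip_with_suction"]

# Human demonstrations: the human (with dexterous fingers) always grips directly.
human_action_freq = {"grip_like_human": 1.0, "grip_with_suction": 0.0}

# Success rate *for the robot*, whose gripper differs from the human hand.
robot_success_rate = {"grip_like_human": 0.10, "grip_with_suction": 0.95}

def imitation_score(action):
    """Imitation objective: how often does this action match the human demos?"""
    return human_action_freq[action]

def evaluation_score(action):
    """Evaluation objective: expected success as judged by an (idealized) evaluator."""
    return robot_success_rate[action]

best_by_imitation = max(ACTIONS, key=imitation_score)
best_by_evaluation = max(ACTIONS, key=evaluation_score)

print("imitation picks: ", best_by_imitation)   # grip_like_human: matches the human, rarely works
print("evaluation picks:", best_by_evaluation)  # grip_with_suction: actually works for this robot
```

In this sketch the two objectives disagree precisely because of a human–robot difference that teleoperated demonstrations can't remove; the imitation objective rewards matching the human, while the evaluation objective rewards doing the task well.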