jessicata comments on Heroin model: AI “manipulates” “unmanipulatable” reward

jessicata 27 Sep 2016 19:00 UTC
0 points
1. Note that IRL is invariant to translating a candidate utility function by a constant: adding the same constant to every outcome changes none of the human's predicted choices. So this kind of normalization doesn't have to be baked into the algorithm.
2. This is true.
3. The most natural normalization procedure is to look at how the human is trying or not trying to affect the event X (as I said in the second part of my comment). If the human never tries to affect X either way, then the AI will normalize the utility functions so that the AI has no incentive to affect X either.
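
To make point 1 concrete, here is a minimal sketch of the translation invariance. It assumes a one-step Boltzmann-rational (softmax) choice model as a stand-in for IRL. The names `softmax_policy` and `log_likelihood`, the rationality parameter `beta`, and the example numbers are all hypothetical and chosen only for illustration.

```python
import math

def softmax_policy(utilities, beta=1.0):
    """Probability of each action under an assumed Boltzmann-rational human."""
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(beta * (u - m)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(utilities, observed_choices, beta=1.0):
    """Log-probability of the observed action indices under the softmax model."""
    policy = softmax_policy(utilities, beta)
    return sum(math.log(policy[a]) for a in observed_choices)

# Hypothetical candidate utility over three actions, and the same
# candidate translated by a constant.
utilities = [1.0, 2.5, 0.3]
shifted = [u + 7.0 for u in utilities]

# Hypothetical observed human choices (indices into the action list).
choices = [1, 1, 0, 1, 2]

base_ll = log_likelihood(utilities, choices)
shifted_ll = log_likelihood(shifted, choices)

# The two candidates explain the data equally well, so the data alone
# cannot distinguish them.
print(math.isclose(base_ll, shifted_ll, rel_tol=1e-9))
```

Because the constant cancels inside the softmax, both candidates get the same likelihood, which is why the inference cannot pin down the offset and why any normalization is a separate choice layered on top.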