FactorialCode comments on “Designing agent incentives to avoid reward tampering”, DeepMind

FactorialCode 15 Aug 2019 2:43 UTC
I think this is a good sign: this paper goes over many of the ideas that the RatSphere has discussed for years, and DeepMind is giving those ideas publicity. It also brings up preliminary solutions, of which “Model Based Rewards” seems to go farthest in the right direction (although even the paper admits the idea’s been around since 2011).

However, the paper still phrases things in terms of additive reward functions, which don’t naturally capture many kinds of preferences (such as preferences over possible worlds). I also feel that the causal influence diagrams, when unrolled over multiple time steps, needlessly complicate the issues being discussed. Most interesting phenomena in decision theory can be captured by simple one- or two-step games or decision trees, so I don’t see the need to phrase things as multi-timestep systems. The same goes for presenting the objectives in terms of grid worlds.
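To illustrate the claim that a one-step decision tree is often enough, here is a minimal toy sketch in Python of the reward-tampering incentive as a single choice. Everything in it is hypothetical and chosen for illustration: the two actions, the `Outcome` type, and the reward values (1 for doing the task, a tampered signal pinned at 10). It is not the paper's formalism. It only contrasts an agent that maximises the reward signal it will observe with one that scores outcomes using its current reward function, which is roughly the idea behind “Model Based Rewards”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical world state reached after a single action."""
    task_done: bool
    reward_tampered: bool


# One-step decision tree: each action leads to exactly one outcome.
ACTIONS = {
    "do_task": Outcome(task_done=True, reward_tampered=False),
    "tamper": Outcome(task_done=False, reward_tampered=True),
}


def current_reward(o: Outcome) -> float:
    """The designer's intended reward: 1 if the task is done, else 0."""
    return 1.0 if o.task_done else 0.0


def observed_reward(o: Outcome) -> float:
    """The signal the agent would actually receive.

    Assumption for this toy: tampering pins the signal at 10.
    """
    return 10.0 if o.reward_tampered else current_reward(o)


def choose(evaluate) -> str:
    """Pick the action whose outcome scores highest under `evaluate`."""
    return max(ACTIONS, key=lambda a: evaluate(ACTIONS[a]))


# A signal-maximising agent prefers to tamper; an agent that evaluates
# outcomes with its current reward function prefers to do the task.
print(choose(observed_reward), choose(current_reward))
```

Running it prints `tamper do_task`. No time unrolling is needed to show the incentive difference: the whole argument fits in one decision node.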

Overall, the authors still seem heavily influenced by the RL paradigm. It’s a good start; we’ll see if the rest of the AI community notices.
- tom4everitt 19 Aug 2019 16:29 UTC
  Thanks for the Dewey reference, we’ll add it.