tom4everitt comments on “Designing agent incentives to avoid reward tampering”, DeepMind

tom4everitt 19 Aug 2019 16:28 UTC
LW: 12 AF: 5
0
AF
Hey Steve,
Thanks for linking to Abram’s excellent blog post.
We should have pointed this out in the paper, but there is a simple correspondence between Abram’s terminology and ours:
Easy wireheading problem = reward function tampering
Hard wireheading problem = feedback tampering.
Our current-RF optimization corresponds to Abram’s observation-utility agent.
We also discuss the RF-input tampering problem and solutions (sometimes called the delusion box problem), which I don’t fit into Abram’s distinction.