Yeah, unless I’m missing something, this is the solution to the “easy problem of wireheading” as discussed in Abram Demski’s Stable Pointers to Value II: Environmental Goals.
Still, I say kudos to the authors for making progress on exactly how to put that principle into practice.
Hey Steve,
Thanks for linking to Abram’s excellent blog post.
We should have pointed this out in the paper, but there is a simple correspondence between Abram’s terminology and ours:
Easy wireheading problem = reward function tampering.
Hard wireheading problem = feedback tampering.
Our current-RF optimization corresponds to Abram’s observation-utility agent.
We also discuss the RF-input tampering problem and its solutions (sometimes called the delusion box problem), which doesn’t fit neatly into Abram’s distinction.