Yeah, unless I’m missing something, this is the solution to the “easy problem of wireheading” as discussed in Abram Demski’s Stable Pointers to Value II: Environmental Goals.
Still, I say kudos to the authors for making progress on exactly how to put that principle into practice.
Hey Steve,
Thanks for linking to Abram’s excellent blog post.
We should have pointed this out in the paper, but there is a simple correspondence between Abram’s terminology and ours:
Easy wireheading problem = reward function tampering.
Hard wireheading problem = feedback tampering.
Our current-RF optimization corresponds to Abram’s observation-utility agent.
We also discuss the RF-input tampering problem and its solutions (sometimes called the delusion box problem), which doesn’t fit neatly into Abram’s distinction.