Stuart_Armstrong comments on Pascal’s mugging in reward learning

Stuart_Armstrong 17 Nov 2017 8:58 UTC
2 points
Let $R_{0}$ be a reasonable human reward with all its complexity, and let $R_{1}$ be “the human doesn’t eat”. A modified human can max out $R_{1}$ much more easily than an unmodified human can max out $R_{0}$ (even though an unmodified human would be terrible at $R_{1}$). The “Pascal” aspect comes in because we are comparing the practical upper bound of $R_{0}$ with the theoretical upper bound of $R_{1}$, and choosing $R_{1}$ to have the largest such theoretical upper bound.
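As a rough illustration of that comparison, here is a minimal sketch. It assumes a toy setup that is not spelled out in the comment: the agent puts probability `p_r1` on $R_{1}$ being the true reward and `1 - p_r1` on $R_{0}$, leaving the human unmodified scores nothing under $R_{1}$, and modifying the human scores nothing under $R_{0}$. All function names and numbers are hypothetical.

```python
def expected_reward(p_r1, value_under_r0, value_under_r1):
    """Expected reward when R1 has probability p_r1 and R0 has 1 - p_r1."""
    return (1 - p_r1) * value_under_r0 + p_r1 * value_under_r1


def prefers_modification(p_r1, practical_max_r0, theoretical_max_r1):
    """Compare the two policies in the toy setup.

    Assumption (illustrative only): each policy earns zero under the
    reward it does not target.
    """
    # Leave the human alone: reach the practical upper bound of R0.
    keep = expected_reward(p_r1, practical_max_r0, 0.0)
    # Modify the human: reach the theoretical upper bound of R1.
    modify = expected_reward(p_r1, 0.0, theoretical_max_r1)
    return modify > keep


# A small probability on R1 still wins if its theoretical bound is large enough.
large_bound_wins = prefers_modification(0.01, 10.0, 1e6)
small_bound_wins = prefers_modification(0.01, 10.0, 100.0)
```

Under these assumptions, modification is preferred exactly when `p_r1 * theoretical_max_r1 > (1 - p_r1) * practical_max_r0`, so selecting $R_{1}$ for the largest theoretical bound can outweigh even a very small `p_r1`.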
- ryan_b 21 Nov 2017 0:05 UTC
  1 point
  Reviewing the post with your update, I think the problem may just be that the examples are de-priming my intuition. In your reply you chose ‘the human doesn’t eat’ as the reward for a modified human to maximize, which means the gains are only all the food humans would eat if unmodified. This is compared to brain surgery, which a bit of googling suggests costs 50-150K, much more than it costs to feed a person. It looks like I chunked the proposition as ‘costly intervention to achieve bounded reward’ as a consequence.
  However, none of this is actually implied by the math. Insofar as you expect other readers to react as I did, it may be worth changing the examples to emphasize a trivial intervention for a very high reward.
  - Stuart_Armstrong 21 Nov 2017 9:53 UTC
    2 points
    The brain surgery is an example of how the AI can transform us into the humans it wants us to be—an extreme version of wireheading.
    - ryan_b 21 Nov 2017 15:37 UTC
      3 points
      That much I understood—my flaw was reading too much into the example.