That claim is only plausible if you use a very carefully constructed reward function.
I’m not quite sure how to make sense of this reply, and it feels like there is an implication here that I’m not parsing; could you elaborate? Presumably, the idea is that our reward function is indeed “carefully constructed” by evolution. (Note that I’m trying to extrapolate from memory of past discussions; folks who have actually made the “humans are reinforcement learners” claim should please feel free to jump in here!)
If you model a human as an RL agent, then a lot of the work is being done by a very carefully constructed reward function. You can tell since humans do a lot of stuff that an RL agent basically wouldn’t do (like “die for a cause”). You can bake an awful lot into a carefully constructed reward function—for example, you can reward the agent whenever it takes actions that are optimal according to some arbitrary decision theory X—so it’s probably possible to describe a human as an RL agent but it doesn’t seem like a useful description.
At any rate, once the reward function is doing a lot of the optimization, the arguments in this post don’t really apply. Certainly an RL agent can have a heuristic like vindictiveness if you just change the reward function.
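To make the "you can bake an awful lot into the reward function" point concrete, here is a minimal sketch (not from the discussion above, and the decision procedure and toy states are purely hypothetical): a reward function that pays the agent exactly when its action matches what some arbitrary decision theory X prescribes. Any reward-maximizing policy then just reproduces X, illustrating why calling the result "an RL agent" does little explanatory work on its own.

```python
# Illustrative sketch: baking an arbitrary decision procedure into a reward
# function. `decision_theory_x`, the actions, and the states are hypothetical
# stand-ins, not anything specified in the original comments.

ACTIONS = ["cooperate", "defect", "sacrifice"]

def decision_theory_x(state):
    """Stand-in for some arbitrary decision theory X: given a state,
    return the action it deems optimal (here, a fixed toy rule)."""
    return "sacrifice" if state == "cause_at_stake" else "cooperate"

def reward(state, action):
    """Reward function constructed around X: the agent is rewarded
    if and only if it does whatever X prescribes."""
    return 1.0 if action == decision_theory_x(state) else 0.0

# A reward-maximizing policy trained on this signal simply mirrors X,
# so the real optimization work was done when the reward was constructed.
state = "cause_at_stake"
best_action = max(ACTIONS, key=lambda a: reward(state, a))
print(best_action)  # -> "sacrifice": "die for a cause" becomes reward-optimal
```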
That makes sense, thank you.