Rafael Harth comments on Humans can be assigned any values whatsoever…

Rafael Harth 1 Sep 2020 17:25 UTC
6 points
Two points about understandability:
“Let $R$ be a possible human reward function, and $\mathcal{R}$ the set of such rewards.”
This made me think that $\mathcal{R}$ was the set of all possible values the reward functions could take, or something like that, but it’s actually the set of reward functions.
Secondly, writing $p(1), p(2)$ is somewhat confusing because the $p(\cdot)$ notation makes one think that $p$ is being applied to something (since it’s a function), when in fact the numbers are just indices. Writing $p_{1}, p_{2}$ would avoid the issue.
I don’t find it surprising that a simplicity prior doesn’t work since human rationality doesn’t seem to be particularly simple. I do have the intuition that the problem is extremely hard.