“Let R be a possible human reward function, and ℛ the set of such rewards.”
Two points about understandability:
First, this sentence made me think that ℛ, the set, contained all possible values the reward functions could take, or something like that, but it’s actually the set of reward functions themselves.
Secondly, writing p(1), p(2) is somewhat confusing because the p(⋅) notation makes one think that p is being applied to an argument (since it’s a function), when in fact the number is just an index that counts them. Writing p_1, p_2 would avoid the issue.
I don’t find it surprising that a simplicity prior doesn’t work since human rationality doesn’t seem to be particularly simple. I do have the intuition that the problem is extremely hard.