Sorry for the formatting here; I am still trying to figure out how to use markdown in comments.

I have difficulty understanding the inferences about the parameters $ \alpha,\beta,\gamma $ in the “Example: regret” part.

Take the fully rational planner `p` for example. Since the human will say `h` following `s`, the difference between the reward for `h` and the reward for `~h` is non-negative, which implies that: $ (\beta R(h)+\gamma R(h|s)) - (\beta R(\sim h)+\gamma R(\sim h|s)) \geq 0 $

Then it is concluded that $ \beta R(h-\sim h)+\gamma R(h-\sim h|s)\geq0$
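
To spell out the step between the last two expressions as I read it, assuming $ R(h-\sim h) $ is shorthand for $ R(h)-R(\sim h) $ (my assumption; the notation is not defined in this comment):

```latex
% Assumption: R(h - \sim h) abbreviates R(h) - R(\sim h), and likewise for the conditional terms.
\begin{align*}
&\bigl(\beta R(h) + \gamma R(h \mid s)\bigr) - \bigl(\beta R(\sim h) + \gamma R(\sim h \mid s)\bigr) \geq 0 \\
\iff\ & \beta \bigl(R(h) - R(\sim h)\bigr) + \gamma \bigl(R(h \mid s) - R(\sim h \mid s)\bigr) \geq 0 \\
\iff\ & \beta R(h - \sim h) + \gamma R(h - \sim h \mid s) \geq 0
\end{align*}
```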

Similarly, since the human will say $ \sim h $ following `i`, we have $ \beta R(h-\sim h)+\delta R(h-\sim h|i)\leq0 $

It seems that more information about the reward function is needed in order to arrive at the final model with $ (p,R(\alpha,\beta,\gamma,\delta)|\gamma\geq-\beta\geq\delta) $
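
For instance, one normalization that would be enough (purely my guess, not something I found stated in the example) is that each of the three reward differences equals 1:

```latex
% Hypothetical assumption, not taken from the post:
% R(h - \sim h) = R(h - \sim h \mid s) = R(h - \sim h \mid i) = 1
\begin{align*}
\beta \cdot 1 + \gamma \cdot 1 \geq 0 &\implies \gamma \geq -\beta \\
\beta \cdot 1 + \delta \cdot 1 \leq 0 &\implies -\beta \geq \delta
\end{align*}
```

Together these would give $ \gamma\geq-\beta\geq\delta $. Without something like this, I do not see how the reward differences drop out of the inequalities.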

It seems that on macOS Catalina 10.15 with Safari 13.0.2, Ctrl-Cmd-4 does not work either. I have tried every possible combination of (Ctrl, Cmd, 4, M) in both Safari and Chrome, but none of them works. Do you have any ideas?