Yep, that’s better. There’s still the risk of subagents being created: when the agent thinks that $f^{-1}(y) \neq 0$ almost certainly, but not completely certainly, it might create a $u$-maximising subagent and then self-terminate.
That means that this design, like most indifference designs, is reflectively consistent but not reflectively stable.
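To make the worry concrete, here is a toy expected-utility comparison (a minimal sketch; the 0.99 credence and all payoff numbers are made up for illustration, not taken from the design above):

```python
# Toy expected-utility comparison (all numbers hypothetical) showing why
# "spawn a u-maximising subagent, then self-terminate" can beat acting
# directly when the agent is almost, but not completely, certain that
# f^{-1}(y) != 0.

P_NONZERO = 0.99    # assumed credence that f^{-1}(y) != 0

U_SUBAGENT = 10.0   # u the subagent secures when f^{-1}(y) != 0
U_DIRECT = 8.0      # u the agent secures by acting itself
U_ZERO_CASE = 0.0   # payoff in the unlikely f^{-1}(y) = 0 branch

ev_direct = U_DIRECT
ev_jump_ship = P_NONZERO * U_SUBAGENT + (1 - P_NONZERO) * U_ZERO_CASE

print(f"E[u | act directly]              = {ev_direct:.2f}")     # 8.00
print(f"E[u | subagent + self-terminate] = {ev_jump_ship:.2f}")  # 9.90
```

With these (assumed) numbers the jump-ship plan wins, which is exactly the reflective-stability failure: nothing in the design penalises delegating to a subagent and exiting.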
Wow, I agree!
Let us modify the utility for the case $f^{-1}(y) = 0$ to:
$$u^*(h) = \begin{cases} 0 & h \text{ contains the "self-terminate" action} \\ u(h) & \text{otherwise} \end{cases}$$
Meaning: no utility can be gained via subagents if the agent “jumps ship” (i.e. self-terminates to gain utility in the case $f^{-1}(y) \neq 0$).
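For concreteness, here is how $u^*$ could look in code (a minimal sketch; modelling the history as a list of action labels and the placeholder $u$ are assumptions for illustration, not part of the proposal):

```python
# A minimal sketch of the modified utility u* for the f^{-1}(y) = 0 case.
# Histories are modelled as lists of action labels; the base utility u
# is a made-up placeholder, not anything from the discussion above.

def u(history):
    """Stand-in for the original utility u(h) (assumed toy value)."""
    return float(len(history))

def u_star(history):
    """u*(h): 0 if the history contains a self-terminate action, else u(h)."""
    if "self-terminate" in history:
        return 0.0
    return u(history)

# Jumping ship now earns nothing in the f^{-1}(y) = 0 branch:
print(u_star(["act", "spawn-subagent", "self-terminate"]))  # 0.0
print(u_star(["act", "act"]))                               # 2.0
```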
Interesting. I’ll think about whether this works and can be generalised (it doesn’t make the design reflectively stable, since creating $u$-maximising subagents is still allowed and doesn’t directly hurt the agent, but it might improve the situation).