davidad comments on You can still fetch the coffee today if you’re dead tomorrow

davidad 9 Dec 2022 18:28 UTC
LW: 2 AF: 2
To the first point, I think this problem can be avoided with a much simpler assumption than that the shutdown criterion forbids all posthumous influence. Essentially, the assumption I made explicitly, which is that there exists a policy which achieves shutdown with probability 1. (We might need a slightly stronger version of this assumption: it might need to be the case that for any action, there exists an action which has the same external effect but also causes a shutdown with probability 1.) This means that the agent doesn’t need to build itself any insurance policy to guarantee that it shuts down. I think this is not a terribly inaccurate assumption; of course, in reality, there are cosmic rays and a properly embedded and self-aware agent might deduce that none of its future actions are perfectly reliable, even though a model-free RL agent would probably never see any evidence of this (and it wouldn’t be any worse at folding the laundry for it). Even with a realistic $ϵ$ probability of shutdown failing, if we don’t try to juice $1 - 1 / C$ so high that it exceeds $1 - ϵ$, my guess is there would not be enough incentive to justify the cost of building a successor agent just to raise that from $1 - ϵ$ to $1$.
- TekhneMakre 9 Dec 2022 18:53 UTC
  LW: 4 AF: 2
  > Essentially, the assumption I made explicitly, which is that there exists a policy which achieves shutdown with probability 1.
  
  Oops, I missed that assumption. Yeah, if there’s such a policy, and it doesn’t trade off against fetching the coffee, then it seems like we’re good. See, though, this page, which argues briefly that, by Cromwell’s rule, no such policy exists: https://arbital.com/p/task_goal/
  
  > Even with a realistic $ϵ$ probability of shutdown failing, if we don’t try to juice $1 - 1 / C$ so high that it exceeds $1 - ϵ$, my guess is there would not be enough incentive to justify the cost of building a successor agent just to raise that from $1 - ϵ$ to $1$.
  
  Hm. So this seems like you’re making an additional, very non-trivial assumption, which is that the AI is constrained by costs comparable to / bigger than the costs to create a successor. If its task has already been very confidently achieved, and it has half a day left, it’s not going to get senioritis, it’s going to pick up whatever scraps of expected utility might be left.
  
  I wonder though if there’s synergy between your proposal and the idea of expected utility satisficing: an EU satisficer with a shutdown clock is maybe anti-incentivized from self-modifying to do unbounded optimization, because unbounded optimization is harder to reliably shut down? IDK.
  - davidad 9 Dec 2022 19:29 UTC
    LW: 1 AF: 1
    Yes, I think there are probably strong synergies with satisficing, perhaps lexicographically minimizing something like energy expenditure once the $EU$ maximum is reached. I will think about this more.
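
The lexicographic idea in the last reply can be illustrated with a toy sketch. This is only an assumption-laden illustration, not anything specified in the thread: the `Policy` class, the policy names, the energy figures, and the utility cap of 1.0 are all hypothetical. The selection rule first keeps the policies that reach the maximum expected utility, and only then breaks ties by lowest energy expenditure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy summary; every field is an illustrative assumption."""
    name: str
    expected_utility: float  # assumed to be capped (here at 1.0)
    energy: float            # assumed proxy for resource expenditure


def lexicographic_choice(policies, tol=1e-9):
    """Maximize expected utility first, then minimize energy among the ties."""
    best_eu = max(p.expected_utility for p in policies)
    # Keep only policies whose expected utility is within tol of the maximum.
    tied = [p for p in policies if best_eu - p.expected_utility <= tol]
    # Secondary criterion: the cheapest of the utility-maximizing policies.
    return min(tied, key=lambda p: p.energy)


# Made-up candidates: two reach the capped utility, one does not.
candidates = [
    Policy("fetch_coffee_then_shut_down", expected_utility=1.0, energy=5.0),
    Policy("fetch_coffee_and_build_successor", expected_utility=1.0, energy=500.0),
    Policy("do_nothing", expected_utility=0.0, energy=0.1),
]

chosen = lexicographic_choice(candidates)
print(chosen.name)
```

Under these made-up numbers, building a successor ties on expected utility but loses on energy, so the cheaper policy is selected. Whether a real agent's expected utilities would tie this cleanly is exactly the open question raised in the thread.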