TekhneMakre comments on You can still fetch the coffee today if you’re dead tomorrow

TekhneMakre 9 Dec 2022 18:53 UTC
> Essentially, the assumption I made explicitly, which is that there exists a policy which achieves shutdown with probability 1.
Oops, I missed that assumption. Yeah, if there's such a policy, and it doesn't trade off against fetching the coffee, then it seems like we're good. See, though, this page, which argues briefly that by Cromwell's rule such a policy doesn't exist: https://arbital.com/p/task_goal/
> Even with a realistic $\epsilon$ probability of shutdown failing, if we don't try to juice $1 - 1/C$ so high that it exceeds $1 - \epsilon$, my guess is there would not be enough incentive to justify the cost of building a successor agent just to raise that from $1 - \epsilon$ to $1$.
Hm. So this seems like you're making an additional, very non-trivial assumption: that the AI is constrained by costs comparable to, or bigger than, the cost of creating a successor. If its task has already been very confidently achieved and it has half a day left, it's not going to get senioritis; it's going to pick up whatever scraps of expected utility might be left.
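The disagreement here reduces to a cost-benefit inequality: raising shutdown probability from $1 - \epsilon$ to $1$ is worth roughly $\epsilon$ times whatever utility rides on shutdown succeeding, and a successor gets built only if that beats the cost. The toy sketch below assumes a single scalar `stake` for the utility at risk and a scalar `cost` that includes opportunity cost; `builds_successor`, both parameter names, and all numbers are hypothetical and are not the utility function from the original post.

```python
def builds_successor(eps: float, stake: float, cost: float) -> bool:
    """Toy decision rule (hypothetical, not the post's formalism).

    eps: assumed probability that shutdown fails if no successor is built
    stake: assumed utility the agent secures by making shutdown certain
    cost: assumed utility cost of building a successor, including opportunity cost
    """
    # Raising shutdown probability from 1 - eps to 1 is worth eps * stake in expectation.
    expected_gain = eps * stake
    # Build the successor only if the expected gain strictly exceeds the cost.
    return expected_gain > cost


if __name__ == "__main__":
    eps, stake = 1e-6, 1.0
    # Busy agent: building a successor displaces other valuable work.
    print(builds_successor(eps, stake, cost=0.01))  # False
    # Idle agent: the task is already done, so the opportunity cost is roughly zero.
    print(builds_successor(eps, stake, cost=0.0))   # True
```

With a realistic `eps`, the rule says no whenever `cost` is comparable to the stakes; the objection above is that for an agent with nothing left to do, `cost` may be close to zero, so even a tiny expected gain tips the balance.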

I wonder, though, whether there's synergy between your proposal and the idea of expected utility satisficing: an EU satisficer with a shutdown clock is maybe disincentivized from self-modifying into an unbounded optimizer, because unbounded optimization is harder to shut down reliably? IDK.
- davidad 9 Dec 2022 19:29 UTC
  Yes, I think there are probably strong synergies with satisficing, perhaps by lexicographically minimizing something like energy expenditure once the $EU$ maximum is reached. I will think about this more.
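The reply's suggestion can be read as a selection rule over candidate plans: first maximize expected utility, then break (near-)ties by minimizing energy. The sketch below is a minimal toy version under loud assumptions: each plan is summarized by one hypothetical EU number and one hypothetical energy number, and `Plan`, `lexicographic_choice`, `satisficing_choice`, and all values are illustrative names, not anything from the original post.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Plan:
    name: str
    expected_utility: float  # hypothetical EU estimate for this plan
    energy: float            # hypothetical energy-expenditure estimate


def lexicographic_choice(plans: Sequence[Plan], tol: float = 1e-9) -> Plan:
    """Maximize EU first; among plans within `tol` of the maximum, minimize energy."""
    best_eu = max(p.expected_utility for p in plans)
    near_optimal = [p for p in plans if p.expected_utility >= best_eu - tol]
    return min(near_optimal, key=lambda p: p.energy)


def satisficing_choice(plans: Sequence[Plan], threshold: float) -> Optional[Plan]:
    """Among plans whose EU meets `threshold`, minimize energy; None if none do."""
    acceptable = [p for p in plans if p.expected_utility >= threshold]
    return min(acceptable, key=lambda p: p.energy) if acceptable else None


if __name__ == "__main__":
    shut_down = Plan("fetch coffee, then shut down", 0.990, 1.0)
    successor = Plan("fetch coffee, build successor", 0.991, 50.0)
    idle = Plan("do nothing", 0.0, 0.0)
    plans = [shut_down, successor, idle]

    # Strict tolerance: a 0.001 EU edge is enough to prefer the expensive successor plan.
    print(lexicographic_choice(plans).name)  # fetch coffee, build successor
    # Loose tolerance: both coffee plans count as "at the maximum", so energy decides.
    print(lexicographic_choice(plans, tol=0.01).name)  # fetch coffee, then shut down
    # Satisficer: any plan clearing the threshold is acceptable, so energy decides.
    print(satisficing_choice(plans, threshold=0.95).name)  # fetch coffee, then shut down
```

With a strict tolerance the rule reproduces the worry in the parent comment (scraps of EU still win); widening `tol` or switching to a threshold makes it behave like a satisficer and pick the low-energy plan.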