johnswentworth comments on You can still fetch the coffee today if you’re dead tomorrow

johnswentworth 9 Dec 2022 17:02 UTC
I only skimmed the post, so apologies if you addressed this problem and I missed it.
Problem: even if the AI’s utility function is time-bounded, there may still be other agents in the environment whose utility functions are not time-bounded, and those agents will be willing to trade short-term resources/assistance for long-term resources/assistance. So, for instance, the 10-minute laundry-folding robot might still be incentivized to create a child AI which persists for a long time and seizes lots of resources, in order to trade those future resources to some other agent who can help fold the laundry in the next 10 minutes.
- davidad 9 Dec 2022 17:56 UTC
  That’s true! Thanks for pointing this out; I added a subsection about it to the post. There are probably also a bunch of other cases I haven’t thought of that provide stories for how the environment directly rewards actions that go against the spirit of the shutdown criterion (besides imitation and this one, which I might call “trade”). This construction does nothing to counteract such incentives. Rather, it just avoids the way that being an infinite-horizon RL agent systematically creates new ones.
  - particlemania 25 Dec 2022 21:20 UTC
    As an addendum, it seems to me that you may not need a ‘long-term planner’ (or ‘time-unbounded agent’) in the environment at all. A similar outcome may be attainable if the environment contains a tiling of time-bounded agents who can trade with one another in such a way that the overall trade network implements long-term power-seeking.