Isn’t this the same as the “seamless transition for reward maximizers” technique described in section 5.1 of Stuart and Xavier’s 2017 paper on utility indifference methods? It is a good idea, of course, and if you independently invented it, kudos, but it seems like something that already exists.
I did explicitly disclaim novelty, and I did invent this independently. The paper you linked is closely related, and I would like to upvote it, since I think those results should also be better known. However, the problem I solve in this post is different from (and technically easier than!) the problems solved in that paper, including in section 5. The problem solved there asks for the optimal agent to act as if it were an infinite-horizon optimal agent for R1 (including whatever power-seeking would be instrumental for such an agent!) until the time bound causes it to switch to acting like the optimal agent for R2, and for all of that to be reflectively stable. Here, I am not asking the optimal agent to behave as if it has a longer time horizon than it really does.