A similar objection is that you might accidentally define the utility function and time limit in such a way that the AI assigns positive probability to the hypothesis that it can later build a time machine, go back, and improve its past utility. Then once the deadline has passed, it will desperately try to invent a time machine, even if it thinks success is extremely unlikely. (This is using Bostrom's way of thinking; shard theory would not predict this.)
I disagree, for two reasons:
The $\tau_1 \cdot (\hat{R}_1 - \underline{R}_1)$ bound on how much there is to gain from creating a time machine and improving past utility is outweighed by the $\tau_1 \cdot (\hat{R}_1 - \underline{R}_1) \cdot C$ reward from $R_2$ for shutting down.
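Spelling out the comparison (my addition; it assumes $C > 1$, which the outweighing claim requires):

$$\tau_1 \cdot (\hat{R}_1 - \underline{R}_1) \;<\; \tau_1 \cdot (\hat{R}_1 - \underline{R}_1) \cdot C \qquad \text{whenever } C > 1,$$

so even under the maximally optimistic assumption that time travel lets the agent lift every pre-deadline reward from its floor $\underline{R}_1$ to its ceiling $\hat{R}_1$, the gain is strictly smaller than the $R_2$ shutdown bonus.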
Every RL algorithm I’ve heard of implicitly bakes in an assumption that past utility is unmodifiable. I guess all bets are off with mesa-optimisers, but personally I’d bet against even mesa-optimisers in model-free RL behaving as if past utility is up for grabs.
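As a concrete illustration (a minimal sketch of my own, not from the post): in a standard tabular Q-learning update, the past reward enters the target as a fixed scalar, so there is no term through which "improving past utility" could raise the objective.

```python
import numpy as np

# Minimal tabular Q-learning update (hypothetical toy example).
# The point: the observed reward r is a fixed constant from a past
# transition; nothing in the update gives the agent any handle on
# rewards it has already received.

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def td_update(s, a, r, s_next):
    """One Q-learning step. `r` is the already-realised past reward:
    it enters the target as a constant, so past utility is treated as
    unmodifiable by construction."""
    target = r + gamma * Q[s_next].max()   # bootstrapped one-step target
    Q[s, a] += alpha * (target - Q[s, a])  # move estimate toward target

# Example: the reward 1.0 was emitted in the past and is simply
# replayed as data; the update can only change future value estimates.
td_update(s=0, a=1, r=1.0, s_next=2)
```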