First problem with this argument: there are no coherence theorems saying that an agent needs to maintain the same utility function over time.
This seems pretty false to me. If you can predict in advance that some future you will be optimizing for something else, you could trade with future “you” and merge utility functions, which seems strictly better than not doing so. (Side note: I’m pretty annoyed with all the use of “there’s no coherence theorem for X” in this post.)
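To make the merging move concrete (a minimal sketch in my own notation, assuming the two utility functions can be put on a common scale): current-you and predicted-future-you could agree to jointly optimize a weighted combination

$$U_{\text{merged}}(\omega) = \alpha\, U_{\text{current}}(\omega) + (1-\alpha)\, U_{\text{future}}(\omega), \qquad \alpha \in [0,1],$$

with $\alpha$ set by the bargaining. Maximizing any such positive-weighted sum yields an outcome that is Pareto-efficient with respect to the two utility functions, which is the sense in which merging looks at least weakly better for both parties than not merging.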
As a separate note, the “further out” your goal is and the more your actions are taken for their instrumental value, the more it should look like world 1, in which agents value abstract properties of world states, and the less we should observe preferences over the trajectories used to reach those states.
(This is a reason in my mind to prefer the approval-directed-agent frame, in which humans get to inject preferences that are more about trajectories.)
I agree that this problem is not a particularly important one, and explicitly discard it a few sentences later. I hadn’t considered your objection though, and will need to think more about it.
(Side note: I’m pretty annoyed with all the use of “there’s no coherence theorem for X” in this post.)
Mind explaining why? Is this more a stylistic preference, or do you think most of them are wrong/irrelevant?
the “further out” your goal is and the more your actions are taken for their instrumental value, the more it should look like world 1, in which agents value abstract properties of world states, and the less we should observe preferences over the trajectories used to reach those states.
Also true if you make world states temporally extended.
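To spell out what “temporally extended” could mean here, a minimal sketch in my own notation: instead of a utility over a terminal state, $U(s_T)$, let the “state” be the whole history, so that preferences take the form

$$U(s_0, s_1, \ldots, s_T),$$

i.e., a utility function over trajectories. Formally that is still a preference over world states, just with the state defined to include how you got there.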