Somewhat relatedly, “If I previously turned down some option X, I will not choose any option that I strictly disprefer to X” does feel to me like a grafted-on hack of a policy that breaks down in some adversarial edge case.
Maybe it’s airtight; I’m not sure. But if it is, that just feels like coherence with extra steps? Like, sure, you can pursue a strategy of incoherence which requires you to know the entire universe of possible trades you will make and then backchain inductively to make sure you are never, ever exploitable about this.
Or you could make your preferences explicit and be consistent in the first place. In a sense, I think that’s the simple, elegant thing that the weird hack approximates.
If you have coherent preferences, you get what the hack buys you for free. I think an agent with coherent preferences performs at least as well under the same assumptions (prescience, backchaining) on the same decision tree, and performs better if you relax one or more of those assumptions.
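To make the comparison concrete, here’s a toy sketch (my own illustration, nothing canonical) of the standard money pump against incomplete preferences. The names, numbers, and accept-functions are all made up for the example: A_minus is A made a penny worse, B is incomparable to both, and “turned down” is read as including options you traded away.

```python
# Toy money pump for incomplete preferences. Everything here is illustrative:
# "A_minus" is A made one cent worse; A vs B and A_minus vs B are incomparable.

STRICT = {("A", "A_minus")}  # (x, y) means x is strictly preferred to y

def prefers(x, y):
    return (x, y) in STRICT

# The pump: offer to trade the agent's A for B, then its B for A_minus.
OFFERS = ["B", "A_minus"]

def run_pump(accept, start="A"):
    holding = start
    for offer in OFFERS:
        if accept(offer, holding):
            holding = offer
    return holding

# 1. Naive incomplete-preference agent: accepts any trade it doesn't strictly
#    disprefer. Ends up holding A_minus, strictly worse than where it started.
def naive(offer, holding):
    return not prefers(holding, offer)

# 2. The "hack": additionally remember every option ever turned down (reading
#    that as including options traded away) and refuse anything strictly
#    dispreferred to one of them.
turned_down = set()
def hack(offer, holding):
    if prefers(holding, offer) or any(prefers(x, offer) for x in turned_down):
        return False
    turned_down.add(holding)  # accepting the trade means turning down what you held
    return True

# 3. Coherent agent: completes its preferences with a utility function and
#    needs no memory of past offers at all.
utility = {"A": 10.00, "A_minus": 9.99, "B": 10.00}
def coherent(offer, holding):
    return utility[offer] > utility[holding]

print("naive:   ", run_pump(naive))     # -> A_minus  (pumped)
print("hack:    ", run_pump(hack))      # -> B        (safe, but needed the history)
print("coherent:", run_pump(coherent))  # -> A        (safe, no bookkeeping)
```

The point is just the bookkeeping: the hack only stays safe by remembering everything it has ever passed up, while the coherent agent gets the same guarantee from its utility function alone.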
In practice, it pays to be the sort of entity that attempts to have consistent preferences about things whenever that’s decision-relevant and computationally tractable.
I chiefly advise against work that brings us closer to superintelligence. I aim this advice primarily at those who want to make sure AI goes well. For careers that do other things, and for people who aren’t aiming their careers at impact, this post mostly doesn’t apply. One can argue about secondary effects and such, but in general, mundane utility is a good thing and it’s fine for people to get paid for providing it.