And in fact, definition 1 turns out to have further problems. For example: I haven't yet defined how a coherent agent is meant to choose between equally good options. One natural approach is simply to allow it to make any choice in those situations; it can hardly be considered irrational for doing so, since by assumption whatever it chooses is just as good as any other option. However, in that case any behaviour whatsoever is consistent with the indifferent preference function (the one which rates all outcomes as equally good), and so even under definition 1 any sequence of actions counts as coherent. Now, I don't think it's very realistic that superintelligent AGIs will actually be indifferent about the effects of most of their actions, so perhaps we can just rule out preferences which feature indifference too often. But note that this adds an undesirable element of subjectivity to our definition.
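To spell the point out, here is a minimal formal sketch (the notation is mine, not part of definition 1): let $U_{\text{ind}}$ be the indifferent preference function, so that $U_{\text{ind}}(o) = c$ for every outcome $o$ and some fixed constant $c$. Then for any policy $\pi$,

$$\mathbb{E}_{o \sim \pi}\left[U_{\text{ind}}(o)\right] = c,$$

so every policy attains the maximal expected value and is optimal with respect to $U_{\text{ind}}$. Any observed sequence of actions can therefore be rationalised as coherent maximisation of $U_{\text{ind}}$.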
For what it’s worth, under any continuous distribution over reward functions, only a measure-zero subset of reward functions has more than one optimal trajectory from any state. So ruling out indifference is a little less subjective than it might seem (assume continuity and ignore measure-zero events), but it is still subjective, and it doesn’t deal with the other problems with definition 1.
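To sketch why this holds (my reconstruction, under the simplifying assumption of a finite set $\mathcal{T}$ of trajectories, so that a reward function can be identified with a vector $R \in \mathbb{R}^{|\mathcal{T}|}$): two distinct trajectories $\tau_i \neq \tau_j$ can tie only on the hyperplane

$$H_{ij} = \{\, R \in \mathbb{R}^{|\mathcal{T}|} : R(\tau_i) = R(\tau_j) \,\},$$

and any reward function with more than one optimal trajectory lies in the finite union $\bigcup_{i < j} H_{ij}$. Each hyperplane has Lebesgue measure zero, so the union does too, and hence it has probability zero under any distribution with a density.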