I’m not sure this idea with the dashed lines, of being unable to transition “directly,” is coherent. A more plausible structure seems to me to be a transitive relation for the solid arrows: if A->B and B->C, then there exists an A->C.
Again, what does it mean to be unable to transition “directly”? You’ve explicitly said we’re ignoring path dependencies and time, so if an agent can go from A to B, and then from B to C, I claim there should be a solid arrow from A to C.
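To make the transitivity claim concrete, here’s a minimal sketch that closes a set of solid arrows under composition (the state names and the `transitive_closure` helper are illustrative assumptions of mine, not anything from your post):

```python
def transitive_closure(arrows):
    """Close a set of (source, target) solid arrows under composition:
    whenever A -> B and B -> C are present, add A -> C."""
    closure = set(arrows)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

closure = transitive_closure({("A", "B"), ("B", "C")})
assert ("A", "C") in closure  # the composed arrow A -> C is now explicit
```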
Of course, in real life, you sometimes have to sacrifice in the short term to reach a more preferred long-term state. But within the framework we set up, this needs to be “brought into the diagram” (to use your phrasing).
> You’ve explicitly said we’re ignoring path dependencies and time
I wasn’t very clear, but I meant this in a less restrictive sense than what you’re imagining.
I meant only that if you know the diagram and you know the current state, you’re fully equipped to reason about what the agent ought to do (according to its preferences) on its next action.
I’m trying to rule out cases where the optimal action from a state A depends on extra information beyond the simple fact that we’re in A, such as the trajectory we took to get there, or some number like “money” that hasn’t been included in the states, yet is still supposed to follow the agent around somehow.
But I still allow that the agent may be doing time discounting, which would make A->B->C less desirable than A->C.
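For concreteness, here’s a toy illustration of that point (the utility for C and the discount factor are assumptions I’m adding for the example; the actual setup only has pairwise preferences):

```python
# Illustration only: a utility for C and a per-step discount factor are
# assumed here; they are not part of the pairwise-preference setup.
# For simplicity, only reaching C carries utility.
gamma = 0.9   # discount factor, 0 < gamma < 1
u_C = 10.0    # utility of reaching state C

value_direct   = gamma * u_C       # A -> C in one step
value_two_step = gamma ** 2 * u_C  # A -> B -> C in two steps

assert value_two_step < value_direct  # the longer path is strictly worse
```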
The setup is meant to be fairly similar to an MDP, although it’s deterministic (mainly for presentational simplicity), and we are given pairwise preferences rather than a reward function.
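A minimal sketch of the kind of structure I have in mind, with placeholder state names, actions, and preference pairs (none of these specifics are from the original setup):

```python
# Deterministic transitions: from each state, the available actions and
# the single successor state each action leads to (the solid arrows).
transitions = {
    "A": {"go_B": "B", "go_C": "C"},
    "B": {"go_C": "C"},
    "C": {},  # terminal in this toy example
}

# Pairwise preferences over states, given in place of a reward function:
# (x, y) means the agent prefers y to x.
prefers = {("A", "B"), ("B", "C"), ("A", "C")}

def next_action(state):
    """Choose an action using only the current state and the diagram:
    no trajectory history, no side variables like 'money'."""
    best = None
    for action, dest in transitions[state].items():
        if best is None or (transitions[state][best], dest) in prefers:
            best = action
    return best

assert next_action("A") == "go_C"  # prefers C directly over the detour via B
```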