Notice that which instances of the agent (making the choice) are possible in general depends on what choice it makes.
Consider what is accessible if you trace the history of the agent along counterfactuals. Let’s say time is discrete, and at each moment the agent is in a certain state. Going forwards in time, you include both options for the agent’s state after receiving a binary observation from the environment. Conversely, going backwards, you include both options for the agent’s state before each option for a binary action that the agent could have made to arrive at the current state (action and observation are dual under time-reversal in reversible deterministic world dynamics). Iterating these operations, you construct a “state network” of accessible agent states. (You include the states arrived at by “zig-zag” as well: first a step to the past, then a step to the future along an observation other than the one that led to the original state from which the tracing began, which brings you to a counterfactual state in the usual sense. But these time-forward and time-backward steps can be repeated an infinite number of times.)
Now, the set of all possible states of the agent becomes divided into equivalence classes of states belonging to the same state network. If the agent belongs to one of the state networks, it couldn’t be in any other state network (in the generalized sense of “couldn’t”). But which states belong to which network depends on the agent’s algorithm. In fact, the choice of the algorithm is equivalent to the choice of networks that cover the state set. I’m not really sure what to do with this construction, or whether the structure of the networks other than the one that contains the current state should matter. From the principle that observations shouldn’t influence the choice of strategy, the other state networks should matter just as well, but then again they are not even counterfactual...
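To make the construction concrete, here is a minimal toy sketch, not a faithful formalization. It assumes a finite set of integer states and a hypothetical function `step(state, observation)` standing in for the agent’s algorithm. Backward steps are simplified to following the same transitions in reverse rather than modelling actions separately. The names `state_networks`, `doubling` and `parity_preserving` are made up for illustration.

```python
from collections import deque


def state_networks(states, step):
    """Partition `states` into state networks (connected components).

    Assumption: `step(s, obs)` returns the state reached from `s` after a
    binary observation `obs`, and always lands inside `states`.
    """
    # Undirected adjacency: forward along observations, backward along
    # the same transitions reversed (a simplification of the backward step).
    neighbours = {s: set() for s in states}
    for s in states:
        for obs in (0, 1):
            t = step(s, obs)
            neighbours[s].add(t)
            neighbours[t].add(s)

    networks = []
    seen = set()
    for start in states:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable by zig-zag.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in neighbours[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        networks.append(frozenset(component))
    return networks


def doubling(s, obs):
    # Hypothetical algorithm 1: shift the observation bit into the state.
    return (2 * s + obs) % 8


def parity_preserving(s, obs):
    # Hypothetical algorithm 2: move by 2 or 4, so parity never changes.
    return (s + 2 * (obs + 1)) % 8


if __name__ == "__main__":
    print(state_networks(range(8), doubling))
    print(state_networks(range(8), parity_preserving))
```

Over the same eight states, the first toy algorithm yields a single network, while the second splits the states into an even network and an odd one. That is the sense in which the partition depends on the algorithm.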
Action and observation are not “intuitively” dual; on first thought they seem invariant under time reversal. Action is a state-transition of the environment, and observation is a state-transition of the agent.
I can see how the duality can be suggested by viewing action as a move of the agent-player and observation as a move of the environment-player. But here the duality is that a node which in one direction was a move by A (associated with arrows to the right) is, in the other direction, a move by E (associated with arrows to the left).
Ok, I understood this on my second reading, but I don’t know what to make of it either. Why did you decide to think about agents like this, or did the idea just pop into your head and you wanted to see if it has any applications?
It’s more or less a direct rendition of the idea of UDT: actions (with state transitions) depend on the state of knowledge, so what does that say about the geometry of state transitions?
More relevant to the recent discussion: where does logical dependence come from, and how do we track it in a sufficiently detailed representation? The source of logical dependence, besides what comes from the common algorithm, is actions and observations. In forward-time, all states following a given observation become dependent on that observation, and in backward-time, so do all states preceding an action. A single observation can make multiple actions depend on it, and thus make them dependent on each other.
Connection with logic: states of knowledge in the state network are programs/proofs, and actions/observations are variables parameterizing more general programs that resolve into specific states of knowledge given these actions/observations. Also related to game semantics. This is one dimension along which to compress the knowledge representation and seek further understanding.