Yeah, I agree that observation-counterfactuals are what you’d like the UCDT agent to be thinking of as a strategy—a mapping between information-states and actions.
The reason I used weird language like “state of magically labeled nodes that are parents of the controlled nodes” is just that it’s nontrivial to translate the idea of “information available to the agent” into a naturalized causal model. But if that’s what the agent is using to predict the world, I think that’s what things have to get cashed out into.