Yeah, I agree that updateless-CDT needs to somehow label which nodes it controls.
You’re glossing over a second magical part, though:
“and then it maximizes a utility function over histories of the causal model by following the utility-maximizing strategy,”
How do you calculate the expected utility of following a strategy? How do you condition on following a strategy? That’s the whole point here. You obviously can’t just condition on the nodes you control taking certain values, since a strategy takes different actions in different worlds; so, regular causal conditioning is out. You can try conditioning on the material conditionals specifying the strategy, but that falls on its face, as mentioned.
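To make that failure concrete, here’s a toy numeric sketch (the setup and numbers are my own illustration, not anything from the discussion above): conditioning on the material conditional “if observation O, then action A” is just conditioning on the event ¬O ∨ A, and a Bayesian conditioner can partly “satisfy” that event by becoming more confident that O never happens, rather than by planning to do A when O does happen.

```python
from itertools import product

# Toy prior over an observation O and an action A, independent under the
# prior (purely illustrative numbers).
p_O, p_A = 0.5, 0.5
prior = {
    (o, a): (p_O if o else 1 - p_O) * (p_A if a else 1 - p_A)
    for o, a in product([True, False], repeat=2)
}

# "Conditioning on the strategy" read as a material conditional means
# conditioning on the event (not O) or A.
event = {w: p for w, p in prior.items() if (not w[0]) or w[1]}
z = sum(event.values())
posterior = {w: p / z for w, p in event.items()}

p_O_post = sum(p for (o, _), p in posterior.items() if o)
print(f"P(O) before conditioning: {p_O:.3f}")                   # 0.500
print(f"P(O) after conditioning on 'O -> A': {p_O_post:.3f}")   # 0.333
# The conditional gets "satisfied" partly by disbelieving the observation,
# which is not what following a strategy is supposed to mean.
```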
That’s why I jumped to the idea that UCDT would use the conditioning-on-conditionals approach. It seems like what you want to do, to condition on a strategy, is change the conditional probabilities of actions given their parent nodes.
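As a sketch of what that operation could look like computationally (the toy model, names, and numbers are all mine, just for illustration): take a small causal model World → Obs → Action → Utility, implement “conditioning on the strategy” by splicing the strategy in as the action node’s conditional distribution given its parent Obs, and score strategies by expected utility under the prior.

```python
import itertools

# Toy causal model: World -> Obs -> Action -> Utility (Utility also depends
# on World). A strategy fixes the action node's conditional distribution
# given its parent node Obs; here it's deterministic.
P_WORLD = {"heads": 0.5, "tails": 0.5}
P_OBS = {  # P(obs | world)
    "heads": {"signal": 0.9, "noise": 0.1},
    "tails": {"signal": 0.1, "noise": 0.9},
}

def utility(world, action):
    # Illustrative payoff: "bet" pays off exactly when the world is heads.
    return 1.0 if (world == "heads") == (action == "bet") else 0.0

def expected_utility(strategy):
    """Expected utility after replacing the action node's CPD with the
    strategy, i.e. P(action | obs) = 1 for the strategy's chosen action."""
    return sum(
        p_w * p_o * utility(world, strategy[obs])
        for world, p_w in P_WORLD.items()
        for obs, p_o in P_OBS[world].items()
    )

# Enumerate every strategy (every mapping from observations to actions)
# and pick the one with the highest prior expected utility.
observations, actions = ["signal", "noise"], ["bet", "pass"]
best = max(
    (dict(zip(observations, choice))
     for choice in itertools.product(actions, repeat=len(observations))),
    key=expected_utility,
)
print(best, expected_utility(best))   # {'signal': 'bet', 'noise': 'pass'} 0.9
```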
Also, I agree that conditioning-on-conditionals can work fine if combined with a magical locate-which-nodes-you-control step. Observation-counterfactuals are supposed to be a less magical way of dealing with the problem.
Yeah, I agree that observation-counterfactuals are what you’d like the UCDT agent to be thinking of as a strategy—a mapping between information-states and actions.
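To spell out why the whole mapping, and not any single action, has to be the thing that gets scored, here’s a counterfactual-mugging-flavored toy (the scenario and numbers are my own illustration): the payoff in one world depends on what the strategy would do in the other world, so only the complete information-state-to-action mapping, evaluated from the prior, has a well-defined value.

```python
# A strategy is an observation-counterfactual: a mapping from the agent's
# information-state to an action.
P_WORLD = {"heads": 0.5, "tails": 0.5}

def utility(world, strategy):
    if world == "tails":
        # The agent is asked for $100; its information-state is "asked".
        return -100 if strategy["asked"] == "pay" else 0
    # In the heads world the agent is never asked, but a predictor rewards
    # it iff the strategy *would* pay in the "asked" information-state.
    return 10_000 if strategy["asked"] == "pay" else 0

def updateless_value(strategy):
    # Score the whole mapping from the prior, before any observation.
    return sum(p * utility(world, strategy) for world, p in P_WORLD.items())

strategies = [{"asked": "pay"}, {"asked": "refuse"}]
best = max(strategies, key=updateless_value)
print(best, updateless_value(best))   # {'asked': 'pay'} 4950.0
# Conditioned on actually seeing "asked" (tails), paying looks worse
# (-100 vs 0), so the per-observation best action and the best mapping
# come apart; the mapping is the object that has to be optimized.
```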
The reason I used weird language like “state of magically labeled nodes that are parents of the controlled nodes” is just that it’s nontrivial to translate the idea of “information available to the agent” into a naturalized causal model. But if a causal model is what the agent is using to predict the world, I think that’s what things have to get cashed out into.