Your actions will all have to act together, like a phased array, pushing the variables you care about in some direction.
The nice thing is that this should work even if you are a policy selected by a decision-making algorithm, while no longer being a decision-making algorithm yourself. At that point there is no preference in any individual run of the policy: you don’t care about anything now, you only know what you must do here, and not elsewhere. But if all possible runs of the policy are considered together (in the updateless sense of a map from epistemic situations to actions and future policies), the preference is there, in the shape of the whole thing across all epistemic counterfactuals. (Basically, you reassemble a function from the (from, to) pairs it maps, one pair found in each individual situation.)
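A minimal sketch of that last parenthetical, in Python (all names here are illustrative, not from the source): each run of the policy exposes only a single (situation, action) pair, and the preference-bearing object is the whole mapping reassembled from those pairs.

```python
# Illustrative sketch: a policy as an updateless map from epistemic
# situations to actions. Each individual run reveals only one
# (situation, action) pair; the "preference" lives in the whole map.

def policy(situation: str) -> str:
    """One possible run: the action taken in a single epistemic situation."""
    table = {
        "see_heads": "bet_left",
        "see_tails": "bet_right",
        "see_nothing": "wait",
    }
    return table[situation]

# Within one situation, we observe a single (from, to) pair:
one_pair = ("see_heads", policy("see_heads"))

# Across all epistemic counterfactuals, we reassemble the function
# from the pairs found in individual situations:
all_situations = ["see_heads", "see_tails", "see_nothing"]
reassembled = {s: policy(s) for s in all_situations}
# `reassembled` is the whole-policy object; only at this level does
# it make sense to ask what the policy prefers.
```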
I guess the at-a-distance part could make use of composing an agent with some of its outer shells into a behavior that forgets internal interactions (within the agent, and between the agent and its proximate environment). The resulting “large agent” will still have basically the same preference with respect to distant targets in the environment, without any need to look inside the small agent’s head, provided the large agent’s external actions can be modeled across a sufficient range of epistemic situations. (These large agents exist in each individual possible situation; they are larger than the small agent within the situation, and they can be compared with other variants of the large agent from different possible situations.)
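As a toy sketch of the composition (again, all names are illustrative): composing the small agent with its shell yields a function whose signature mentions only external observations and external actions, so the internal interactions are forgotten by construction.

```python
# Illustrative composition: small agent + outer shell = "large agent"
# whose interface only mentions external observations and actions, so
# the agent/shell interactions are forgotten by construction.

def small_agent(inner_obs: float) -> float:
    """The small agent's policy over its proximate observations."""
    return -0.5 * inner_obs  # push an internal variable toward zero

def shell(external_obs: float) -> float:
    """The proximate environment: mediates between world and agent."""
    inner_obs = 2.0 * external_obs           # how the shell presents the world
    inner_act = small_agent(inner_obs)       # internal interaction (hidden)
    external_act = inner_act + external_obs  # how the shell acts on the world
    return external_act

# The large agent is just the composite external behavior; nothing in
# its interface exposes inner_obs or inner_act. Its preference (here:
# damping the external variable) is readable from this map alone,
# given its external actions across enough epistemic situations.
large_agent = shell
print(large_agent(3.0))  # external action in one epistemic situation
```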
It’s not clear what to do with the dependence on the epistemic situation of the small agent. One wants to reduce it to dependence on a situation stated in terms of the large agent, but that doesn’t seem to work directly. Possibly this needs something like the telephone theorem: any relevant-in-some-sense dependence of the large agent’s behavior on something becomes dependence of its behavior on natural external observations of the large agent, and not on internal noise (or on the epistemic state of the small agent).
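A toy illustration of the telephone-theorem flavor here (a sketch under my own assumptions, not the theorem itself): information passed through a long chain of noisy layers survives only if it is carried deterministically, so far down the chain, behavior can only depend on such robust external signals while internal noise washes out.

```python
# Toy telephone-theorem flavor: pass a (signal, noise) pair through a
# chain of layers. The sign of the signal is carried deterministically;
# everything else is degraded by fresh noise at each step. Far down the
# chain, only the deterministically carried part still predicts anything.
import random

def layer(state):
    signal, junk = state
    # the "natural external observation" (the sign) is preserved exactly;
    # the rest is re-randomized, like internal noise in the small agent.
    return (1.0 if signal > 0 else -1.0), random.gauss(0.0, 1.0)

def telephone(initial, depth=50):
    state = initial
    for _ in range(depth):
        state = layer(state)
    return state

random.seed(0)
far_state = telephone((0.3, 7.0))
print(far_state)  # (1.0, <noise>): the sign survives, the 7.0 is long gone
```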