But the perturbation of “change the environment, and then see what the new optimal policy is” is a rather unnatural one to think about; most ML people would more naturally think about perturbing an agent’s inputs, or its state, and seeing whether it still behaved instrumentally.
Ah. To clarify, I was referring to holding an environment fixed, and then considering whether, at a given state, an action has a high probability of being optimal across reward functions. I think it makes to call those actions ‘robustly instrumental.’
Ah. To clarify, I was referring to holding an environment fixed, and then considering whether, at a given state, an action has a high probability of being optimal across reward functions. I think it makes to call those actions ‘robustly instrumental.’