I think ‘robust instrumentality’ is basically correct for optimal actions, because there’s no question of ‘emergence’: optimal actions just are.
If I were to put my objection another way: I usually interpret “robust” to mean something like “stable under perturbations”. But the perturbation of “change the environment, and then see what the new optimal policy is” is a rather unnatural one to think about; most ML people would more naturally think about perturbing an agent’s inputs, or its state, and seeing whether it still behaved instrumentally.
A more accurate description might be something like “ubiquitous instrumentality”? But this isn’t a very aesthetically pleasing name.
Ah. To clarify, I was referring to holding an environment fixed, and then considering whether, at a given state, an action has a high probability of being optimal across reward functions. I think it makes sense to call those actions ‘robustly instrumental.’
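To make the "fix the environment, vary the reward function" reading concrete, here is a minimal sketch of that calculation. The tiny 3-state deterministic MDP, the uniform reward distribution, and all the names are my own illustrative assumptions, not anything from this discussion; the point is just to show what "an action has a high probability of being optimal across reward functions" cashes out to computationally.

```python
# A minimal sketch: fix an environment (a toy deterministic MDP), fix a state,
# and estimate how often each action is optimal when the reward function is
# drawn at random. Actions that are optimal with high probability across reward
# draws are the ones being called "robustly instrumental" above.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# transition[s, a] = next state (arbitrary toy dynamics)
transition = np.array([[1, 2],
                       [1, 0],
                       [2, 0]])

def optimal_action(reward, start_state, iters=200):
    """Value-iterate under a fixed state-based reward; return the greedy action at start_state."""
    v = np.zeros(n_states)
    for _ in range(iters):
        q = reward[transition] + gamma * v[transition]   # Q[s, a], reward collected on arrival
        v = q.max(axis=1)
    return int(q[start_state].argmax())

rng = np.random.default_rng(0)
start_state, n_samples = 0, 10_000
counts = np.zeros(n_actions)
for _ in range(n_samples):
    reward = rng.uniform(size=n_states)                  # reward drawn IID uniform on [0, 1]
    counts[optimal_action(reward, start_state)] += 1

# Empirical probability that each action at start_state is optimal across sampled rewards.
print(counts / n_samples)
```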
I’d considered ‘attractive instrumentality’ a few days ago, to convey the idea that certain kinds of subgoals are attractor points during plan formulation, but the usual reading of ‘attractive’ isn’t ‘having attractor-like properties.’