Richard_Ngo comments on Distinguishing claims about training vs deployment

Richard_Ngo 22 Feb 2021 15:34 UTC
> Can you elaborate? ‘Robust’ seems natural for talking about robustness to perturbation in the initial AI design (different objective functions, to the extent that that matters) and robustness against choice of environment.
The first ambiguity I dislike here is that you could either be describing the emergence of instrumentality as robust, or the trait of instrumentality as robust. It seems like you’re trying to do the former, but because “robust” modifies “instrumentality”, the latter is a more natural interpretation.
For example, if I said “life on Earth is very robust”, the natural interpretation is: given that life exists on Earth, it’ll be hard to wipe it out. Whereas an emergence-focused interpretation (like yours) would be: life would probably have emerged given a wide range of initial conditions on Earth. But I imagine that very few people would interpret my original statement in that way.
The second ambiguity I dislike: even if we interpret “robust instrumentality” as the claim that “the emergence of instrumentality is robust”, this still doesn’t get us what we want. Bostrom’s claim is not just that instrumental reasoning usually emerges; it’s that specific instrumental goals usually emerge. But “instrumentality” is more naturally interpreted as the general tendency to do instrumental reasoning.
On switching costs: Bostrom has been very widely read, so changing one of his core terms will be much harder than changing a niche working handle like “optimisation daemon”, and would probably leave a whole bunch of people confused for quite a while. I do agree that the original term is flawed, though, and will keep an eye out for potential alternatives; I just don’t think “robust instrumentality” is clear enough to serve that role.
- TurnTrout 23 Feb 2021 2:00 UTC
  > The first ambiguity I dislike here is that you could either be describing the emergence of instrumentality as robust, or the trait of instrumentality as robust. It seems like you’re trying to do the former, but because “robust” modifies “instrumentality”, the latter is a more natural interpretation.
  One possibility is that we have to individuate these “instrumental convergence”-adjacent theses using different terminology. I think ‘robust instrumentality’ is basically correct for optimal actions, because there’s no question of ‘emergence’: optimal actions just are.
  However, it doesn’t make sense to say the same for conjectures about how training such-and-such a system tends to induce property Y, for the reasons you mention. In particular, if property Y is not about goal-directed behavior, then it no longer makes sense to talk about ‘instrumentality’ from the system’s perspective. For example, I’m not sure it makes sense to say ‘edge detectors are robustly instrumental for this network structure on this dataset after X epochs’.
  (These are early thoughts; I wanted to get them out, and may revise them later or add another comment)
  EDIT: In the context of MDPs, however, I prefer to talk in terms of (formal) POWER and of optimality probability, instead of in terms of robust instrumentality. I find ‘robust instrumentality’ to be better as an informal handle, but its formal operationalization seems better for precise thinking.
  - Richard_Ngo 25 Feb 2021 17:09 UTC
    > I think ‘robust instrumentality’ is basically correct for optimal actions, because there’s no question of ‘emergence’: optimal actions just are.
    If I were to put my objection another way: I usually interpret “robust” to mean something like “stable under perturbations”. But the perturbation of “change the environment, and then see what the new optimal policy is” is a rather unnatural one to think about; most ML people would more naturally think about perturbing an agent’s inputs, or its state, and seeing whether it still behaved instrumentally.
    A more accurate description might be something like “ubiquitous instrumentality”? But this isn’t a very aesthetically pleasing name.
    - TurnTrout 25 Feb 2021 17:26 UTC
      > But the perturbation of “change the environment, and then see what the new optimal policy is” is a rather unnatural one to think about; most ML people would more naturally think about perturbing an agent’s inputs, or its state, and seeing whether it still behaved instrumentally.
      Ah. To clarify, I was referring to holding an environment fixed, and then considering whether, at a given state, an action has a high probability of being optimal across reward functions. I think it makes sense to call those actions ‘robustly instrumental.’
    - TurnTrout 25 Feb 2021 17:28 UTC
      > A more accurate description might be something like “ubiquitous instrumentality”? But this isn’t a very aesthetically pleasing name.
      I’d considered ‘attractive instrumentality’ a few days ago, to convey the idea that certain kinds of subgoals are attractor points during plan formulation, but the usual reading of ‘attractive’ isn’t ‘having attractor-like properties.’
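
TurnTrout’s clarification in the thread (hold the environment fixed, then ask how likely an action at a given state is to be optimal across reward functions) and the formal POWER mentioned in his EDIT can both be illustrated numerically. The Python sketch below is a minimal illustration under stated assumptions, not code from the discussion or from any paper: the six-state environment `SUCCESSORS`, the i.i.d. uniform state-based reward distribution, γ = 0.9, and every function name are hypothetical choices. The POWER estimate follows the normalized definition POWER(s) = (1 − γ)/γ · E_R[V*_R(s) − R(s)] as we understand it from Turner et al.’s “Optimal Policies Tend to Seek Power”; the paper should be consulted for the exact formal setup.

```python
import numpy as np

# Hypothetical toy environment (an illustrative assumption, not from the thread).
# Transitions are deterministic; SUCCESSORS[s] lists the state each action leads to.
# From the start state 0, action 0 enters state 1, a dead end that only loops on
# itself; action 1 enters state 2, a hub from which states 3, 4 and 5 (each
# absorbing) all remain reachable.
SUCCESSORS = [[1, 2], [1], [3, 4, 5], [3], [4], [5]]


def sample_rewards(num_states, num_samples, seed):
    """Draw state-based reward functions, i.i.d. uniform on [0, 1] per state (assumed D)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(num_samples, num_states))


def optimal_values(rewards, successors, gamma, iters=300):
    """Batched value iteration: V*(s) = R(s) + gamma * max_a V*(T(s, a)).

    rewards has shape (num_samples, num_states); returns V* with the same shape.
    """
    width = max(len(row) for row in successors)
    # Pad each row by repeating its last successor; duplicates leave the max unchanged.
    succ = np.array([row + [row[-1]] * (width - len(row)) for row in successors])
    v = np.zeros_like(rewards)
    for _ in range(iters):
        # v[:, succ] has shape (num_samples, num_states, width).
        v = rewards + gamma * v[:, succ].max(axis=2)
    return v


def optimality_probability(state, successors, gamma, num_samples=5000, seed=0):
    """Fraction of sampled reward functions for which each action at `state` is optimal.

    Ties have probability zero under a continuous reward distribution, so argmax is safe.
    """
    rewards = sample_rewards(len(successors), num_samples, seed)
    v = optimal_values(rewards, successors, gamma)
    # Q*(state, a) = R(state) + gamma * V*(T(state, a)) for every action a at `state`.
    q = rewards[:, [state]] + gamma * v[:, successors[state]]
    counts = np.bincount(q.argmax(axis=1), minlength=len(successors[state]))
    return counts / num_samples


def power(successors, gamma, num_samples=5000, seed=0):
    """Monte Carlo estimate of POWER(s) = (1 - gamma) / gamma * E_R[V*_R(s) - R(s)] for every s."""
    rewards = sample_rewards(len(successors), num_samples, seed)
    v = optimal_values(rewards, successors, gamma)
    return (1 - gamma) / gamma * (v - rewards).mean(axis=0)


gamma = 0.9
probs = optimality_probability(0, SUCCESSORS, gamma)
powers = power(SUCCESSORS, gamma)
print("P(action optimal at state 0):", probs)
print("POWER of dead end (state 1) vs hub (state 2):", powers[1], powers[2])
```

For this toy environment the numbers can be checked by hand. At γ = 0.9, moving toward the hub is optimal exactly when R(1) < 0.1·R(2) + 0.9·max(R(3), R(4), R(5)), which has probability 0.725, and POWER works out to 0.5 for the dead end and 0.75 for the hub; the Monte Carlo estimates should land close to those values. The action that keeps more options open is optimal for most reward functions, which is the sense in which TurnTrout calls such actions ‘robustly instrumental’.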