Vika comments on Risks from Learned Optimization: Introduction

Vika 3 Jul 2019 13:55 UTC
LW: 10 AF: 7
I’m confused about the difference between a mesa-optimizer and an emergent subagent. A “particular type of algorithm that the base optimizer might find to solve its task” or a “neural network that is implementing some optimization process” inside the base optimizer seem like emergent subagents to me. What is your definition of an emergent subagent?
- evhub 3 Jul 2019 18:28 UTC
  LW: 9 AF: 5
  I think my concern with describing mesa-optimizers as emergent subagents is that they’re not really “sub” in a very meaningful sense, since we’re thinking of the mesa-optimizer as the entire trained model, not some portion of it. One could describe a mesa-optimizer as a subagent in the sense that it is “sub” to gradient descent, but I don’t think that’s the right relationship—it’s not like the mesa-optimizer is some subcomponent of gradient descent; it’s just the trained model produced by it.
  
  The reason we opted for “mesa” is that I think it reflects more of the right relationship between the base optimizer and the mesa-optimizer, wherein the base optimizer is “meta” to the mesa-optimizer rather than the mesa-optimizer being “sub” to the base optimizer.
  
  Furthermore, in my experience, when many people encounter “emergent subagents” they think of some portion of the model turning into an agent and (correctly) infer that something like that seems very unlikely, as it’s unclear why such a thing would actually be advantageous for getting a model selected by something like gradient descent (unlike mesa-optimization, which I think has a very clear story for why it would be selected for). Thus, we want to be very clear that something like that is not the concern being presented in the paper.
  - Jan Kulveit 3 Jul 2019 20:44 UTC
    LW: 5 AF: 4
    I don’t see why a portion of a system turning into an agent would be “very unlikely”. From a different perspective, if the system lives in something like an evolutionary landscape, there can be various basins of attraction that lead to sub-agent emergence, not just mesa-optimisation.