Steven Byrnes comments on Risks from Learned Optimization: Introduction

Steven Byrnes 1 Jun 2019 13:01 UTC
LW: 11 AF: 5
This paper replaces a normal feedforward image classifier with a mesa-optimizing one (it builds generative models of the different possibilities and picks the one that best matches the data). The result was better and far more human-like than a traditional image classifier; e.g., the examples that are ambiguous to the model are the same ones that are ambiguous to humans, and vice versa. I also understand that the human brain relies heavily on generative modeling of everything. So I expect that ML systems of the future will approach 100% mesa-optimizers, while non-optimizing feedforward NNs will become rare. This post is a good framework and I’m looking forward to follow-ups!
- Rohin Shah 2 Jun 2019 22:24 UTC
  LW: 17 AF: 7
  I would not call that mesa-optimization and would not take it as evidence that mesa-optimization is the “default” for powerful ML systems. That paper has a model with subagents where each subagent does optimization. Ways in which this is a different thing:
  - Given an input, a mesa-optimizer would only run on that input once; in the case of this model there are 10 different optimizations happening in order to classify each digit.
  - The base objective is “correctly map an image of a digit to its label”; the objective of the dth optimizer in the model is “Evidence Lower Bound (ELBO) on the log likelihood of the image as evaluated by a generative model for the digit d”. Unlike a mesa-optimizer’s objective, the objectives of the model’s optimizers are not of the right type signature and don’t agree with the base objective on the training distribution.
  Note that I do think mesa-optimization will be common; I just don’t think that that paper is evidence for the claim.
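
To make the structure described in the reply above concrete, here is a minimal sketch of a classifier that runs one inner optimization per class and then picks the best-scoring class. Everything in it is an illustrative assumption rather than the paper's model: the linear decoders, the names `class_score` and `classify`, and the objective, which is a simple reconstruction-plus-prior score standing in for the ELBO. The point is only the shape of the computation: ten separate optimizations per input, each maximizing a score of type (image, class) to scalar, whereas the base objective has type image to label.

```python
import numpy as np

def class_score(x, W, mu, steps=200, lr=0.05):
    """Inner optimization for ONE class: gradient ascent over a latent z.

    Objective (a simplified stand-in for the ELBO, not the paper's bound):
        -0.5 * ||x - (W @ z + mu)||^2 - 0.5 * ||z||^2
    """
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        residual = x - (W @ z + mu)
        grad = W.T @ residual - z  # gradient of the objective w.r.t. z
        z = z + lr * grad
    residual = x - (W @ z + mu)
    return -0.5 * residual @ residual - 0.5 * z @ z

def classify(x, decoders):
    """Run one inner optimization per class, then take the argmax."""
    scores = [class_score(x, W, mu) for W, mu in decoders]
    return int(np.argmax(scores)), scores

# Hypothetical toy "generative models": one linear decoder per digit class.
rng = np.random.default_rng(0)
n_classes, dim, latent = 10, 16, 2
decoders = [
    (0.1 * rng.normal(size=(dim, latent)), rng.normal(size=dim))
    for _ in range(n_classes)
]

# An input that class 3's model explains perfectly (its own mean).
x = decoders[3][1].copy()
label, scores = classify(x, decoders)
print(label, len(scores))
```

Note that no single inner objective here says anything about labels; the label only appears in the outer `argmax`, which is a fixed rule rather than a learned search.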