No, and I don’t think it really matters too much… what’s more important is the “architecture” of the “mesa-optimizer”. It’s doing something that looks like search/planning/optimization/RL.
Roughly speaking, the simplest form of this model of how things work says: “It’s so hard to solve NLP without doing agent-y stuff that when we see GPT-N produce a solution to NLP, we should assume that it’s doing agent-y stuff on the inside… i.e. what probably happened is it evolved or stumbled upon something agent-y, and then that agent-y thing realized the situation it was in and started plotting a treacherous turn”.
In other words, there is a fully general argument for expecting learning processes to produce mesa-optimization to the extent that they apply relatively weak learning algorithms to relatively hard tasks.
It’s very unclear at the moment how much weight to give this argument, either in general or in specific contexts.
But I don’t think it’s particularly sensitive to the choice of task/learning algorithm.