magfrump comments on The case for aligning narrowly superhuman models

magfrump 14 Mar 2021 21:41 UTC
2 points
AF
Looks like the initial question was here and a result around it was posted here. At a glance I don’t see the comments with counterexamples, and I do see a post with a formal result, which seems like a direct contradiction to what you’re saying, though I’ll look in more detail.
Coming back to the scaling question, I think I agree that multiplicative scaling over the whole model size is obviously wrong. To be more precise, if there’s something like a Q-learning inner optimizer for two tasks, then you need the cross product of the state spaces, so the size of the Q-space could scale close-to-multiplicatively. But the model that condenses the full state space into the Q-space scales additively, and in general I’d expect the model part to be much bigger. For example, if the Q-space has 100 dimensions and the model has 1 billion parameters, then adding a second model of 1 billion parameters and increasing the Q-space to 10k dimensions is mostly additive in practice, even if it’s also multiplicative in a technical sense.
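To make the arithmetic concrete, here is a rough sketch in Python. It assumes, purely for illustration, that model parameters and Q-space dimensions can be added together as a single "size" count, that the per-task models add while the Q-spaces multiply, and that both tasks are the same size. The function name `combined_size` and its parameters are hypothetical, and the numbers are the ones from the example above.

```python
# Illustrative size accounting only; assumes parameters and Q-space
# dimensions can be summed into a single "size" count.

def combined_size(model_params: int, q_dims: int, n_tasks: int = 2) -> int:
    """Size when per-task models add and per-task Q-spaces multiply."""
    model_total = n_tasks * model_params   # additive part
    q_total = q_dims ** n_tasks            # cross product of Q-spaces
    return model_total + q_total

MODEL_PARAMS = 1_000_000_000  # 1 billion parameters per task model
Q_DIMS = 100                  # Q-space dimensions per task

single = MODEL_PARAMS + Q_DIMS                  # one task on its own
combined = combined_size(MODEL_PARAMS, Q_DIMS)  # two tasks, mixed scaling
additive = 2 * single                           # everything adds
multiplicative = single ** 2                    # everything multiplies

print(f"single task:         {single:,}")
print(f"two tasks, mixed:    {combined:,}")
print(f"purely additive:     {additive:,}")
print(f"fully multiplicative: {multiplicative:,}")
print(f"mixed / additive ratio: {combined / additive:.7f}")
```

Under these assumptions the mixed total exceeds the purely additive total by only 9,800 out of roughly 2 billion, while multiplying over the whole model would give a number on the order of 10^18.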
I’m going to update my probability that “GPT-3 can solve X, Y implies GPT-3 can solve X+Y,” and take a closer look at the comments on the linked posts. This also makes me think that it might make sense to try to find simpler problems, even already-mostly-solved problems like chess or algebra, and try to use this process to solve them with GPT-2, to build up the architecture and search for possible safety issues in the process.
- abramdemski 14 Mar 2021 23:13 UTC
  LW: 4 AF: 3
  AF Parent
  I do see a post with a formal result, which seems like a direct contradiction to what you’re saying, though I’ll look in more detail.
  If you mean to suggest this post has a positive result, then I think you’re just misreading it; the key result is
  The conclusion of this post is the following: if there exists some set of natural tasks for which the fastest way to solve them is to do some sort of machine learning to find a good policy, and there is some task for which that machine learning results in deceptive behavior, then there exists a natural task such that the minimal circuit that solves that task also produces deceptive behavior.
  which says that under some assumptions, there exists a task for which the minimal circuit will engage in deceptive behavior (i.e., is a malign inner optimizer).
  The comment with a counterexample on the original post is here.
  - magfrump 15 Mar 2021 6:14 UTC
    2 points
    Parent
    I see, I definitely didn’t read that closely enough.