The fourth bullet point claims that GPT-N will go on filling in missing words rather than doing a treacherous turn.
?? I said nothing about a treacherous turn? And where did I say it would go on filling in missing words?
EDIT: Ah, you mean the fourth bullet point in ESRogs's response. I was thinking of that as one example of how such reasoning could go wrong, rather than the only case. So in that case the model_1 predicts a treacherous turn confidently, but this is the wrong epistemic state to be in, because it is also plausible that it just “fills in words” instead.
Seems to me the conclusion of this argument is that “In general it’s not true that the AI is trying to achieve its training objective.”
Isn’t that effectively what I said? (I was trying to be more precise since “achieve its training objective” is ambiguous, but given what I understand you to mean by that phrase, I think it’s what I said?)
we have no idea what it’ll do; treacherous turn is a real possibility because that’s what’ll happen for most goals it could have, and it may have a goal for all we know.
This seems reasonable to me (and seems compatible with what I said)
OK cool, sorry for the confusion. Yeah, I think ESRogs's interpretation of you was making a somewhat stronger claim than you actually were.