The fourth bullet point claims that GPT-N will go on filling in missing words rather than doing a treacherous turn.
?? I said nothing about a treacherous turn? And where did I say it would go on filling in missing words?
EDIT: Ah, you mean the fourth bullet point in ESRogs's response. I was thinking of that as one example of how such reasoning could go wrong, rather than the only case. So in that case the model_1 predicts a treacherous turn confidently, but this is the wrong epistemic state to be in, because it is also plausible that it just “fills in words” instead.
Seems to me the conclusion of this argument is that “In general it’s not true that the AI is trying to achieve its training objective.”
Isn’t that effectively what I said? (I was trying to be more precise since “achieve its training objective” is ambiguous, but given what I understand you to mean by that phrase, I think it’s what I said?)
we have no idea what it’ll do; treacherous turn is a real possibility because that’s what’ll happen for most goals it could have, and it may have a goal for all we know.
This seems reasonable to me (and seems compatible with what I said)
OK cool, sorry for the confusion. Yeah, I think ESRogs's interpretation of you was making a somewhat stronger claim than you actually were.