GPT-3 reminds me of a student bullshitting their way through an exam on Dave & Doug’s version of these questions: “This question doesn’t make any sense to me, but I guess the teacher expects me to have an answer, so I’ll see if I can make up something that resembles what they’re looking for.”
There is a mind-boggling hollowness hidden just beneath the flashy surface of a clever student merely trying to guess what their teacher is looking for.
It’s also self-reinforcing, of course: since the transcript implies a single session, once you get a bad answer to the first question, that behavior is locked in. The model has to give further bullshit answers, simply because bullshit answers are now in the prompt it is conditioning on as the human-written ground truth. (With a prompt that allows the option of factuality, instead of forcing a confabulation, this works the other way: past “be real” responses strengthen the incentive for GPT-3 to show its knowledge in future responses.)
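To make the lock-in mechanism concrete, here is a minimal sketch in Python. The `lm_complete` function is a hypothetical stand-in for any autoregressive completion API (no real library is assumed), and the nonsense questions are purely illustrative; the point is only that prompting is string concatenation, so a confabulated answer becomes part of the “ground truth” every later completion conditions on.

```python
# Minimal sketch of the self-reinforcing transcript (illustrative only;
# `lm_complete` is a hypothetical stand-in for a completion API).
# Every answer, confabulated or not, is appended to the transcript and
# then conditioned on as if it were human-written ground truth.

def lm_complete(prompt: str) -> str:
    """Placeholder: a real language model would generate this continuation."""
    return "Three."  # canned confabulation for the demo

transcript = "Q: How many eyes does a giraffe have?\nA: A giraffe has two eyes.\n"

for question in ["Q: How many eyes does the sun have?",
                 "Q: How many eyes does a blade of grass have?"]:
    transcript += question + "\nA:"
    answer = lm_complete(transcript)   # the model sees *all* prior answers
    transcript += " " + answer + "\n"  # the bad answer is now in-context "fact"

print(transcript)
```

Nothing in the model’s weights changes between questions; the “lock-in” lives entirely in the growing prompt, which is why a prompt that demonstrates refusing nonsense questions reinforces itself in exactly the same way.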
“Have you stopped beating your wife yet?” “Er… I guess so?” “When was the last time you beat her?” “December 21st, 2012.”
“LaMDA is indeed, to use a blunt (if, admittedly, humanizing) term, bullshitting.¹² That’s because, in instructing the model to be ‘sensible’ and ‘specific’ — but not specific in any specific way — bullshit is precisely what we’ve requested.” –Blaise Agüera y Arcas