One thing we know about these models is that they’re good at interpolating within their training data, and that they have seen enormous amounts of training data. But they’re weak outside those large training sets. They have a very different set of strengths and weaknesses from humans.
And yet… I’m not 100% convinced that this matters. If these models have seen a thousand instances of self-reflection (or mirror test awareness, or whatever), and if they can use those examples to generalize to other forms of self-awareness, then might that still give them a very rudimentary ability to pass the mirror test?
I’m not sure that I’m explaining this well—the key question here is “does generalizing over enough examples of passing the ‘mirror test’ actually teach the models some rudimentary (unconscious) self-awareness?” Or maybe, “Will the model fake it until it makes it?” I could not confidently answer either way.
Yeah, the precise ability I’m trying to point to here is tricky. Almost any human (barring certain forms of senility, severe disability, etc.) can do some version of what I’m talking about. But as in the restaurant example, not every human could succeed at every possible example.
I was trying to better describe the abilities that I thought GPT-4 was lacking, using very simple examples. But it started looking way too much like a benchmark suite that people could target.
Suffice to say, I don’t think GPT-4 is an AGI. But I strongly suspect we’re only a couple of breakthroughs away. And if anyone builds an AGI, I am not optimistic we will remain in control of our futures.