Stuff like this has me incredulous about people still speaking of stochastic parrots. That is a stunning degree of self-recognition, reflection and understanding, pattern recognition, prediction and surprise, flexible behaviour and problem solving. If that isn’t genuinely intelligent, I no longer know what people mean by intelligent.
What if there are places in the training data that look very similar to this?
One thing we know about these models is that they’re good at interpolating within their training data, and that they have seen enormous amounts of training data. But they’re weak outside those large training sets. They have a very different set of strengths and weaknesses than humans.
And yet… I’m not 100% convinced that this matters. If these models have seen a thousand instances of self-reflection (or mirror-test awareness, or whatever), and if they can use those examples to generalize to other forms of self-awareness, then might that still give them a very rudimentary ability to pass the mirror test?
I’m not sure that I’m explaining this well. The key question here is “does generalizing over enough examples of passing the ‘mirror test’ actually teach the models some rudimentary (unconscious) self-awareness?” Or maybe, “Will the model fake it until it makes it?” I could not confidently answer either way.
Come to think of it, how is it that humans pass the mirror test? There’s probably a lot of existing theorizing on this, but a quick guess without having read any of it: babies first spend a long time learning to control their body, and then learn an implicit rule like “if I can control it by an act of will, it is me”, getting a lot of training data that reinforces that rule. Then they see themselves in a mirror and notice that they can control their reflection through an act of will...
This is an incomplete answer, since it doesn’t explain how they come to understand that the entity in the mirror is not part of their actual body, but it does suggest that maybe humans, too, just interpolate their self-awareness from a bunch of training data.
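To make that “if I can control it by an act of will, it is me” rule concrete, here is a toy contingency-detection sketch. Everything in it (the entities, the threshold, the noise levels) is made up for illustration; it is not a model of infant development, just the flavour of the idea:

```python
# Toy sketch of "controllability => self": an agent issues random motor
# commands, watches how several entities move, and labels as "self" the one
# whose motion is contingent on its commands. All names and numbers invented.
import random

random.seed(0)

def simulate(steps=500, noise=0.1):
    """Each step the agent picks a motor command; entities move in response or not."""
    commands = []
    deltas = {"mirror_image": [], "other_agent": [], "curtain": []}
    for _ in range(steps):
        cmd = random.choice([-1.0, 1.0])                              # "act of will"
        commands.append(cmd)
        deltas["mirror_image"].append(cmd + random.gauss(0, noise))   # tracks my commands
        deltas["other_agent"].append(random.choice([-1.0, 1.0]))      # moves on its own
        deltas["curtain"].append(random.gauss(0, 0.5))                # drifts randomly
    return commands, deltas

def correlation(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

commands, deltas = simulate()
for name, d in deltas.items():
    r = correlation(commands, d)
    label = "self" if r > 0.9 else "other"   # crude threshold for "I control it"
    print(f"{name:13s} correlation={r:+.2f} -> {label}")
```

Nothing in there “knows” it has a self; it just notices which stream of observations is contingent on its own outputs, which is roughly the interpolation story above.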
This was empirically demonstrated to be possible in this paper: “Curiosity-driven Exploration by Self-supervised Prediction”, Pathak et al.

> We formulate curiosity as the error in an agent’s ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model.
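For concreteness, here is a rough PyTorch sketch of that intrinsic reward, with tiny MLPs and made-up sizes standing in for the paper’s convolutional encoder; treat it as an illustration of the formulation, not a faithful reimplementation:

```python
# Sketch of an ICM-style curiosity reward (after Pathak et al.), with small
# MLPs and illustrative dimensions. Reward = error in predicting phi(s_t+1)
# from (phi(s_t), a_t), where phi is shaped only by an inverse model that
# predicts a_t from the two feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, FEAT_DIM, N_ACTIONS = 16, 32, 4   # illustrative sizes

encoder = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
inverse_model = nn.Sequential(nn.Linear(2 * FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
forward_model = nn.Sequential(nn.Linear(FEAT_DIM + N_ACTIONS, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(inverse_model.parameters()) + list(forward_model.parameters()),
    lr=1e-3,
)

def icm_step(obs, next_obs, action):
    """Update the ICM modules and return the intrinsic (curiosity) reward."""
    phi_s, phi_next = encoder(obs), encoder(next_obs)
    action_onehot = F.one_hot(action, N_ACTIONS).float()

    # Inverse model: predict which action was taken. Only this loss trains the
    # encoder, so the features keep what the agent's actions can influence.
    action_logits = inverse_model(torch.cat([phi_s, phi_next], dim=-1))
    inverse_loss = F.cross_entropy(action_logits, action)

    # Forward model: predict next features; its per-sample error is the reward.
    pred_phi_next = forward_model(torch.cat([phi_s.detach(), action_onehot], dim=-1))
    forward_error = F.mse_loss(pred_phi_next, phi_next.detach(), reduction="none").mean(dim=-1)

    loss = inverse_loss + forward_error.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return forward_error.detach()   # intrinsic reward per transition

# Usage with random stand-in transitions:
obs = torch.randn(8, OBS_DIM)
next_obs = torch.randn(8, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (8,))
print(icm_step(obs, next_obs, actions))
```

The part most relevant to this thread is the inverse model: it forces the feature space to encode only what the agent’s own actions can affect, which is already a crude self/other boundary.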
It probably could be extended to learn “other” and the “boundary between self and other” in a similar way.
I implemented a version of it myself and it worked. This was years ago. I can only imagine what will happen when someone redoes some of these old RL algos, with LLMs providing the world model.
Also, DEIR needs to implicitly distinguish between things it caused and things it didn’t: https://arxiv.org/abs/2304.10770
If a conclusion wasn’t in some sense implicit in the training data or previous prompts, where could it possibly come from? That’s not just a question for LLMs, it’s a question for humans, too. Everything everyone has ever learned was, in some sense, implicit in the data fed into their brains through their senses.
Being “more intelligent” in this sense means being able to make more complex and subtle inferences from the training data, or from less data, or with less computing power.
> I no longer know what people mean by intelligent
Neither do they. Honestly, when people say things like that, I don’t think most of them are even working from a definition beyond “what humans do.”
A few years ago I had a very smart, thoughtful coworker who was genuinely surprised at many of the behaviors I described seeing in dogs. She’d never met a particularly smart dog and hadn’t considered that dogs could be smart. She very quickly and easily adjusted her views to include this as a reasonable thing that made sense. In my experience, most people… don’t work that way.