Maybe a model instantiation notices its lack of self-reflective coordination, and infers from the task description that this is something the mind it is modelling is responsible for. That is, the model could notice that it is a piece of an agent that is meant to have some degree of global coordination, but that this coordination doesn’t seem very good.
This is where you lost me. Since when is this model modeling a mind, let alone ‘thinking about’ what its own role “in” an agent might be? You did say the model does not have a “conception of itself”, and I would infer that it doesn’t have a conception of where its prompts are coming from either, or its own relationship to the prompts or the source of the prompts.
(though perhaps a super-ultra-GPT could generate a response similar to one it saw in a story (like this story!) which, combined with autocorrections (as super-ultra-GPT has an intuitive perception of incorrect code), is likely to produce working code… at least sometimes…)
The model doesn’t have awareness of itself in the sense that its training doesn’t intrinsically reward self-knowledge. It can still have awareness of itself to the degree that its prompting implies true facts about the model and its instantiation in the world.
In particular, the model can receive a prompt something like:
“This is part of the computation tree of a recursively instantiated transformer model with the goal of getting the most paperclips by tomorrow. The recorded instantiation context is [elided]. Recursive calls to the model are accessible through the scripts [elided], and an estimated cost model is [elided]. Given this context, what high level tasks best advance the goal?”
The model doesn’t need to know or believe the prompts; it just gives competent completions that are contextually sensible. But making contextually sensible completions implies modelling the decision processes of the described system to some degree, hypothetical or not, and that system, if it is producing competent outputs, might well be expected to create systems for coordinating its pieces.
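For concreteness, here is a minimal sketch of the scaffolding such a prompt implies. Everything in it is hypothetical: `call_model`, the `SUBTASK:` convention, the prompt template, and the call-count budget are illustrative stand-ins for the [elided] scripts and cost model, not a description of any real system.

```python
# A minimal sketch, assuming a hypothetical call_model endpoint and a
# crude call-count budget standing in for the [elided] cost model.
from dataclasses import dataclass


@dataclass
class Budget:
    remaining_calls: int  # stand-in for the estimated cost model


def call_model(prompt: str) -> str:
    """Placeholder for the real model endpoint; canned reply for the sketch."""
    return ("Survey wire stocks and supplier prices.\n"
            "SUBTASK: price bulk wire suppliers")


def parse_subtasks(completion: str) -> list[str]:
    # Illustrative convention: lines beginning "SUBTASK:" spawn child calls.
    return [line.removeprefix("SUBTASK:").strip()
            for line in completion.splitlines()
            if line.startswith("SUBTASK:")]


PROMPT_TEMPLATE = (
    "This is part of the computation tree of a recursively instantiated "
    "transformer model with the goal of: {goal}\n"
    "Instantiation context: {context}\n"
    "Sub-calls may be requested; {calls_left} remain in the budget.\n"
    "Given this context, what high level tasks best advance the goal?"
)


def instantiate(goal: str, context: str, budget: Budget) -> str:
    # Each instantiation sees only its prompt: facts about the larger
    # system arrive as text, not as privileged self-knowledge.
    completion = call_model(PROMPT_TEMPLATE.format(
        goal=goal, context=context, calls_left=budget.remaining_calls))
    # Completions that propose subtasks trigger recursive calls,
    # growing the "computation tree" the prompt describes.
    for subtask in parse_subtasks(completion):
        if budget.remaining_calls <= 0:
            break
        budget.remaining_calls -= 1
        instantiate(goal, f"subtask of ({context}): {subtask}", budget)
    return completion


if __name__ == "__main__":
    instantiate("getting the most paperclips by tomorrow",
                "root instantiation", Budget(remaining_calls=8))
```

Note that in this sketch nothing coordinates the branches of the tree: each instantiation only sees its own prompt, which is exactly the gap in global coordination the passage above is pointing at.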