Veedrac comments on Optimality is the tiger, and agents are its teeth

Veedrac 7 Nov 2022 4:47 UTC
1 point
The model doesn’t have awareness of itself in the sense that its training doesn’t intrinsically reward self-knowledge. It can still have awareness of itself to the degree that its prompting implies true facts about the model and its instantiation in the world.

In particular, the model can receive a prompt along these lines:

“This is part of the computation tree of a recursively instantiated transformer model with the goal of getting the most paperclips by tomorrow. The recorded instantiation context is [elided]. Recursive calls to the model are accessible through the scripts [elided], and an estimated cost model is [elided]. Given this context, what high level tasks best advance the goal?”

The model doesn’t need to know or believe the prompts; it just gives competent completions that are contextually sensible. But making contextually sensible completions implies modelling the decision processes of the described system to some degree, hypothetical or not, and that system, if producing competent outputs, might well be expected to create systems for coordinating its pieces.
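
To make the setup concrete, here is a rough Python sketch of how a prompt like the one quoted above could be threaded through a recursive call tree. Everything named here is hypothetical and for illustration only: `build_prompt`, `expand`, and `stub_complete` are not from any real system, the stub stands in for an actual model call, and the "[elided]" fields are left exactly as elided.

```python
from typing import Callable, Dict, List

# Prompt wording follows the comment above; "[elided]" placeholders are kept
# as-is rather than filled in.
PROMPT_TEMPLATE = (
    "This is part of the computation tree of a recursively instantiated "
    "transformer model with the goal of {goal}. "
    "The recorded instantiation context is [elided]. "
    "Recursive calls to the model are accessible through the scripts [elided], "
    "and an estimated cost model is [elided]. "
    "Given this context, what high level tasks best advance the goal?"
)


def build_prompt(goal: str) -> str:
    """Describe the model's situation to itself purely through the prompt."""
    return PROMPT_TEMPLATE.format(goal=goal)


def expand(goal: str, complete: Callable[[str], List[str]], depth: int) -> Dict:
    """Recursively ask the (hypothetical) model for subtasks, to a fixed depth."""
    prompt = build_prompt(goal)
    if depth == 0:
        return {"goal": goal, "prompt": prompt, "children": []}
    subtasks = complete(prompt)
    children = [expand(task, complete, depth - 1) for task in subtasks]
    return {"goal": goal, "prompt": prompt, "children": children}


def stub_complete(prompt: str) -> List[str]:
    """Hypothetical stand-in for a model call; returns fixed subtasks."""
    return ["subtask A", "subtask B"]


tree = expand("getting the most paperclips by tomorrow", stub_complete, depth=2)
```

Nothing in this sketch gives the model self-knowledge through training. Whatever it "knows" about being one node in a tree arrives only through the prompt text, which is the point being made above.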