I can imagine at least two different senses in which an AI might have something like a “goal slot”:
A. Instrumental “goal slot”—Something like a box inside the agent that holds in mind the (sub)goal it is currently pursuing. That box serves as an interface through which the different parts of the agent coordinate coherent patterns of thought & action among themselves. (For example, the goal slot’s contents get set temporarily to “left foot forward”, allowing coordination between limb effectors so that they do not trip over one another.) I think the AIs we build will probably have something (or some things) like this, because it is a natural design pattern for implementing flexible distributed control.
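To make sense (A) concrete, here is a minimal sketch of an instrumental goal slot as a shared, temporarily-set coordination interface between subcomponents. All class and method names here are illustrative inventions, not drawn from any real agent architecture:

```python
# Hypothetical sketch of sense (A): a mutable "goal slot" that
# subcomponents read from to coordinate, and that any part of the
# agent (including bottom-up feedback) can overwrite.

class GoalSlot:
    """A box holding the (sub)goal currently being pursued."""
    def __init__(self):
        self._goal = None

    def set(self, goal):
        # Contents are set only temporarily; they get replaced as
        # control flows through the agent.
        self._goal = goal

    def get(self):
        return self._goal


class LimbEffector:
    """A subcomponent that conditions its action on the shared slot."""
    def __init__(self, name, slot):
        self.name = name
        self.slot = slot

    def act(self):
        # Every effector reads the same current subgoal, so their
        # actions stay mutually coherent.
        return f"{self.name} acting toward: {self.slot.get()}"


slot = GoalSlot()
left = LimbEffector("left foot", slot)
right = LimbEffector("right foot", slot)

slot.set("left foot forward")
print(left.act())   # left foot acting toward: left foot forward
print(right.act())  # right foot acting toward: left foot forward

slot.set("right foot forward")  # slot contents are rewritable, unlike
print(right.act())              # the fixed terminal slot of sense (B)
```

The contrast with sense (B) is that nothing in this design makes the slot’s contents fixed or immune to bottom-up revision; the slot is purely an interface for coordination, not the top of a control hierarchy.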
B. Terminal “goal slot”—Something like a box inside the agent that holds in mind a fixed goal it is always pursuing. That box sits at the topmost level of the control hierarchy within the agent, and its contents are not and cannot be changed via bottom-up feedback. I don’t think the AIs we build will have something like this, at least not in the safety-relevant period of cognitive development (the period wherein the agent’s goals are still malleable to us), in part because, in practice, it is a design that rarely works.
Were you thinking of A or B?
It seems perfectly consistent to me to have an AI whose cognitive internals are not “spaghetti code”, even one with a cleanly separated instrumental “goal slot” (A) that interfaces with the rest of the AI’s cognition, but where there is no single terminal “goal slot” (B) to speak of. In fact, that seems like a not-unlikely development path to me.