I agree that the framing of rational agentic behavior as being “about” maximizing some (arbitrary) utility function is getting at things from the wrong perspective. Yes, consistent rational behavior can always be cast in those terms, and a fixed utility function can be found that is being maximized by any given behavior, but I don’t think that’s what is usually driving behavior in the first place.
How about this:
An intelligent agent is a system that engages in teleogenesis, generating internal representations of arbitrary goal states (or trajectories or limit cycles) and optimizing its behavior to steer toward states that match those representations. The broader the space of potential goal states that can be successfully navigated towards, and the better the system models its environment in order to do so, the more intelligent it is.
The manifold of reachable goal states may be necessarily restricted in some dimensions, such as for homeostatic or allostatic maintenance, but in general, an intelligent system should be able to set arbitrary goals and reward itself in proportion to how well it is achieving them.
Goal states may be more “terminal”-like, representing states with high predicted utility according to some built-in value function (homeostasis, status, reproductive success, number of staples), or they may be more “instrumental”-like, representing states from which reaching terminal goals is predicted to be easier (power, resources, influence, etc.), or they may be more purely arbitrary (commandments from on high, task assignments, play, behavioral curiosity, epistemic curiosity). But wherever goals come from, intelligence is about being able to find ways to achieve them.
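A minimal sketch of that loop, purely as a toy (the goal representation, distance-based self-reward, and the hill-climbing stand-in for a learned policy are all my own simplifications):

```python
import numpy as np

# Toy picture of the loop described above: the agent generates an internal
# goal representation and rewards itself in proportion to how closely the
# current state matches that representation.
rng = np.random.default_rng(0)

goal = rng.uniform(-1, 1, size=4)       # self-generated goal representation
state = np.zeros(4)                     # current (modeled) world state

def self_reward(state, goal):
    # higher reward the closer the state is to the goal representation
    return -np.linalg.norm(state - goal)

for _ in range(500):                    # crude hill-climbing stand-in for a learned policy
    proposal = state + rng.normal(scale=0.1, size=4)
    if self_reward(proposal, goal) > self_reward(state, goal):
        state = proposal

print(self_reward(state, goal))         # climbs toward 0 as behavior steers the state to the goal
```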
That seems pretty close. One complication is what “can be successfully navigated towards” means; can a paperclip maximizer successfully navigate towards states without lots of paperclips? I suppose if it factors into a “goal module” and a “rest of the agent module”, then the “rest of the agent module” could navigate towards lots of different states even if the overall agent couldn’t.
“Causal entropic forces” is another proposal related to being able to reach a lot of states, as are empowerment objectives.
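For reference, the two objectives look roughly like this (writing from memory, so treat the notation as approximate):

$$F(\mathbf{X}_0, \tau) = T_c \, \nabla_{\mathbf{X}_0} S_c(\mathbf{X}, \tau \mid \mathbf{X}_0)$$

a “force” pushing the system toward macrostates from which the entropy $S_c$ over possible future paths of horizon $\tau$ is largest, and empowerment

$$\mathcal{E}(s_t) = \max_{p(a_t^{n})} I\!\left(A_t^{n};\, S_{t+n} \mid s_t\right)$$

the channel capacity from the agent’s next $n$ actions to the state $n$ steps later. Both amount to preferring situations that keep the most futures open.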
One reason I mentioned MDP value functions is that they don’t bake in the assumption that the value function only specifies terminal values; it also includes instrumental state values. So it might be able to represent some of what you’re talking about.
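As a toy illustration (my own sketch, nothing canonical): in a three-state chain where only the last state is rewarded, the Bellman backup still assigns high value to the middle state, purely because terminal reward is reachable from it.

```python
# Toy value iteration on a 3-state chain: start -> key -> door.
# Only "door" gives immediate reward, but "key" ends up with high value
# because the value function also encodes instrumental worth.
gamma = 0.9
states = ["start", "key", "door"]
reward = {"start": 0.0, "key": 0.0, "door": 1.0}
next_state = {"start": "key", "key": "door", "door": "door"}  # deterministic "forward" action

V = {s: 0.0 for s in states}
for _ in range(100):  # iterate the Bellman backup to convergence
    V = {s: reward[s] + gamma * V[next_state[s]] for s in states}

print(V)  # roughly {'start': 8.1, 'key': 9.0, 'door': 10.0} -- "key" is valuable instrumentally
```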
“Can be successfully navigated towards” means that there exists a policy (or set of policies) for the agent, reachable via reinforcement learning on the goal objective, that would allow the agent to consistently achieve the goal when followed (barring any drastic changes to the environment, although the policy may account for ordinary environmental fluctuations).
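A minimal sketch of what I mean, on a toy environment of my own making: run RL on the goal objective, then check whether the learned policy achieves the goal consistently.

```python
import random

# A goal counts as "navigable" if RL on the goal objective yields a policy
# that then achieves the goal consistently across evaluation episodes.
N, GOAL = 6, 5                      # 1-D corridor of states 0..5, goal at the far end
actions = [-1, +1]                  # step left / right

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = {(s, a): 0.0 for s in range(N) for a in actions}
for _ in range(2000):               # tabular Q-learning on the goal objective
    s = random.randrange(N)
    for _ in range(20):
        a = random.choice(actions) if random.random() < 0.2 else max(actions, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        Q[(s, a)] += 0.5 * (r + 0.9 * max(Q[(s2, x)] for x in actions) - Q[(s, a)])
        s = s2
        if done:
            break

successes = 0                       # evaluation: does the greedy policy reach the goal reliably?
for _ in range(100):
    s, done = random.randrange(N), False
    for _ in range(20):
        s, _, done = step(s, max(actions, key=lambda x: Q[(s, x)]))
        if done:
            break
    successes += done
print(f"success rate: {successes}/100")   # near 100/100 => goal is "successfully navigable"
```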
Thanks for the paper on causal entropic forces, by the way. I hadn’t seen this research before, but it synergizes well with ideas I’ve been having related to alignment. At the risk of being overly reductive, I think we could do worse than designing an AGI that predictively models the goal distributions of other agents (i.e., humans) and generates as its own “terminal” goals those states that maximize the entropy of goal distributions reachable by the other agents. Essentially, seeking to create a world from which humans (and other systems) have the best chance at directing their own future.
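To make that concrete with a deliberately reductive toy (all the candidate states, goal labels, and probabilities here are made up):

```python
from math import log2

# Score each candidate world state by the entropy of the distribution over
# goals that other agents could reach from it, and prefer the max-entropy state.
def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

# distribution over three hypothetical human goals, per candidate world state
candidate_states = {
    "locked-in":  [0.97, 0.02, 0.01],   # one goal dominates; little optionality left
    "status-quo": [0.60, 0.30, 0.10],
    "open-ended": [0.40, 0.35, 0.25],   # goal distribution closest to uniform
}

best = max(candidate_states, key=lambda s: entropy(candidate_states[s]))
print(best)  # "open-ended": the state from which humans' reachable goals are most diverse
```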
Different agents sense and store different information bits from the environment and affect different property bits of the environment. Even if two agents have the same capability (number of bits controlled), the specific facets they actually control may be very different. Only at high levels of capability, where more and more bits are controlled overall, do the bitsets overlap more and more and capabilities converge: instrumental convergence.
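A quick toy simulation of that picture, under the obviously simplified assumption that each agent’s controlled bits form an independent random subset:

```python
import random

# Two agents each control a random k-subset of N environment bits. At low
# capability their controlled bits barely overlap; as k approaches N the
# overlap fraction is forced toward 1 (capabilities converge).
N = 1000
for k in [10, 100, 500, 900, 990]:
    overlaps = []
    for _ in range(200):
        a = set(random.sample(range(N), k))
        b = set(random.sample(range(N), k))
        overlaps.append(len(a & b) / k)          # fraction of A's bits also controlled by B
    print(f"k={k:4d}  mean overlap fraction ~ {sum(overlaps) / len(overlaps):.2f}")
    # expected value is k/N: 0.01, 0.10, 0.50, 0.90, 0.99
```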