Feedback is central to agency

An agent seems to be a particular kind of device that creates a feedback loop between the world and itself. Agency seems to have a lot to do with devices whose behavior depends on the consequences of their own behavior: that is, devices that behave in ways that cause the overall system in which they are embedded to evolve towards a target configuration, across a wide variety of possible environments.

If we only need the device to work in a single embedding system, then we can use simple feed-forward control. For example, to build an autonomous car that navigates from one precisely known location to another, via a precisely known trajectory, we can hardcode all the necessary turns and just let the car go. We might call this a non-adaptive system.
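
To make the contrast concrete, here is a minimal sketch of such a non-adaptive controller, assuming hypothetical actuator commands (the names below are illustrative, not a real API). The whole "plan" is baked in ahead of time and executed blindly:

```python
# A sketch of purely feed-forward (open-loop) control: the route is known in
# advance, so the controller is just a hardcoded sequence of commands, with
# no sensing and no adaptation.

HARDCODED_PLAN = [
    ("drive_straight", 120.0),  # metres
    ("turn_left", 90.0),        # degrees
    ("drive_straight", 40.0),
    ("turn_right", 90.0),
    ("drive_straight", 200.0),
]

def run_open_loop(actuate):
    """Execute the plan blindly; nothing the world does can change it."""
    for command, magnitude in HARDCODED_PLAN:
        actuate(command, magnitude)
```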

But if we want the autonomous car to navigate successfully when there are other vehicles on the road whose positions are a priori unknown to us, then we need a car that adapts its behavior based on input from sensors. The particular kind of adaptation we need is that the car behaves such that the overall system evolves towards a target configuration in which the car is at its desired destination.

One way we can build such a feedback loop is by designing a feed-forward algorithm that takes sensor inputs and produces actuator outputs. A thermostat is an example of this: we just design the thermostat to turn on the heater when the observed temperature is below the target temperature. We could similarly design an autonomous car that observes vehicles around it and moves a little to the left whenever there is a vehicle too close on the right, etc. We might call these systems adaptive systems. PID controllers are adaptive systems.
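
As a sketch (with hypothetical sensor and actuator functions standing in for real hardware), a thermostat of this kind is only a few lines. The rule from observation to action is itself feed-forward; the feedback loop closes through the room, not inside the program:

```python
# A sketch of a simple adaptive system: a bang-bang thermostat. The action
# changes the room, and the room changes the next observation, so the overall
# system forms a feedback loop even though the code itself has no loop over
# consequences. `read_temperature` and `set_heater` are assumed stand-ins for
# a sensor and an actuator.

TARGET_TEMP_C = 20.0

def thermostat_step(read_temperature, set_heater):
    """One pass of the loop: observe, compare to the target, act."""
    observed = read_temperature()
    set_heater(on=observed < TARGET_TEMP_C)
```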

One very powerful special case of adaptive system is a system whose behavior is determined by optimizing over an explicit internal simulation of its environment. This is how contemporary autonomous cars are actually built: we observe other vehicles on the road, then we search for a plan for how to behave for the next few seconds or minutes. Search means we consider many possible plans, and for each one we run a simulation of what would happen if we executed that plan: Would we hit another vehicle? Would we arrive at our destination? This search process is itself a feedback loop: we use gradient descent or some other optimization algorithm to set up a computation with a tendency to evolve towards a working plan, then we use this plan in an overarching feedback loop in which the physical car moves about, which affects other traffic on the road, which affects what we observe next, which affects the premises upon which we search for our next plan. We might call this a two-level adaptive system because the internal search has the characteristics of an adaptive system in its own right, as does the overarching vehicle/road system. Model-predictive control is a two-level adaptive system.
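
Here is a minimal sketch of this two-level structure, using random shooting over candidate plans rather than gradient-based optimization to keep it short, and assuming stand-in functions `observe`, `simulate`, `score`, and `execute` for the sensors, the internal world model, the objective, and the actuators:

```python
import random

HORIZON = 20          # steps simulated per candidate plan
NUM_CANDIDATES = 200  # plans considered per replanning cycle
ACTIONS = ["left", "right", "straight", "brake"]

def plan(state, simulate, score):
    """Inner adaptive loop: search for a plan against an internal simulation."""
    best_plan, best_score = None, float("-inf")
    for _ in range(NUM_CANDIDATES):
        candidate = [random.choice(ACTIONS) for _ in range(HORIZON)]
        rollout = simulate(state, candidate)  # what would happen if we did this?
        value = score(rollout)                # e.g. progress made minus collisions
        if value > best_score:
            best_plan, best_score = candidate, value
    return best_plan

def drive(observe, simulate, score, execute, steps=1000):
    """Outer adaptive loop: act, let the world respond, observe, replan."""
    for _ in range(steps):
        state = observe()
        best = plan(state, simulate, score)
        execute(best[0])  # execute only the first action, then replan
```

The structural point is that the inner loop optimizes against the simulation, while the outer loop closes through the actual road.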

As soon as we come to this multi-level setup we run into problems. We are running simulations of a world that will, in reality, be affected by the output of the simulation itself, since our car’s behavior is affected by the search over simulations, and the behavior of the other vehicles on the road is affected by our car’s behavior. So the situation is not nearly as simple as running a roll-out of how the world will evolve for each possible action we might take. A great deal of MIRI’s research has highlighted ways that naive implementations of such two-level adaptive systems will fail.

Now we made an assumption above that we had already factored the car’s cognition into perception and planning. But this factorization is actually quite a profound point. We could build an autonomous car algorithm that did not factor in this way. We would take in raw observations and store them to memory. Then we would run a search jointly over possible explanations of the sensor data and possible plans of action. So nodes in our search would read as “perhaps there are vehicles here and here and we should do this”, “perhaps there is just one vehicle and we should do that”, and so on. We would evaluate each node based jointly on how well the hypothesis matches the sensor data and how efficiently the plan achieves the goal of navigating to a destination. This would (1) force us to store all of our sensor data going back to the beginning of time, (2) force us to consider a search space whose size is the product of the hypothesis and plan spaces, and (3) force us to re-do a lot of expensive perceptual calculations many times within the search.
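
To put toy numbers on point (2) (the figures below are invented purely for illustration): if there are a million candidate explanations of the sensor data and a million candidate plans, the joint search space is their product, whereas a factored agent can search the two spaces roughly one after the other.

```python
# Toy illustration with invented numbers: the cost of searching jointly over
# hypotheses and plans versus searching the two spaces separately.

NUM_HYPOTHESES = 10**6  # candidate explanations of the sensor data
NUM_PLANS = 10**6       # candidate plans of action

joint_nodes = NUM_HYPOTHESES * NUM_PLANS      # 10**12 (hypothesis, plan) pairs
factored_nodes = NUM_HYPOTHESES + NUM_PLANS   # 2 * 10**6 nodes in total

print(joint_nodes // factored_nodes)  # 500000: the factor saved by factorizing
```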

It is remarkable that, in general, it seems that agency can be factorized into perception and planning, at least in the context of two-level adaptive systems. Beliefs are in fact just epiphenomena of this factorization. Just as the singular value decomposition of a matrix produces two orthonormal matrices with a diagonal matrix squeezed in between, so too this perception/planning decomposition of agency produces two search algorithms (both of which are adaptive systems in their own right) with a set of beliefs squeezed in between.
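
Spelling the analogy out (this is a schema, not a theorem):

```latex
% The SVD factors a matrix into two orthonormal maps with a diagonal core:
A = U \Sigma V^{\top}
% Analogously, the agent factors into two search processes with beliefs as
% the narrow interface squeezed in between:
\text{observations} \xrightarrow{\ \text{perception}\ } \text{beliefs}
                    \xrightarrow{\ \text{planning}\ } \text{actions}
```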

But we should remind ourselves that there are sophisticated systems outside of this paradigm of two-level adaptive systems. A tree does not operate based on a factorization of perception and planning, yet exhibits adaptive behavior with a level of sophistication and robustness unmatched by any machine humans have yet built. In fact a tree does not, so far as we know, run any kind of internal search process at all. Yet its behavior is strongly adaptive to its environment: if sunlight is blocked in one location then it will deprioritize growth in that location; if a limb is cut off then it will regrow; if its water source shifts then its root structure will adapt to that. We should not rashly dismiss trees as unsophisticated since we have not yet learned to produce machines of comparable sophistication or robustness.