The ground of optimization
This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching of the whole community.
What is optimization? What is the relationship between a computational optimization process — say, a computer program solving an optimization problem — and a physical optimization process — say, a team of humans building a house?
We propose the concept of an optimizing system as a physically closed system containing both that which is being optimized and that which is doing the optimizing, and defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system. We compare our definition to that proposed by Yudkowsky, and place our work in the context of Demski and Garrabrant’s Embedded Agency and Drexler’s Comprehensive AI Services. We show that our definition resolves difficult cases proposed by Daniel Filan. We work through numerous examples of biological, computational, and simple physical systems showing how our definition relates to each.
Introduction
In the field of computer science, an optimization algorithm is a computer program that outputs the solution, or an approximation thereof, to an optimization problem. An optimization problem consists of an objective function to be maximized or minimized, and a feasible region within which to search for a solution. For example, we might take a simple quadratic in x, such as $(x-5)^2$, as the objective function of a minimization problem, and the whole real number line as the feasible region. The solution then would be $x = 5$, and a working optimization algorithm for this problem is one that outputs a close approximation to this value.
In the field of operations research and engineering more broadly, optimization involves improving some process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements. For example, we might choose to measure a nail factory by the rate at which it outputs nails, relative to the cost of production inputs. We can view this as a kind of objective function, with the factory as the object of optimization just as the variable x was the object of optimization in the previous example.
There is clearly a connection between optimizing the factory and optimizing for x, but what exactly is this connection? What is it that identifies an algorithm as an optimization algorithm? What is it that identifies a process as an optimization process?
The answer proposed in this essay is: an optimizing system is a physical process in which the configuration of some part of the universe moves predictably towards a small set of target configurations from any point in a broad basin of attraction, despite perturbations during the optimization process.
We do not imagine that there is some engine or agent or mind performing optimization, separately from that which is being optimized. We consider the whole system jointly — engine and object of optimization — and ask whether it exhibits a tendency to evolve towards a predictable target configuration. If so, then we call it an optimizing system. If the basin of attraction is deep and wide then we say that this is a robust optimizing system.
An optimizing system as defined in this essay is known in dynamical systems theory as a dynamical system with one or more attractors. In this essay we show how this framework can help to understand optimization as manifested in physically closed systems containing both engine and object of optimization.
In this way we find that optimizing systems are not designed but discovered. The configuration space of the world contains countless pockets shaped like small and large basins, such that if the world should crest the rim of one of these pockets then it will naturally evolve towards the bottom of the basin. We care about them because we can use our own agency to tip the world into such a basin and then let go, knowing that from here on things will evolve towards the target region.
All optimization basins have a finite extent. A ball may roll to the center of a valley if initially placed anywhere within the valley, but if it is placed outside the valley then it will roll somewhere else entirely, or perhaps will not roll at all. Similarly, even a very robust optimizing system has an outer rim to its basin of attraction, such that if the configuration of the system is perturbed beyond that rim then the system no longer evolves towards the target that it once did. When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.
Example: computing the square root of two
Say I ask my computer to compute the square root of two, for example by opening a Python interpreter and typing:
```python
>>> import math
>>> print(math.sqrt(2))
1.4142135623730951
```
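A value like this can be calculated by solving an optimization problem. (In practice, Python’s math.sqrt delegates to the platform’s C math library, but the principle is the same.) It works roughly as follows. First we set up an objective function whose minimum, among positive numbers, lies at the square root of two. One function we could use is

$$f(x) = (x^2 - 2)^2$$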
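Next we pick an initial estimate for the square root of two, which can be any positive number that is not too large. Let’s take 1.0 as our initial guess. Then we take a step in the downhill direction indicated by the slope of the objective function at our initial estimate:

$$x_1 = x_0 - \eta f'(x_0), \qquad f'(x) = 4x(x^2 - 2)$$

where $\eta$ is a small step size. With $x_0 = 1$ and $\eta = 10^{-3}$, the slope is $f'(1) = -4$, so $x_1 = 1.004$.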
Then we repeat this process of computing the slope and updating our estimate over and over, and our optimization algorithm quickly converges to the square root of two.
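This is gradient descent, and it can be implemented in a few lines of Python code:
```python
current_estimate = 1.0
step_size = 1e-3

while True:
    # The objective (x^2 - 2)^2, shown for reference; the loop only needs its slope.
    objective = (current_estimate**2 - 2) ** 2
    # Slope of the objective at the current estimate: 4x(x^2 - 2).
    gradient = 4 * current_estimate * (current_estimate**2 - 2)
    if abs(gradient) < 1e-8:
        break
    # Step downhill, i.e. against the slope.
    current_estimate -= gradient * step_size

print(current_estimate)
```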
But this program has the following unusual property: we can modify the variable that holds the current estimate of the square root of two at any point while the program is running, and the algorithm will still converge to the square root of two. That is, while the code above is running, if I drop in with a debugger and overwrite the current estimate with some other positive value while the loop is still executing, the next gradient step will start correcting for this perturbation, pushing the estimate back towards the square root of two.
If we give the algorithm time to converge, the final output will match the result we would have gotten without the perturbation, up to the tolerance at which the loop stops.
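Here is a minimal sketch of this experiment. It assumes the same objective, step size, and stopping tolerance as the loop above. The function name and the perturb_at/perturb_to parameters are hypothetical; they stand in for a debugger edit at a chosen iteration:

```python
import math

def sqrt2_by_gradient_descent(start=1.0, step_size=1e-3, tol=1e-8,
                              perturb_at=None, perturb_to=None,
                              max_iters=1_000_000):
    """Gradient descent on (x^2 - 2)^2.

    If perturb_at is given, the estimate is overwritten with perturb_to at
    that iteration, standing in for a debugger edit. Returns the final
    estimate and the number of iterations taken.
    """
    x = start
    for i in range(max_iters):
        if i == perturb_at:
            x = perturb_to  # the in-flight perturbation
        gradient = 4 * x * (x * x - 2)
        if abs(gradient) < tol:
            return x, i
        x -= step_size * gradient
    raise RuntimeError("did not converge within max_iters")


baseline, n_baseline = sqrt2_by_gradient_descent()
perturbed, n_perturbed = sqrt2_by_gradient_descent(perturb_at=500, perturb_to=5.0)
print(abs(baseline - math.sqrt(2)), abs(perturbed - math.sqrt(2)))  # both tiny
print(n_baseline, n_perturbed)  # the perturbation changes the iteration count
```

Both runs land within the stopping tolerance of the square root of two, but they stop at different iterations. They are also not guaranteed to agree bit for bit, because the loop halts as soon as the slope falls below the tolerance rather than at an exact fixed point.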
Consider this for a moment. For most kinds of computer code, overwriting a variable while the code is running will have one of three results. It may have no effect because the variable isn’t used. It may have a catastrophic effect that crashes the code. Or it may simply cause the code to output the wrong answer. If I use a debugger to drop in on a web server servicing an HTTP request and overwrite some variable with an arbitrary value just as the code is performing a loop in which this variable is used in a central way, bad things are likely to happen! Most computer code is not robust to arbitrary in-flight data modifications.
But this code that computes the square root of two is robust to in-flight data modifications, or at least the “current estimate” variable is. It’s not that our perturbation has no effect: if we change the value, the next iteration of the algorithm will compute the objective function and its slope at a completely different point, and each iteration after that will be different from how it would have been if we hadn’t intervened. The perturbation may change the total number of iterations before convergence is reached. But ultimately the algorithm will still output an estimate of the square root of two. Given time to fully converge, it will output the same answer it would have output without the perturbation, up to the loop’s stopping tolerance. This is an unusual breed of computer program indeed!
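For contrast, here is a sketch of the ordinary kind of loop described earlier: a running sum, given as an illustrative example rather than code from any particular program. Overwriting its accumulator mid-loop produces no corrective pull. The error is simply carried through to the output:

```python
def running_sum(values, perturb_at=None, perturb_to=None):
    """Sum a list, optionally overwriting the accumulator partway through."""
    total = 0
    for i, v in enumerate(values):
        if i == perturb_at:
            total = perturb_to  # the same kind of in-flight overwrite
        total += v
    return total

values = list(range(10))
print(running_sum(values))                              # 45
print(running_sum(values, perturb_at=5, perturb_to=100))  # 135: the error persists
```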
What is happening here is that we have constructed a physical system consisting of a computer and a python program that computes the square root of two, such that:

for a set of starting configurations (in this case the set of configurations in which the “current estimate” variable is set to a positive floating point number that is not too large),

the system exhibits a tendency to evolve towards a small set of target configurations (in this case just the single configuration in which the “current estimate” variable is set to the square root of two),

and this tendency is robust to in-flight perturbations to the system’s configuration (in this case robustness is limited to just the dimensions corresponding to changes in the “current estimate” variable).
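The basin in this example can be probed directly. The sketch below tries several starting values. It assumes the same objective, step size of 1e-3, and tolerance as the loop above; descend and lands_on_sqrt2 are hypothetical names. Positive values that are not too large reach the square root of two. Zero is a fixed point where the slope vanishes. Negative values reach the negative root. With a fixed step size, very large values overshoot, which is one concrete outer rim of this basin.

```python
import math

def descend(x, step_size=1e-3, tol=1e-8, max_iters=100_000):
    """Gradient descent on (x^2 - 2)^2 from x; returns the final estimate,
    or None if the iterates blow up or fail to settle within max_iters."""
    for _ in range(max_iters):
        gradient = 4 * x * (x * x - 2)  # x * x overflows to inf instead of raising
        if not math.isfinite(gradient):
            return None
        if abs(gradient) < tol:
            return x
        x -= step_size * gradient
    return None

def lands_on_sqrt2(x0):
    final = descend(x0)
    return final is not None and abs(final - math.sqrt(2)) < 1e-6

for x0 in (0.5, 1.0, 3.0, 10.0, 0.0, -1.0, 20.0, 100.0):
    print(x0, descend(x0))
```

Starting from 0.0, the slope is exactly zero, so descend returns immediately. From 20.0, the first step jumps past zero and the run settles on the negative root. From 100.0, the iterates overflow.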
In this essay I argue that systems that converge to some target configuration, and will do so despite perturbations to the system, are the systems we should rightly call “optimizing systems”.
Example: building a house
Consider a group of humans building a house. Let us consider the humans together with the building materials and construction site as a single physical system. Let us imagine that we assemble this system inside a completely closed chamber, including food and sleeping quarters for the humans, lighting, a power source, construction materials, construction blueprint, as well as the physical humans with appropriate instructions and incentives to build the house. If we just put these physical elements together we get a system that has a tendency to evolve under the natural laws of physics towards a configuration in which there is a house matching the blueprint.
We could perturb the system while the house is being built — say by dropping in at night and removing some walls or moving some construction materials about — and this physical system will recover. The team of humans will come in the next day and find the construction materials that were moved, put in new walls to replace the ones that were removed, and so on.
Just like the square root of two example, here is a physical system with:

A basin of attraction (all the possible arrangements of viable humans and building materials)

A target configuration set that is small relative to the basin of attraction (those in which the building materials have been arranged into a house matching the design)

A tendency to evolve towards the target configurations when starting from any point within the basin of attraction, despite in-flight perturbations to the system
Now this system is not infinitely robust. If we really scramble the arrangement of atoms within this system then we’ll quickly wind up with a configuration that does not contain any humans, or in which the building materials are irrevocably destroyed, and then we will have a system without the tendency to evolve towards any small set of final configurations.
In the physical world we are not surprised to find systems that have this tendency to evolve towards a small set of target configurations. If I pick up my dog while he is sleeping and move him by a few inches, he still finds his way to his water bowl when he wakes up. If I pull a piece of bark off a tree, the tree continues to grow in the same upward direction. If I make a noise that surprises a friend working on some math homework, the math homework still gets done. Systems that contain living beings regularly exhibit this tendency to evolve towards target configurations, and tend to do so in a way that is robust to in-flight perturbations. As a result we are familiar with physical systems that have this property, and we are not surprised when they arise in our lives.
But physical systems in general do not have the tendency to evolve towards target configurations. If I move a billiard ball a few inches to the left while a bunch of billiard balls are energetically bouncing around a billiard table, the balls are likely to come to rest in a very different position than if I had not moved the ball. If I change the trajectory of a satellite a little bit, the satellite does not have any tendency to move back into its old orbit.
The computer systems that we have built are still, by and large, more primitive than the living systems that we inhabit, and most computer systems do not have the tendency to evolve robustly towards some set of target configurations, so optimization algorithms as discussed in the previous section, which do have this property, are somewhat unusual.
Defining optimization
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
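One way to make this definition operational for a simulated system is sketched below. The helper is hypothetical, not a standard routine. It takes a step function, sample starting states from a proposed basin, a test for membership in a proposed target set, and a perturbation. It then checks whether every run ends in the target set, both with and without a mid-run perturbation. Because it tests one proposed target set on finitely many samples, it can refute but not prove that a system is optimizing. The example systems are a ball in a parabolic valley with and without friction; the dynamics, step size, and tolerances are assumptions chosen for illustration.

```python
import random

def is_optimizing(step, starts, in_target, perturb, n_steps=5_000, perturb_at=1_000, seed=0):
    """Hypothetical empirical check of the definition for one proposed target set."""
    rng = random.Random(seed)
    for start in starts:
        for inject in (False, True):
            state = start
            for t in range(n_steps):
                if inject and t == perturb_at:
                    state = perturb(state, rng)  # in-flight perturbation
                state = step(state)
            if not in_target(state):
                return False
    return True


def make_valley(friction, stiffness=1.0, dt=0.01):
    """Ball in the valley V(x) = stiffness * x^2 / 2; state is (position, velocity)."""
    def step(state):
        x, v = state
        v += dt * (-stiffness * x - friction * v)  # semi-implicit Euler
        x += dt * v
        return (x, v)
    return step


def at_bottom(state):
    x, v = state
    return abs(x) < 1e-3 and abs(v) < 1e-3


def kick(state, rng):
    x, v = state
    return (x + rng.uniform(-2, 2), v + rng.uniform(-2, 2))


starts = [(-5.0, 0.0), (-1.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
print(is_optimizing(make_valley(friction=1.0), starts, at_bottom, kick))  # True
print(is_optimizing(make_valley(friction=0.0), starts, at_bottom, kick))  # False
```

Without friction the ball oscillates forever and never settles into the proposed target set, so the check fails for that system.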
Some systems may have a single target configuration towards which they inevitably evolve. Examples are a ball in a steep valley with a single local minimum, and a computer computing the square root of two. Other systems may have a set of target configurations and perturbing the system may cause it to evolve towards a different member of this set. Examples are a ball in a valley with multiple local minima, or a tree growing upwards (perturbing the tree by, for example, cutting off some branches while it is growing will probably change its final shape, but will not change its tendency to grow towards one of the configurations in which it has reached its maximum size).
We can quantify optimizing systems in the following ways.
Robustness. Along how many dimensions can we perturb the system without altering its tendency to evolve towards the target configuration set? What magnitude perturbation can the system absorb along these dimensions? A self-driving car navigating through a city may be robust to perturbations that involve physically moving the car to a different position on the road in the city, but not to perturbations that involve changing the state of physical memory registers that contain critical bits of computer code in the car’s internal computer.
Duality. To what extent can we identify subsets of the system corresponding to “that which is being optimized” and “that which is doing the optimization”, that is, between engine and object of optimization, or between agent and world? Highly dualistic systems may be robust to perturbations of the object of optimization, but brittle with respect to perturbations of the engine of optimization. For example, a system containing a 2020s-era robot moving a vase around is a dualistic optimizing system: there are clear subsets of the system that are the engine of optimization (the robot) and the object of optimization (the vase). Furthermore, the robot may be able to deal with a wide variety of perturbations to the environment and to the vase, but there are likely to be numerous small perturbations to the robot itself that will render it inert. In contrast, a tree is a non-dualistic optimizing system: the tree does grow towards a set of target configurations, but it makes no sense to ask which part of the tree is “doing” the optimization and which part is “being” optimized. This latter example is discussed further below.
Retargetability. Is it possible, using only a microscopic perturbation to the system, to change the system such that it is still an optimizing system but with a different target configuration set? A system containing a robot with the goal of moving a vase to a certain location can be modified by making just a small number of microscopic perturbations to key memory registers such that the robot holds the goal of moving the vase to a different location and the whole vase/robot system now exhibits a tendency to evolve towards a different target configuration. In contrast, a system containing a ball rolling towards the bottom of a valley cannot generally be modified by any microscopic perturbation such that the ball will roll to a different target location. A tree is an intermediate example: to cause the tree to evolve towards a different target configuration set — say, one in which its leaves were of a different shape — one would have to modify the genetic code simultaneously in all of the tree’s cells.
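The square-root program from earlier gives a compact way to see all three criteria. In the sketch below, a hypothetical overwrite hook stands in for a debugger. It overwrites one named variable at a chosen iteration: the estimate (the object of optimization), the step size (part of the engine), or the constant 2 in the objective (which plays the role of a goal). The function and parameter names are illustrative assumptions.

```python
import math

def run(x=1.0, step_size=1e-3, target_square=2.0, tol=1e-8,
        max_iters=100_000, overwrite=None):
    """Gradient descent on (x^2 - target_square)^2.

    overwrite is a hypothetical (iteration, name, value) tuple that replaces
    one variable mid-run. Returns the final estimate, or None if the run
    blows up or never settles.
    """
    for i in range(max_iters):
        if overwrite is not None and i == overwrite[0]:
            _, name, value = overwrite
            if name == "x":                # object of optimization
                x = value
            elif name == "step_size":      # part of the engine
                step_size = value
            elif name == "target_square":  # the "goal" constant
                target_square = value
        gradient = 4 * x * (x * x - target_square)
        if not math.isfinite(gradient):
            return None
        if abs(gradient) < tol:
            return x
        x -= step_size * gradient
    return None

print(run())                                       # close to sqrt(2)
print(run(overwrite=(300, "x", 7.0)))              # still close to sqrt(2): robust
print(run(overwrite=(300, "step_size", 10.0)))     # None: the engine is brittle
print(run(overwrite=(300, "target_square", 3.0)))  # close to sqrt(3): retargeted
```

Overwriting the estimate is absorbed. Overwriting the step size breaks the system. Changing a single constant retargets it, much as flipping a few memory cells retargets the robot.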
Relationship to Yudkowsky’s definition of optimization
In Measuring Optimization Power, Eliezer Yudkowsky defines optimization as a process in which some part of the world ends up in a configuration that is high in an agent’s preference ordering, yet has low probability of arising spontaneously. Yudkowsky’s definition asks us to look at a patch of the world that has already undergone optimization by an agent or mind, and draw conclusions about the power or intelligence of that mind by asking how unlikely it would be for a configuration of equal or greater utility (to the agent) to arise spontaneously.
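Yudkowsky’s measure can be computed exactly when the configuration space is small enough to enumerate. Take the base-2 logarithm of the total number of configurations divided by the number of configurations at least as good as the one achieved. The sketch below uses a toy space of all ten-bit strings, with utility equal to the number of ones; both choices are assumptions for illustration.

```python
import math
from itertools import product

def optimization_power_bits(configs, utility, achieved):
    """log2(total configurations / configurations at least as good as achieved)."""
    configs = list(configs)
    threshold = utility(achieved)
    at_least_as_good = sum(1 for c in configs if utility(c) >= threshold)
    return math.log2(len(configs) / at_least_as_good)

configs = list(product((0, 1), repeat=10))  # all 1024 ten-bit configurations
print(optimization_power_bits(configs, sum, (1,) * 10))         # 10.0
print(optimization_power_bits(configs, sum, (1,) * 9 + (0,)))  # about 6.54 (11 of 1024 score >= 9)
```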
Our definition differs from this in the following ways:

We look at whole systems that evolve naturally under physical laws. We do not assume that we can decompose these systems into some engine and object of optimization, or into mind and environment. We do not look at systems that are “being optimized” by some external entity but rather at “optimizing systems” that exhibit a natural tendency to evolve towards a target configuration set. These optimizing systems may contain subsystems that have the properties of agents, but as we will see there are many instances of optimizing systems that do not contain dualistic agentic subsystems.

When discerning the boundary between optimization and non-optimization, we look principally at robustness (whether the system will continue to evolve towards its target configuration set in the face of perturbations), whereas Yudkowsky looks at the improbability of the final configuration.
Relationship to Drexler’s Comprehensive AI Services
Eric Drexler has written about the need to consider AI systems that are not goal-directed agents. He points out that the most economically important AI systems today are not constructed within the agent paradigm, and that in fact agents represent just a tiny fraction of the design space of intelligent systems. For example, a system that identifies faces in images would be an intelligent system but not an agent according to Drexler’s taxonomy. This perspective is highly relevant to our discussion here since we seek to go beyond the narrow agent model in which intelligent systems are conceived of as unitary entities that receive observations from the environment, send actions back into the environment, but are otherwise separate from the environment.
Our perspective is that there is a specific class of intelligent systems, which we call optimizing systems, that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
Figure: relationship between our optimizing system concept and Drexler’s taxonomy of AI systems
Examples of systems that lie in each of these three tiers are as follows:

A system that identifies faces in images by evaluating a feed-forward neural network is an AI system but not an optimizing system.

A tree is an optimizing system but not a goal-directed agent system (see the section below analyzing a tree as an optimizing system).

A robot with the goal of moving a ball to a specific destination is a goal-directed agent system.
Relationship to Garrabrant and Demski’s Embedded Agency
Scott Garrabrant and Abram Demski have written about the many ways that a dualistic view of agency, in which one conceives of a hard separation between agent and environment, fails to capture the reality of agents that are reducible to the same basic building blocks as the environments in which they are embedded. They show that if one starts from a dualistic view of agency then it is difficult to design agents capable of reflecting on and making improvements to their own cognitive processes, since the dualistic view of agency rests on a unitary agent whose cognition does not affect the world except via explicit actions. They also show that reasoning about counterfactuals becomes nonsensical if starting from a dualistic view of agency. The agent’s cognitive processes are governed by the same physical laws as those that govern the environment, and the agent can come to notice this fact, leading to confusion when considering the consequences of actions that are different from the actions that the agent will, in fact, output.
One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the “optimizer” concept as the starting point for designing intelligent systems, rather than “optimizing system” as we propose here. The present work is strongly inspired by Garrabrant and Demski’s work. Our hope is to point the way to a view of optimization and agency that captures reality sufficiently well to avoid the logical pitfalls identified in the Embedded Agency work.
Example: ball in a valley
Consider a physical ball rolling around in a small valley. According to our definition of optimization, this is an optimizing system:
Configuration space. The system we are studying consists of the physical valley plus the ball.
Basin of attraction. The ball could initially be placed anywhere in the valley (these are the configurations comprising the basin of attraction).
Target configuration set. The ball will roll until it ends up at the bottom of the valley (the set of local minima are the target configurations).
We can perturb the ball while it is “in flight”, say by changing its position or velocity, and the ball will still ultimately end up at one of the target configurations. This system is robust to perturbations along dimensions corresponding to the spatial position and velocity of the ball, but there are many more dimensions along which this system is not robust. If we change the shape of the ball to a cube, for example, then the ball will not continue rolling to the bottom of the valley.
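Here is a toy simulation of this system. It makes several assumptions not in the text: a one-dimensional valley with two local minima, V(x) = (x^2 - 1)^2; friction proportional to velocity; and a simple semi-implicit Euler integrator. Kicking the ball’s position and velocity mid-run can change which minimum it reaches, but it still ends at one of them:

```python
def roll(x, v=0.0, friction=0.5, dt=0.01, n_steps=20_000, kick_at=None, kick=(0.0, 0.0)):
    """Ball in the double-well valley V(x) = (x^2 - 1)^2 with friction.

    At step kick_at, kick = (dx, dv) is added to the ball's position and
    velocity. Returns the final (position, velocity).
    """
    for t in range(n_steps):
        if t == kick_at:
            x, v = x + kick[0], v + kick[1]  # in-flight perturbation
        force = -4 * x * (x * x - 1)         # -dV/dx
        v += dt * (force - friction * v)     # semi-implicit Euler: velocity first,
        x += dt * v                          # then position with the new velocity
    return x, v

for x0 in (-2.0, -0.5, 0.3, 1.7):
    print(x0, roll(x0), roll(x0, kick_at=5_000, kick=(1.5, -2.0)))
```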
Example: ball in valley with robot
Consider now a ball in a valley as above, but this time with the addition of an intelligent robot holding the goal of ensuring that the ball reaches the bottom of the valley.
Configuration space. The system we are studying now consists of the physical valley, the ball, and the robot. We consider the evolution of and perturbations to this whole joint system.
Target configuration set. As before, the target configuration is the ball being at the bottom of the valley.
Basin of attraction. As before, the basin of attraction consists of all the possible spatial locations that the ball could be placed in the valley.
We can now perturb the system along many more dimensions than in the case where there was no robot. For example, we could introduce a barrier that prevents the ball from rolling downhill past a certain point, and we can then expect a sufficiently intelligent robot to move the ball over the barrier. We can expect a sufficiently well-designed robot to be able to overcome a wide variety of hurdles that gravity would not overcome on its own. Therefore we say that this system is more robust than the system without the robot.
There is a sequence of systems spanning the gap from a ball rolling in a valley, which is robust to a narrow set of perturbations and therefore exhibits a weak degree of optimization, up to a robot with a goal of moving a ball around in a valley, which is robust to a much wider set of perturbations and therefore exhibits a stronger degree of optimization. The difference between systems that do and do not undergo optimization is therefore not a binary distinction but a continuous gradient of increasing robustness to perturbations.
By introducing the robot to the system we have also introduced new dimensions along which the system is fragile: the dimensions corresponding to modifications to the robot itself, and in particular the dimensions corresponding to modifications to the code running on the robot (i.e. physical perturbations to the configuration of the memory cells in which the code is stored). There are two types of perturbation we might consider:

Perturbations that destroy the robot. There are numerous ways we could cut wires or scramble computer code that would leave the robot completely non-operational. Many of these would be physically microscopic, such as flipping a single bit in a memory cell containing some critical computer code. In fact there are now more ways to break the system via microscopic perturbations compared to when we were considering a ball in a valley without a robot, since there are few ways to cause a ball not to reach the bottom of a valley by making only a microscopic perturbation to the system, but there are many ways to break modern computer systems via a microscopic perturbation.

Perturbations that change the target configurations. We could also make physically microscopic perturbations to this system that change the robot’s goal. For example we might flip the sign on some critical computations in the robot’s code such that the robot works to place the ball at the highest point rather than the lowest. This is still a physical perturbation to the valley/ball/robot system: it is one that affects the configuration of the memory cells containing the robot’s computer code. These kinds of perturbations may point to a concept with some similarity to that of an agent. Suppose we have a system that can be perturbed in a way that preserves the robustness of its basin of attraction but changes the target configuration towards which the system tends to evolve. If we can find perturbations that cause the target configurations to match our own goals, then we have a way to navigate between basins of attraction.
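The sign-flip perturbation can be sketched with a toy controller, which is hypothetical and stands in for the robot’s code. It repeatedly moves the ball against the slope of a valley h(x) = x^2 on the interval [-3, 3]. Flipping one sign turns a downhill controller into an uphill one, so the same machinery now drives the ball to the rim:

```python
def robot(x, sign=1.0, rate=0.1, n_steps=1_000, lo=-3.0, hi=3.0):
    """Toy controller for the valley h(x) = x^2 on [lo, hi].

    sign=+1 moves the ball downhill; flipping it to -1 moves the ball uphill.
    """
    for _ in range(n_steps):
        slope = 2 * x               # dh/dx
        x -= sign * rate * slope    # step against (or, if flipped, along) the slope
        x = min(max(x, lo), hi)     # the ball cannot leave the valley
    return x

print(robot(1.0))              # very close to 0.0, the valley floor
print(robot(1.0, sign=-1.0))   # 3.0, the highest point on this side
print(robot(-0.5, sign=-1.0))  # -3.0, the highest point on the other side
```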
Example: computer performing gradient descent
Consider now a computer running an iterative gradient descent algorithm in order to solve an optimization problem. For concreteness let us imagine that the objective function being optimized is globally convex, in which case the algorithm will certainly reach the global optimum given sufficient time. Let us further imagine that the computer stores its current best estimate of the location of the global optimum (which we will henceforth call the “optimizand”) at some known memory location, and updates this after every iteration of gradient descent.
Since this is a purely computational process, it may be tempting to define the configuration space at the computational level — for example by taking the configuration space to be the domain of the objective function. However, it is of utmost importance when analyzing any optimizing system to ground our analysis in a physical system evolving according to the physical laws of nature, just as we have for all previous examples. The reason this is important is to ensure that we always study complete systems, not just some inert part of the system that is “being optimized” by something external to the system. Therefore we analyze this system as follows.
Configuration space. The system consists of a physical computer running some code that performs gradient descent. The configurations of the system are the physical configurations of the atoms comprising the computer.
Target configuration set. The target configuration set consists of the set of physical configurations of the computer in which the memory cells that store the current optimized state contain the true location of the global optimum (or the closest floating point representation of it).
Basin of attraction. The basin of attraction consists of the set of physical configurations in which there is a viable computer and it is running the gradient descent algorithm.
Example: billiard balls
Let us now examine a system that is not an optimizing system according to our definition. Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration. Is this an optimizing system?
In order to qualify as an optimizing system, a system must (1) have a tendency to evolve towards a set of target configurations that are small relative to the basin of attraction, and (2) continue to evolve towards the same set of target configurations if perturbed.
If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations. A system does not need to be robust along all dimensions in order to be an optimizing system, but a billiard table exhibits no such robust dimensions at all, so it is not an optimizing system.
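Here is a deliberately simplified sketch: a single ball sliding with friction proportional to its velocity, with no collisions. The friction coefficient and step size are assumptions. Even without the collisions that make real billiards chaotic, the rest position simply inherits any nudge; nothing pulls the ball back.

```python
def slide(x, v, friction=0.5, dt=0.01, n_steps=5_000, nudge_at=None, nudge=0.0):
    """One ball slowing under friction proportional to its speed; returns where it comes to rest."""
    for t in range(n_steps):
        if t == nudge_at:
            x += nudge  # reach in and move the ball while it is rolling
        v -= dt * friction * v
        x += dt * v
    return x

baseline = slide(0.0, 2.0)
nudged = slide(0.0, 2.0, nudge_at=100, nudge=0.1)
print(baseline, nudged - baseline)  # the 0.1 shift persists in the final position
```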
Example: satellite in orbit
Consider a second example of a system that is not an optimizing system: a satellite in orbit around Earth. Unlike the billiard balls, there is no chaotic tendency for small perturbations to lead to large deviations in the system’s evolution, but neither is there any tendency for the system to come back to some target configuration when perturbed. If we perturb the satellite’s velocity or position, then from that point on it is in a different orbit and has no tendency to return to its previous orbit. There is no set of target configurations towards which the system evolves despite perturbations, so this is not an optimizing system.
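Here is a sketch of the satellite case. It assumes units where GM = 1, a circular orbit of radius 1, and a velocity Verlet integrator. Boosting the speed by 5 percent changes the orbital energy, and the energy stays at its new value rather than returning.

```python
import math

def accel(x, y):
    """Gravitational acceleration toward the origin, in units where GM = 1."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

def orbital_energy(kick=0.0, kick_at=10_000, dt=1e-3, n_steps=50_000):
    """Integrate a circular orbit with velocity Verlet, optionally boosting the speed once.

    Returns the specific orbital energy v^2/2 - 1/r at the end of the run.
    """
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0  # radius 1, circular speed 1
    ax, ay = accel(x, y)
    for t in range(n_steps):
        if t == kick_at:
            vx, vy = vx * (1 + kick), vy * (1 + kick)  # nudge the satellite's velocity
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

print(orbital_energy())           # about -0.5: the original circular orbit
print(orbital_energy(kick=0.05))  # about -0.449: a new orbit that persists
```

These final energies correspond to semi-major axes a = -1/(2E) of about 1.0 and about 1.11. The perturbed orbit does not drift back towards the original one.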
Example: a tree
Consider a patch of fertile ground with a tree growing in it. Is this an optimizing system?
Configuration space. For the sake of concreteness let us take a region of space that is sealed off from the outside world — say 100m x 100m x 100m. This region is filled at the bottom with fertile soil and at the top with an atmosphere conducive to the tree’s growth. Let us say that the region contains a single tree.
We will analyze this system in terms of the arrangement of atoms inside this region of space. Out of all the possible configurations of these atoms, the vast majority consist of a uniform hazy gas. An astronomically tiny fraction of configurations contain a nontrivial mass of complex biological nutrients making up soil. An even tinier fraction of configurations contain a viable tree.
Target configuration set. A tree has a tendency to grow taller over time, to sprout more branches and leaves, and so on. Furthermore, trees can only grow so tall due to the physics of transporting water and sugars up and down the trunk. So we can identify a set of target configurations in which the atoms in our region of space are arranged into a tree that has grown to its maximum size (has sprouted as many branches and leaves as it can support given the atmosphere, the soil that it is growing in, and the constraints of its own biology). There are many topologies in which the tree’s branches could divide, many positions that leaves could sprout in, and so on, so there are many configurations within the target configuration set. But this set is still tiny compared to all the ways that the same atoms could be arranged without the constraint of forming a viable tree.
Basin of attraction. This system will evolve towards the target configuration set starting from any configuration in which there is a viable tree. This includes configurations in which there is just a seed in the ground, as well as configurations in which there is a tree of small, medium, or large size. Starting from any of these configurations, if we leave the system to evolve under the natural laws of physics then the tree will grow towards its maximum size, at which point the system will be in one of the target configurations.
Robustness to perturbations. This system is highly robust to perturbations. Consider perturbing the system in any of the following ways:

Moving soil from one place to another

Removing some leaves from the tree

Cutting a branch off the tree
These perturbations might change which particular target configuration is eventually reached — the particular arrangement of branches and leaves in the tree once it reaches its maximum size — but they will not stop the tree from growing taller and evolving towards a target configuration. In fact we could cut the tree right at the base of the trunk and it would continue to evolve towards a target configuration by sprouting a new trunk and growing a whole new tree.
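The basin and robustness claims for the tree can be illustrated with a deliberately crude toy model. The sketch below is our own assumption, not biology and not part of the essay: it collapses the whole configuration into a single number, the tree's size, assumes logistic growth toward a hypothetical maximum `MAX_SIZE`, and models the perturbations listed above (removing leaves, cutting a branch, cutting at the trunk) as shrinking or resetting that number. Every start in the basin, from a seed to a large tree, still ends inside the target set.

```python
import random

MAX_SIZE = 100.0        # hypothetical maximum size the soil and atmosphere can support
GROWTH_RATE = 0.1       # hypothetical logistic growth rate per time step
TARGET_TOLERANCE = 1.0  # "target configuration" = within this distance of MAX_SIZE


def grow(size: float) -> float:
    """One time step of a toy logistic growth law (an assumption, not real biology)."""
    return size + GROWTH_RATE * size * (1.0 - size / MAX_SIZE)


def perturb(size: float, rng: random.Random) -> float:
    """Apply one of the perturbations from the list above, chosen at random."""
    kind = rng.choice(["remove_leaves", "cut_branch", "cut_trunk"])
    if kind == "remove_leaves":
        return size * 0.9
    if kind == "cut_branch":
        return size * 0.6
    return 1.0  # cut at the base: the stump re-sprouts from a small size


def run(initial_size: float, steps: int, perturb_every: int, seed: int) -> float:
    """Grow the tree, perturbing it periodically during the first half of the run."""
    rng = random.Random(seed)
    size = initial_size
    for t in range(1, steps + 1):
        size = grow(size)
        if t % perturb_every == 0 and t < steps // 2:
            size = perturb(size, rng)
    return size


def in_target_set(size: float) -> bool:
    return abs(size - MAX_SIZE) <= TARGET_TOLERANCE


if __name__ == "__main__":
    # Start from a seed, a small tree, a medium tree, and a large tree.
    for start in [0.5, 10.0, 50.0, 95.0]:
        final = run(start, steps=600, perturb_every=40, seed=int(start * 10))
        print(start, round(final, 2), in_target_set(final))
```

The perturbations change the path and the timing, but in this toy model they cannot push the size out of the basin, so the run always finishes near `MAX_SIZE`.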
Duality. A tree is a non-dualistic optimizing system. There is no subsystem that is responsible for “doing” the optimization, separately from that which is “being” optimized. Yet the tree does exhibit a tendency to evolve towards a set of target configurations, and can overcome a wide variety of perturbations in order to do so. There are no man-made systems in existence today that are capable of gathering and utilizing resources as flexibly as a tree, from so broad a variety of environments, and there are certainly no man-made systems that can recover from being physically dismembered to the extent that a tree can recover from being cut at the trunk.
At this point it may be tempting to say that the engine of optimization is natural selection. But recall that we are studying just a single tree growing from seed to maximum size. Can you identify a physical subset of our 100m x 100m x 100m region of space that is this engine of optimization, analogous to how we identified a physical subset of the robot-and-ball system as the engine of optimization (i.e. the physical robot)? Natural selection might be the process by which the initial system came into existence, but it is not the process that drives the growth of the tree towards a target configuration.
It may then be tempting to say that it is the tree’s DNA that is the engine of optimization. It is true that the tree’s DNA exhibits some characteristics of an engine of optimization: it remains unchanged throughout the life of the tree, and physically microscopic perturbations to it can disable the tree. But a tree replicates its DNA in each of its cells, and perturbing just one or a small number of these is not likely to affect the tree’s overall growth trajectory. More importantly, a single strand of DNA does not really have agency on its own: it requires the molecular machinery of the whole cell to synthesize proteins based on the genetic code in the DNA, and the physical machinery of the whole tree to collect and deploy energy, water, and nutrients. Just as it would be incorrect to identify the memory registers containing computer code within a robot as the “true” engine of optimization separate from the rest of the computing and physical machinery that brings this code to life, it is not quite accurate to identify DNA as an engine of optimization. A tree simply does not decompose into engine and object of optimization.
It may also be tempting to ask whether the tree can “really” be said to be undergoing optimization in the absence of any “intention” to reach one of the target configurations. But this expectation of a centralized mind with centralized intentions is really an artifact of us projecting our view of our self onto the world: we believe that we have a centralized mind with centralized intentions, so we focus our attention on optimizing systems with a similar structure. But this turns out to be misguided on two counts: first, the vast majority of optimizing systems do not contain centralized minds, and second, our own minds are actually far less centralized than we think! For now we put aside this question of whether optimization requires intentions and instead just work within our definition of optimizing systems, which a tree definitely satisfies.
Example: bottle cap
Daniel Filan has pointed out that some definitions of optimization would nonsensically classify a bottle cap as an optimizer, since a bottle cap causes water molecules in a bottle to stay inside the bottle, and the set of configurations in which the molecules are inside a bottle is much smaller than the set of configurations in which the molecules are each allowed to take a position either inside or outside the bottle.
In our framework we have the following:

The system consists of a bottle, a bottle cap, and water molecules. The configuration space consists of all the possible spatial arrangements of water molecules, either inside or outside the bottle.

The basin of attraction is the set of configurations in which the water molecules are inside the bottle.

The target configuration set is the same as the basin of attraction.
This is not an optimizing system for two reasons.
First, the target configuration set is no smaller than the basin of attraction. To be an optimizing system there must be a tendency to evolve from any configuration within a basin of attraction towards a smaller target configuration set, but in this case the system merely remains within the set of configurations in which the water molecules are inside the bottle. This is no different from a rock sitting on a beach: due to basic chemistry there is a tendency to remain within the set of configurations in which the molecules comprising the rock are physically bound to one another, but it has no tendency to evolve from a wide basin of attraction towards a small set of target configurations.
Second, the bottle cap system is not robust to perturbations since if we perturb the position of a single water molecule so that it is outside the bottle, there is no tendency for it to move back inside the bottle. This is really just the first point above restated, since if there were a tendency for water molecules moved outside the bottle to evolve back towards a configuration in which all the water molecules were inside the bottle, then we would have a basin of attraction larger than the target configuration set.
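Both reasons can be checked mechanically on a toy version of the bottle. The sketch below is an illustration under our own simplifying assumptions, not part of the framework itself: a configuration is a tuple recording, for each of a handful of molecules, whether it is inside the bottle, and an ordinary cap means the dynamics leave every configuration unchanged. The helper names (`ordinary_cap_step`, `target_strictly_smaller`, `returns_after_perturbation`) are hypothetical.

```python
from itertools import product

N_MOLECULES = 3  # a tiny, hypothetical bottle so every configuration can be enumerated


def ordinary_cap_step(config):
    """An ordinary cap: nothing moves in or out, so each configuration stays put."""
    return config


def evolve(step, config, steps=50):
    for _ in range(steps):
        config = step(config)
    return config


# True = that molecule is inside the bottle.
all_configs = set(product([True, False], repeat=N_MOLECULES))
all_inside = {c for c in all_configs if all(c)}

basin = all_inside   # the basin of attraction as defined in the text
target = all_inside  # the target configuration set, identical to the basin


def target_strictly_smaller(basin, target):
    # Reason 1: an optimizing system needs a target set strictly inside its basin.
    return target < basin


def returns_after_perturbation(step, target):
    # Reason 2: move one molecule outside and see whether the dynamics bring it back.
    for config in target:
        for i in range(len(config)):
            perturbed = config[:i] + (False,) + config[i + 1:]
            if evolve(step, perturbed) not in target:
                return False
    return True


print(target_strictly_smaller(basin, target))                # False
print(returns_after_perturbation(ordinary_cap_step, target))  # False
```

As the text notes, the two checks are really one: if perturbed configurations did flow back into the target set, they would belong to a basin strictly larger than it.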
Example: the human liver
Filan also asks whether one’s liver should be considered an optimizer. Suppose we observe a human working to make money. If this person were deprived of a liver, or if their liver stopped functioning, they would presumably be unable to make money. So are we then to view the liver as an optimizer working towards the goal of making money? Filan asks this question as a challenge to Yudkowsky’s definition of optimization, since it seems absurd to view one’s liver as an optimizer working towards the goal of making money, yet Yudkowsky’s definition of optimization might classify it as such.
In our framework we have the following:

The system consists of a human working to make money, together with the whole human economy and world.

The basin of attraction consists of the configurations in which there is a healthy human (with a healthy liver) having the goal of making money.

The target configurations are those in which this person’s bank balance is high. (Interestingly there is no upper bound here, so there is no fixed point but rather a continuous gradient.)
We can expect that this person is capable of overcoming a reasonably broad variety of obstacles in pursuit of making money, so we recognize that this overall system (the human together with the whole economy) is an optimizing system. But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.
In general we cannot expect to decompose optimizing systems into an engine of optimization and object of optimization. We can see that the system has the characteristics of an optimizing system, and we may identify parts, including in this case the person’s liver, that are necessary for these characteristics to exist, but we cannot in general identify any crisp subset of the system as that which is doing the optimization. And picking various subcomponents of the system (such as the person’s liver) and asking “is this the part that is doing the optimization?” does not in general have an answer.
By analogy, suppose we looked at a planet orbiting a star and asked: “which part here is doing the orbiting?” Is it the planet or the star that is the “engine of orbiting”? Or suppose we looked at a car and noticed that the fuel pump is a complex piece of machinery without which the car’s locomotion would cease. We might ask: is this fuel pump the true “engine of locomotion”? These questions don’t have answers because they mistakenly presuppose that we can identify a subsystem that is uniquely responsible for the orbiting of the planet or the locomotion of the car. Asking whether a human liver is an “optimizer” is similarly mistaken: we can see that the liver is a complex piece of machinery that is necessary in order for the overall system to exhibit the characteristics of an optimizing system (robust evolution towards a target configuration set), but beyond this it makes no more sense to ask whether the liver is the true “locus of optimization” than to ask whether the fuel pump is the true “engine of locomotion”.
So rather than answering Filan’s question in either the positive or the negative, the appropriate move is to dissolve the concept of an optimizer, and instead ask whether the overall system is an optimizing system.
Example: the universe as a whole
Consider the whole physical universe as a single closed system. Is this an optimizing system?
The second law of thermodynamics tells us that the universe is evolving towards a maximally disordered thermodynamic equilibrium in which it cycles through various max-entropy configurations. We might then imagine that the universe is an optimizing system in which the basin of attraction is all possible configurations of matter and energy, and the target configuration set consists of the max-entropy configurations.
However, this is not quite accurate. Out of all possible configurations of the universe, the vast majority of configurations are at or close to maximum entropy. That is, if we sample a configuration of the universe at random, we have only an astronomically tiny chance of finding anything other than a close-to-uniform gas of basic particles. If we define the basin of attraction as all possible configurations of matter in the universe and the target configuration set as the set of max-entropy configurations, then the target configuration set actually contains almost the entirety of the basin of attraction, with the only configurations that are in the basin of attraction but not the target configuration set being the highly unusual configurations of matter containing stars, galaxies, and so on.
For this reason the universe as a whole does not qualify as an optimizing system under our definition. (Or perhaps it would be more accurate to say that it qualifies as an extremely weak optimizing system.)
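The counting argument can be illustrated with the simplest model we could think of, which is our own stand-in for "configurations of the universe": N distinguishable particles, each in either the left or the right half of a box, with "near maximum entropy" taken (arbitrarily) to mean that the share of particles in the left half is within 5% of one half. The function name and tolerance are hypothetical choices for illustration.

```python
from math import comb


def fraction_near_uniform(n_particles: int, tolerance: float = 0.05) -> float:
    """Fraction of the 2**n left/right configurations whose left-half share lies
    within `tolerance` of 1/2 (a toy proxy for 'near maximum entropy')."""
    lo = n_particles * (0.5 - tolerance)
    hi = n_particles * (0.5 + tolerance)
    near = sum(comb(n_particles, k) for k in range(n_particles + 1) if lo <= k <= hi)
    return near / 2 ** n_particles  # exact big-integer counts, divided once at the end


for n in [10, 100, 1000, 10000]:
    print(n, fraction_near_uniform(n))
```

Already at a few thousand particles the near-uniform configurations make up essentially all of configuration space, so a "target set" of max-entropy configurations would leave almost nothing in the basin outside it.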
Power sources and entropy
The second law of thermodynamics tells us that any closed system will eventually tend towards a maximally disordered state in which matter and energy are spread approximately uniformly through space. So if we were to isolate one of the systems explored above inside a sealed chamber and leave it for a very long period, then eventually whatever power source we put inside the sealed chamber would become depleted, after that every complex material or compound in the system would degrade into its base products, and finally we would be left with a chamber filled with a uniform gaseous mixture of whatever base elements we originally put in.
So in this sense there are no optimizing systems at all, since each of the systems above evolves towards its target configuration set only for a finite period of time, after which it degrades and evolves towards a max-entropy configuration.
This is not a very serious challenge to our definition of optimization since it is common throughout physics and computer science to study various “steady-state” or “fixed point” systems even though the same objection could be made about any of them. We say that a thermometer can be used to build a heat regulator that will keep the temperature of a house within a desired range, and we do not usually need to add the caveat that eventually the house and regulator will degrade into a uniform gaseous mixture due to the heat death of the universe.
Nevertheless, two possible ways to refine our definition are:

We could stipulate that some power source is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source.

We could specify a finite time horizon and say that “a system is an optimizing system if it tends towards a target configuration set up to time T”.
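The finite-horizon refinement can be sketched with the heat-regulator example above. In this toy model (every constant is a hypothetical choice of ours), an on/off thermostat powered by a battery holds a room near a setpoint until the battery runs out, after which the room cools toward the outside temperature. The function `optimizing_up_to` is an illustrative stand-in for "tends towards a target configuration set up to time T", with a short settling period excluded; the verdict flips depending on whether T falls before or after the battery is exhausted.

```python
SETPOINT = 20.0      # desired room temperature, hypothetical
TOLERANCE = 1.5      # target set: within this many degrees of SETPOINT
OUTSIDE = 0.0        # temperature the room leaks heat toward
LEAK = 0.05          # fraction of the indoor/outdoor gap lost per step
HEATER_POWER = 2.0   # degrees added per step while the heater is on
BATTERY = 400        # number of heater-on steps the power source can supply


def simulate(initial_temp, steps):
    """Return the room temperature after each of `steps` time steps."""
    temp, battery, history = initial_temp, BATTERY, []
    for _ in range(steps):
        if temp < SETPOINT and battery > 0:  # simple on/off thermostat
            temp += HEATER_POWER
            battery -= 1
        temp -= LEAK * (temp - OUTSIDE)
        history.append(temp)
    return history


def optimizing_up_to(T, initial_temps, settle=50):
    """True if, from every initial temperature, the room stays in the target set
    at every step from `settle` up to the horizon T."""
    for t0 in initial_temps:
        history = simulate(t0, T)
        if any(abs(x - SETPOINT) > TOLERANCE for x in history[settle:]):
            return False
    return True


temps = [0.0, 10.0, 20.0, 30.0]
print(optimizing_up_to(300, temps))   # horizon shorter than the battery life
print(optimizing_up_to(2000, temps))  # horizon long enough for the battery to run out
```

The first option in the list corresponds to setting `BATTERY` to infinity and analyzing the system conditional on that external supply.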
Connection to dynamical systems theory
The concept of “optimizing system” in this essay is very close to that of a dynamical system with one or more attractors. We offer the following remarks on this connection.

A general dynamical system is any system with a state that evolves over time as a function of the state itself. This encompasses a very broad range of systems indeed!

In dynamical systems theory, an attractor is the term used for what we have called the target configuration set. A fixed point attractor is, in our language, a target configuration set with just one element, such as when computing the square root of two. A limit cycle is, in our language, a system that eventually stably loops through a sequence of states all of which are in the target configuration set, such as a satellite in orbit.

We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.

There is a concept of “well-posedness” in dynamical systems theory that justifies the identification of a mathematical model with a physical system. The conditions for a model to be well-posed are (1) that a solution exists (i.e. the model is not self-contradictory), (2) that there is a unique solution (i.e. the model contains enough information to pick out a single system trajectory), and (3) that the solution changes continuously with the initial conditions (the behavior of the system is not too chaotic). This third condition seems related to, but not quite equivalent to, our notion of robustness, since robustness as we define it additionally requires that the system continue to evolve towards the same attractor state despite perturbations. Exploring this connection may present an interesting avenue for future investigation.
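The fixed point attractor from the square-root example can be made concrete with Newton's method, which we use here as our own illustration: the iteration x ← (x + 2/x) / 2 converges to √2 from any positive starting point, so the positive reals form its basin of attraction, and overwriting the state partway through a run only delays convergence. Starting from a negative number instead lands in a different attractor, −√2, which shows why the basin has to be specified along with the target set. (Zero is excluded, since the step divides by x.)

```python
import math


def newton_sqrt2_step(x):
    """One Newton step for f(x) = x**2 - 2, i.e. x <- (x + 2/x) / 2."""
    return 0.5 * (x + 2.0 / x)


def run(x0, steps=60, perturb_at=None, perturbed_value=None):
    """Iterate from x0; optionally overwrite the state once at step `perturb_at`."""
    x = x0
    for t in range(steps):
        if t == perturb_at:
            x = perturbed_value  # an external kick partway through the run
        x = newton_sqrt2_step(x)
    return x


def in_target_set(x, tol=1e-12):
    return abs(x - math.sqrt(2.0)) < tol


for x0 in [1e-6, 0.5, 1.0, 10.0, 1e6]:  # samples from the basin (all positive reals)
    print(x0, in_target_set(run(x0)))
print(in_target_set(run(1.0, perturb_at=30, perturbed_value=500.0)))
print(run(-3.0))  # a negative start converges to -sqrt(2), a different attractor
```

Continuous dependence on initial conditions holds here too, but the property that matters for robustness is that even a large kick lands back in the same basin.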
Conclusion
We have proposed a concept that we call “optimizing systems” to describe systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.
We have analyzed optimizing systems along three dimensions:

Robustness, which measures the number of dimensions along which the system is robust to perturbations, and the magnitude of perturbation along these dimensions that the system can withstand.

Duality, which measures the extent to which an approximate “engine of optimization” subsystem can be identified.

Retargetability, which measures the extent to which the system can be transformed via microscopic perturbations into an equally robust optimizing system but with a different target configuration set.
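Retargetability can be illustrated with a toy version of the robot-and-ball system, under assumptions that are ours alone: the ball's position is a single number, the robot applies proportional control toward a goal stored in one memory cell (`register[0]`, a hypothetical name), and changing that one number plays the role of the microscopic perturbation. The same physical system, with one value changed, converges just as robustly to a different target set.

```python
def make_system(register):
    """Toy robot-and-ball system. `register[0]` is a single memory cell holding
    the goal position; each step the robot pushes the ball 20% of the way there."""
    def step(ball):
        goal = register[0]
        return ball + 0.2 * (goal - ball)  # proportional control with gain 0.2
    return step


def settle(step, ball, steps=200):
    for _ in range(steps):
        ball = step(ball)
    return ball


register = [3.0]
step = make_system(register)
print([round(settle(step, b0), 6) for b0 in (-10.0, 0.0, 10.0)])

register[0] = -7.0  # a "microscopic" change: one number in one memory cell
print([round(settle(step, b0), 6) for b0 in (-10.0, 0.0, 10.0)])
```

Only the goal moves; the basin, the speed of convergence, and the robustness of the system are unchanged, which is what makes this system highly retargetable.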
We have argued that the “optimizer” concept rests on an assumption that optimizing systems can be decomposed into engine and object of optimization (or agent and environment, or mind and world). We have described systems that do exhibit optimization yet cannot be decomposed this way, such as the tree example. We have also pointed out that, even among those systems that can be decomposed approximately into engine and object of optimization (for example, a robot moving a ball around), we will not in general be able to meaningfully answer the question of whether arbitrary subcomponents of the agent are optimizers or not (cf. the human liver example).
Therefore, while the “optimizer” concept clearly still has much utility in designing intelligent systems, we should be cautious about taking it as a primitive in our understanding of the world. In particular we should not expect questions of the form “is X an optimizer?” to always have answers.
This is excellent; it feels way better as a definition of optimization than past attempts :) Thanks in particular for the academic style, specifically relating it to previous work; it made it much more accessible for me.
Let’s try to build up some core AI alignment arguments with this definition.
Task: A task is simply an “environment” along with a target configuration set. Whenever I talk about a “task” below, assume that I mean an “interesting” task, i.e. something like “build a chair”, as opposed to “have the air molecules be in one of these particular configurations”.
Solving a task: An object O solves a task T if adding O to T’s environment transforms it into an optimizing system for T’s target configuration set.
Performance on the task: If O solves task T, its performance is quantified by how quickly it reaches the target configuration set, and how robust it is to perturbations.
Generality of intelligence: The generality of O’s intelligence is a function of the number and diversity of tasks T that it can solve, as well as its performance on those tasks.
Optimizing AI: A computer program for which there exists an interesting task such that the computer program solves that task.
This isn’t exactly right, as it includes e.g. accounting programs or video games, which when paired with a human form an optimizing system for correct financials and winning the game, respectively. You might be able to fix this by saying that the optimizing system has to be robust to perturbations in any human behavior in the environment.
AGI: An optimizing AI whose generality of intelligence is at least as great as that of humans.
Argument for AI risk: As optimizing AIs become more and more general, we will apply them to more economically useful tasks T. However, they also become more and more robust to perturbations, possibly including perturbations such as “we try to turn off the AI”. As a result, we might eventually have AIs that form strong optimizing systems for some task T that isn’t the one we actually wanted, which tends to be bad due to fragility of value.
Deep learning AGI implies mesa optimization: Since deep learning is so sample inefficient, it cannot reach human levels of performance if we apply deep learning directly to each possible task T. (For example, it has to relearn how the world works separately for each task T.) As a result, if we do get AGI primarily via deep learning, it must be that we used deep learning to create a new optimizing AI system, and that system was the AGI.
Argument for mesa optimization: Due to the complexity and noise in the real world, most economically useful tasks require setting up a robust optimizing system, rather than directly creating the target configuration state. (See also the importance of feedback for more on this intuition.) It seems likely that humans will find it easier to create algorithms that then find AGIs that can create these robust optimizing systems, rather than creating an algorithm that is directly an AGI.
(The previous argument also applies: this is basically just a generalization of the previous point to arbitrary AI systems, instead of only deep learning.)
I want to note that under this approach the notions of “search” and “mesa objective” are less natural, which I see as a pro of this approach (see also here): the argument is that we’ll get a general inner optimizing AI, but it doesn’t say much about what task that AI will be optimizing for (and it could be an optimizing AI that is retargetable by human instructions).
Outer alignment: ??? Seems hard to formalize in this framework. This makes me feel like outer alignment is less important as a concept. (I also don’t particularly like formalizations outside of this framework.)
Inner alignment: Ensuring that (conditional on mesa optimization occurring) the inner AGI is aligned with the operator / user, that is, combined with the user it forms an optimizing system for “doing what the user wants”. (Note that this is explicitly not intent alignment, as it is hard to formalize intent alignment in this framework.)
Intent alignment: ??? As mentioned above, it’s hard to formalize in this framework, as intent alignment really does require some notion of “motivation”, “goals”, or “trying”, which this framework explicitly leaves out. I see this as a con of this framework.
Expected utility maximization: One particular architecture that could qualify as an AGI (if the utility function is treated as part of the environment, and not part of the AGI). I see the fact that EU maximization is no longer highlighted as a pro of this approach.
Wireheading: Special case of the argument for AI risk with a weird task of “maximize the number in this register”. Unnatural in this framing of the AI risk problem. I see this as a pro of this framing of the problem, though I expect people disagree with me on this point.
Thanks for the very thoughtful comment Rohin. I was on retreat last week after I published the article and upon returning to computer usage I was delighted by the engagement from you and others.
I like this.
We’ll presumably need to give O some information about the goal / target configuration set for each task. We could say that a robot capable of moving a vase around is a little bit general since we can have it solve the tasks of placing the vase at many different locations by inputting some latitude/longitude into some appropriate memory location. But this means we’re actually pasting in a different object O for each task T: each of the objects differs in those memory locations into which we’re pasting the latitude/longitude. It might be helpful to think of an “agent schema” function that maps goals to objects, so we take the goal part of the task, compute the object O for that goal, then paste this object into the environment.
It’s also important that O be able to solve the task for a reasonably broad range of environments.
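A minimal sketch of the agent-schema idea, with every name hypothetical: the schema maps a goal (a 2-D position standing in for latitude/longitude) to an object O with that goal written into its memory, and `solves` checks whether pasting O into an environment makes every tested starting position converge to the goal. To reflect the point about a broad range of environments, the environments here differ in a made-up friction parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class Environment:
    friction: float  # hypothetical: fraction of each push that actually moves the vase


def agent_schema(goal):
    """Map a goal position to an object O: a controller with the goal baked into its memory."""
    gx, gy = goal

    def controller(vase):
        x, y = vase
        return (gx - x, gy - y)  # push straight toward the stored goal

    return controller


def solves(obj, env, goal, starts, steps=500, tol=1e-6):
    """Does pasting `obj` into `env` make every starting vase position end up
    within `tol` of `goal` (the task's target configuration set)?"""
    for vase in starts:
        for _ in range(steps):
            px, py = obj(vase)
            vase = (vase[0] + env.friction * px, vase[1] + env.friction * py)
        if math.dist(vase, goal) > tol:
            return False
    return True


envs = [Environment(0.1), Environment(0.5), Environment(0.9)]
starts = [(0.0, 0.0), (10.0, -4.0), (-3.0, 8.0)]
for goal in [(1.0, 2.0), (-5.0, 5.0)]:
    O = agent_schema(goal)  # a different object O for each task's goal
    print(goal, all(solves(O, env, goal, starts) for env in envs))
```

An object that never pushes leaves the environment non-optimizing, so in this toy it is adding O that turns the environment into an optimizing system for the task.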
Perhaps we could look at it this way: take a system containing a human that is trying to get something done. This is presumably an optimizing system as humans often robustly move their environment towards some desired target configuration set. Then an inner-aligned AI is an object O such that adding it to this environment does not change the target configuration set, but does change the speed and/or robustness of convergence to that target configuration set.
Yup very difficult to say much about intentions using the pure outside view approach of this framework. Perhaps we could say that an intent-aligned AI is an inner-aligned AI modulo less robustness. Or perhaps we could say that an intent-aligned AI is an AI that would achieve the goal in a large set of benign environments, but might not achieve it in the presence of unlikely mistakes, unlikely environmental conditions, or the presence of other powerful basins of attraction.
But this doesn’t really get at the spirit of Paul’s idea, which I think is about really looking inside the AI and understanding its goals.
+1 to all of this.
I was imagining that the tasks can come equipped with some specification, but some sort of counterfactual also makes sense. This also gets around issues of the AI system not being appropriately “motivated”—e.g. I might be capable of performing the task “lock up puppies in cages”, but I wouldn’t do it, and so if you only look at my behavior you couldn’t say that I was capable of doing that task.
+1 especially to this
Mild optimization: the easiest way to solve hard tasks may be to specify a proxy, which an AI maximizes. The AI steers into configurations which maximize the proxy function. Simple proxies don’t usually have target sets which we like, because human value is complex. However, maybe we just want the AI to randomly select a configuration which satisfies the proxy, instead of finding the configuration with maximal proxyness, which may be bad due to extremal Goodhart.
Quantilization tries to solve this by randomly selecting a target configuration from some top quantile, but this is sensitive to how world states are individuated.
This makes sense, but I think you’d need a different notion of optimizing systems than the one used in this post. (In particular, instead of a target configuration set, you want a continuous notion of goodness, like a utility function / reward function.)
I’m saying the target set for non-mild optimization is the set of configurations which maximize proxyness. Just take the argmax. By contrast, we might want to sample uniformly randomly from the set of satisficing configurations, which is much larger.
(This is assuming a fixed initial state)
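The three selection rules being contrasted can be written down directly over a finite, hypothetical set of configurations with a made-up proxy score: argmax picks the single highest-scoring configuration, satisficing samples uniformly from everything that meets a threshold, and quantilization, as described above, samples from the top q fraction. This sketch assumes a uniform base distribution over the listed configurations, which is exactly where the sensitivity to how world states are individuated enters: listing one configuration many times over would change what the quantilizer samples.

```python
import random


def argmax_choice(configs, proxy):
    """Non-mild optimization: the target set is just the argmax of the proxy."""
    return max(configs, key=proxy)


def satisficing_choice(configs, proxy, threshold, rng):
    """Sample uniformly from every configuration whose proxy meets the threshold."""
    good = [c for c in configs if proxy(c) >= threshold]
    return rng.choice(good)


def quantilizer_choice(configs, proxy, q, rng):
    """Sample uniformly from the top q fraction of configurations by proxy score,
    assuming a uniform base distribution over the listed configurations."""
    ranked = sorted(configs, key=proxy, reverse=True)
    k = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:k])


rng = random.Random(0)
configs = list(range(100))       # hypothetical configuration labels
proxy = lambda c: -abs(c - 70)   # hypothetical proxy peaking at configuration 70
print(argmax_choice(configs, proxy))                    # 70
print(quantilizer_choice(configs, proxy, 0.1, rng))     # one of the top 10 by proxy score
print(satisficing_choice(configs, proxy, -20, rng))     # anything within 20 of 70
```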
It sounds like you’re assuming that the target configuration set is built into the AI system. According to me, a major point of this post / framework is to avoid that assumption altogether, and only describe problems in terms of the actual observed system behavior.
(This is why within this framework I couldn’t formalize outer alignment, and why wireheading and the search / mesa-objective split are unnatural.)
I see the tension you’re pointing at. I think I had in mind something like “an AI is reliably optimizing utility function u over the configuration space (but not necessarily over universe-histories!) if it reliably moves into high-rated configurations”, and you could draw different epsilon-neighborhoods of optimality in configuration space. It seems like you should be able to talk about dog-maximizers without requiring that the agent robustly end up in the maximum-dog configurations (and not in max-minus-one-dog configs).
I’m still confused about parts of this.
I don’t quite understand what this is saying.
Suppose we train a giant deep learning model via self-supervised learning on a ton of real-world data (like GPT-N, but w/ other sensory modalities besides text), and then we build a second system designed to provide a nice interface to the giant model.
We’d give task specifications to the interface, and it would have some smarts about how to consult the model to figure out what to do. (The interface might also be learned, via reinforcement or supervised learning, or it might be hand-coded.)
It seems plausible to me that a system comprising these two pieces, the model and the interface, could be an AGI according to the definition here, in that when combined with a very wide variety of environments (including the task specification in the environment), it could perform at least as well as a human.
And since most of the smarts seem like they’d be in the model rather than the interface, I’d count it as getting AGI “primarily via deep learning”, even if the interface was hand-coded.
But it’s not clear to me whether that would count as using deep learning to “create a new optimizing AI system”, which is itself the AGI. The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is by itself, and it doesn’t seem to have the flavor of mesa-optimization, as I understand it. So it seems like a contradiction to the quoted claim.
Have I misunderstood what you’re saying here, or do you disagree with the characterization I gave of the hypothetical model + interface system? (Or have I perhaps misunderstood mesa-optimization?)
Yeah, I’m talking about the whole system.
Yeah, I agree it doesn’t fit the explanation / definition in Risks from Learned Optimization. I don’t like that definition, and usually mean something like “running the model parameters instantiates a computation that does ‘reasoning’”, which I think does fit this example. I mentioned this a bit later in the comment:
I shared this essay with a colleague where I work (Johns Hopkins University Applied Physics Lab). Here are her comments, which she asked me to share:
This essay proposes a very interesting definition of optimization as the manifestation of a particular behavior of a closed, physical system. I haven’t finished thinking this over, but I suspect it will be (as is suggested in the essay) a useful construct. The reasoning leading to the definition is clearly laid out (thank you!), with examples that are very useful in understanding the concept. The downside of being clearly laid out, however, is that it makes critique easier. I have a few thoughts about the reasoning in the essay.
The first thing I will note is that the essay gives three definitions for an optimizing system. These definitions are close, but not exactly equivalent. The nuances can be important. For example, that the target configuration set and the basin of attraction cannot be equal is obvious; that is made explicit in definition 3, but only implied in definitions 1 and 2. A bigger issue is that there are no criteria or rationale for their extent and relative size.
For example, the essay offers two reasons why the poster child of non-optimizers (the bottle with a cap) is not an optimizing system; they both arise from the rather arbitrary definition of the basin of attraction as equal to the target configuration set. I see no necessary reason why the basin of attraction couldn’t be defined as the set of all configurations of water molecules both inside and outside the bottle. That way, the definitional requirement of a target configuration set smaller than the basin of attraction is met. The important point is: will water molecules in this new, larger basin of attraction tend to the target configuration set?
Let’s suppose that the capped bottle is in a sealed room (not necessary but easier to think about), and that the cap is made of a special material that allows water molecules to pass through it in only one direction: from outside the bottle to inside. The water molecules inside the bottle stay inside the bottle, as for any cap. The water molecules inside the room, but outside the bottle, are zooming about (thermodynamic energy), bouncing off the walls, each other, and the bottle. Although it will take some time, sooner or later all the molecules outside the bottle will hit the bottle cap, go through, and be trapped in the bottle. Voila!
Originally, the bottle-with-a-cap system was a non-optimizing system by definition; the bottle cap type was irrelevant and could have been the rather special one I described. Simply by changing the definition of the basin of attraction, we could turn it into an optimizing system. Further, the original, “non-optimizing” system (with the original definitions of the basin of attraction and target set) would have behaved exactly the same as my optimizing system. On the other hand, changing the bottle cap from our special one to a regular cap will change the system into a non-optimizing system, regardless of the definitions of the basin of attraction and the target configuration set. Perhaps we should insist that a properly formed system description has a basin of attraction that is larger than the target set, and count on the system behavior to make the optimizing/non-optimizing distinction.
Definitions 1 and 2 both contain the phrase “a small set of target configurations”, which implies that the target set is much smaller than the basin of attraction. This is a problem for the notion of the universe as a system with maximum entropy as the target configuration set, because the target set is most of the possible configurations. For this reason, the essay’s author concludes that the universe-with-entropy system is not an optimizing system, or at best, a weak one. Stars, galaxies, black holes – there are strong forces that pull matter into these structures. I would say that any system that has succeeded in getting nearly everything within the basin of attraction into the target configuration is a strong optimizer!
Regardless of the way we choose to think about strong or weak, the universe is a system that tends to a set of configurations smaller than the set of possible configurations despite perturbations (the occasional house-building project, for example!). Personally, I see no value in a definitional limitation. The behavior of the system (tending toward a smaller set of configurations out of a larger set) should govern the definition of an optimizing system, regardless of the relative sizes of the sets.
Between the universe-with-entropy and bottle-with-a-cap systems, I question the utility of the “all configurations ≥ basin of attraction ≫ target configuration set” structure in the definition of optimizing systems. I believe it is worth thinking about what the necessary relationships among these sets of configurations are, and how they are chosen.
The example of the billiards system raised another (to me) interesting question. The essay did not offer a system description but says “Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration…. If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations.”
This example has some odd features. Friction between the balls and the table surface, along with the loss of energy during non-elastic collisions, causes the balls to slow down and stop. The minutiae of their travels determine where they stop. The final arrangement is unpredictable (OK, it could be modeled given complete information, but let’s skip that as beside the point), and any arrangement is as likely as another. This suggests that the billiards system is a non-optimizing system even without the proposed perturbation of moving the balls around while they are in motion.
Looked at another way, the billiards system does tend to a certain target configuration set, while friction and the non-elasticity of the collisions are perturbations. If we make the surface frictionless and the collisions perfectly elastic, the balls will bounce around the table without stopping. Much like the water molecules in the bottle-with-a-cap example, each will eventually fall into one pocket or another during its travels. Once in a pocket, a ball cannot get out, and thus eventually all will end up in the pockets. So, this system tends to a target configuration set of all balls in pockets.
Adding back in the perturbing friction and energy loss does not mean that this system is not tending to the target configuration set. Reaching in and moving a ball to a different point, or even redirecting any ball heading for a pocket, will not keep this system from tending towards the target configuration. It seems as though the billiards system was an optimizing system all along! The larger point is that it seems, by definition, an optimizing system is an optimizing system even if there is a set of perturbations that prevents it from ever reaching the target configuration! “Tending toward”, not “reaching”, a target configuration set is in all three definitions. It is worth thinking about an optimizing system that never actually optimizes. This may have some bearing on the AGI question.
[And for you readers who, like me, would say, whoa—it is possible that the balls will enter some repeating pattern of motion where some do not enter pockets. Maybe we need a robot to move the balls around randomly if they seem stuck, just like the ball-in-valley+robot system where the robot moves the ball over barriers. I maintain that the point is the same.]
The satellite system illustrates (perhaps an obvious point) that the definition of the target configuration set can change a single system from optimizing to non-optimizing. What is a little more subtle is that the definition of the system boundaries is essential to the characterization of the system as optimizing or non-optimizing, even if the behavior of the system is the same under both definitions. In particular, what we consider to be part of the system and what is considered to be a perturbation can flip a system between characterizations. [This latter point is illustrated by the billiards system as well, as I will explain below.]
The essay says that a satellite in orbit is a non-optimizing system because if its position or velocity is perturbed, it has no tendency to return to its original orbit; that is, the author defines the target configuration as a particular orbit. With respect to another target configuration that may be described as “a scorched pile of junk on the surface of the Earth”, a satellite in orbit is an optimizing system exactly like a ball in a valley. As soon as the launch rocket stops firing, a satellite starts falling toward the center of the Earth, because atmospheric drag and solar radiation pressure continuously decrease the component of the satellite’s velocity perpendicular to the force of gravity. So, unless a perturbation is big enough to send it out of orbit altogether, a satellite tends towards a target configuration of junk located on Earth’s surface.
Since a particular orbit is usually the desired target configuration (!), many satellites incorporate a rocket system to force them to stay in a chosen orbit. If a rocket system is included in the system definition, then the satellite is an optimizing system relative to the desired orbit. What is a little more interesting is that, with respect to the junk-on-the-Earth target set, drag and solar pressure are part of the optimizing system, and an orbit correction system is a perturbation. If the target set is the particular orbit the satellite started in, these roles swap.
This observation has bearing on the billiards system example. If we include drag and non-elastic collisions as part of the billiards system, then the system is non-optimizing. If we see them as perturbations outside the system, then the billiards system is optimizing. I find this flexibility a little curious, although I haven’t completely thought through the implications.
A completely different sort of question is suggested by the section on Drexler. There the essay sets out a hierarchy of all AI systems, optimizing systems, and goal-directed agent systems. This makes sense with respect to AI systems, but I do not see how optimizing systems, as defined, can be wholly contained within the category of AI systems, unless you define AI systems pretty broadly. For example, I think that pretty much any control system is an optimizing system by the definition in the essay. If we accept this definition of optimizing system, and hold that all optimizing systems are a subset of AI systems, do we have to accept our thermostats as AI systems? What about the program that determined the square root of 2? Is that AI? Is this an issue for this definition, or does its broadness matter in an AI context?
And a nitpick: The first example of an optimizing system offered in the essay is a program calculating the square root of 2. It meets the definition of an optimizing system, but it seems to contradict the earlier assertion that “… optimizing systems are not something that are designed but are discovered.” The algorithm and the program were both designed. I’m not sure why this point is necessary. Either I do not understand something fundamental, or the only purpose of the statement of discovery is to give people like me something to argue about!
In summary, the definition in the essay suggests a few questions that could have a bearing on its application:
How do we choose the basin of attraction relative to the target configuration set, if our choice can change the status of the system from optimizing to non-optimizing and vice versa?
Is it an issue that an optimizing system may never actually optimize?
How do we choose what is part of the system versus a perturbation outside the system when our choice changes the status of the system as optimizing or non-optimizing?
All control systems are optimizing systems by the definition, but are all control systems AI systems? Does it matter? If it does matter, how do we tell the difference?
For any of these, how do they affect our thinking for AI?
Finally, it might be better to have one, consistent definition that covers all the possibilities, including (in my opinion) that perturbations may be confined to certain dimensions.
This was actually part of a conversation I was having with this colleague regarding whether or not evolution can be viewed as an optimization process. Here are some follow-up comments to what she wrote above related to the evolution angle:
We could define the natural selection system as:
All configurations = all arrangements of matter on a planet (both arrangements that are living and those that are non-living)
Basin of attraction = all arrangements of matter on a planet that meet the definition of a living thing
Target configuration set = all arrangements of living things where the type and number of living things remains approximately stable.
I think that this system meets the definition of an optimizing system given in the Ground of Optimization essay. For example, predator and prey co-evolve to be about “equal” in survival ability. If a predator becomes so much better than its prey that it eats them all, the predator will die out along with its prey; the remaining animals will be in balance. I think this works for climate perturbations, etc. too.
HOWEVER, it should be clear that there are numerous ways in which this can happen – like the ball on a bumpy surface with a lot of convex “valleys” (local minima), there is not just one way that living things can be in balance. So, to say that “natural selection optimized for intelligence” is not quite right – it just fell into a “valley” where intelligence happened. FURTHER, it’s not clear that we have reached the local minimum! Humans may be that predator that is going to fall “prey” to its own success. If that happened (and any intelligent animals remain at all), I guess we could say that natural selection optimized for less-than-human intelligence!
Further, this definition of optimization has no connotation of “best” or even better – just equal to a defined set. The word “optimize” is loaded. And its use in connection with natural selection has led to a lot of trouble in terms of human races, and humans v. animal rights.
Finally, in the essay’s definition, there is no imperative that the target set be reached. As long as the set of living things is “tending” toward intelligence, then the system is optimizing. So even if natural selection was optimizing for intelligence there is no guarantee that it will be achieved (in its highest manifestation). Like a billiards system where the table is slick (but not frictionless) and the collisions are close to elastic, the balls may come to rest with some of the balls outside the pockets. The reason I think this is important for AI research, especially AGI and ASI, is perhaps we should be looking for those perturbations to prevent us from ever reaching what we may think of as the target configuration, despite our best efforts.
This seems great, I’ll read and comment more thoroughly later. Two quick comments:
It didn’t seem like you defined what it meant to evolve towards the target configuration set. So it seems like either you need to commit to the system actually reaching one of the target configurations to call it an optimiser, or you need some sort of metric over the configuration space to tell whether it’s getting closer to or further away from the target configuration set. But if you’re ranking all configurations anyway, then I’m not sure it adds anything to draw a binary distinction between target configurations and all the others. In other words, can’t you keep the definition in terms of a utility function, but just add perturbations?
Also, you don’t cite Dennett here, but his definition has some important similarities. In particular, he defines several different types of perturbation (such as random perturbations, adversarial perturbations, etc) and says that a system is more agentic when it can withstand more types of perturbations. Can’t remember exactly where this is from—perhaps The Intentional Stance?
+1 for swapping out the target configuration set with a utility function, and looking for a robust tendency for the utility function to increase. This would also let you express mild optimization (see this thread).
Would this work for highly non-monotonic utility functions?
It would work at least as well as the original proposal, because your utility function could just be whatever metric of “getting closer to the target states” would be used in the original proposal.
Thanks for the post, this is my favourite formalisation of optimisation so far!
One concern I haven’t seen raised so far, is that the definition seems very sensitive to the choice of configuration space. As an extreme example, for any given system, I can always augment the configuration space with an arbitrary number of dummy dimensions, and choose the dynamics such that these dummy dimensions always get set to all zero after each time step. Now, I can make the basin of attraction arbitrarily large, while the target configuration set remains a fixed size. This can then make any such dynamical system seem to be an arbitrarily powerful optimiser.
This could perhaps be solved by demanding the configuration space be selected according to Occam’s razor, but I think the outcome still ends up being prior dependent. It’d be nice for two observers who model optimising systems in a systematically different way to always agree within some constant factor, akin to Kolmogorov complexity’s invariance theorem, although this may well be impossible.
As a less facetious example, consider a computer program that repeatedly sets a variable to 0. It seems again we can make the optimising power arbitrarily large by making the variable’s size arbitrarily large. But this doesn’t quite map onto the intuitive notion of the “difficulty” of an optimisation problem. Perhaps including some notion of how many other optimising systems would have the same target set would resolve this.
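Both worries can be made numeric with a simple score. Measuring an optimizing system by log2(|basin| / |target|), the number of bits of uncertainty it removes, is one way to turn "relative size of the two sets" into a number; it is an assumption here, not the essay's definition, and `optimization_bits` and the two helpers below are hypothetical.

```python
import math


def optimization_bits(basin_size, target_size):
    """Hypothetical measure: log2(|basin of attraction| / |target set|)."""
    # math.log2 accepts arbitrarily large ints, so huge configuration spaces are fine.
    return math.log2(basin_size) - math.log2(target_size)


def zero_setter_bits(n_bits):
    """A program that sets an n-bit variable to 0: basin = all 2**n values, target = {0}."""
    return optimization_bits(2 ** n_bits, 1)


def with_dummy_dimensions(basin_size, target_size, n_dummy_bits):
    """Add n_dummy_bits binary dimensions that the dynamics reset to 0 on every step.

    The basin grows by a factor of 2**n_dummy_bits; the target set does not grow at all.
    """
    return optimization_bits(basin_size * 2 ** n_dummy_bits, target_size)


print(zero_setter_bits(8))               # 8.0
print(zero_setter_bits(64))              # 64.0
print(with_dummy_dimensions(4, 2, 0))    # 1.0
print(with_dummy_dimensions(4, 2, 10))   # 11.0
```

In both cases the score grows without bound while nothing about the intuitive "difficulty" of the task changes.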
Curated. Come on dude, stop writing so many awesome posts so quickly, it’s too much.
This is a central question in the science of agency and optimization. The proposal is simple, you connected it to other ideas from Drexler and Demski+Garrabrant, and you gave a ton of examples of how to apply the idea. I generally get scared by the academic style, worried that the authors will fill out the text and make it really hard to read, but this was all highly readable, and set its own context (re-explaining the basic ideas at the start). I’m looking forward to you discussing it in the comments with Ricraz, Rohin and John.
Please keep writing these posts!
Thank you Ben. Reading this really filled me with joy and gives me energy to write more. Thank you for your curation work—it’s a huge part of why there is this place for such high quality discussion of topics like this, for which I’m very grateful.
You’re welcome :)
Seconded that the academic style really helped, particularly discussing the problem and prior work early on. One classic introduction paragraph that I was missing is “what have prior works left unaddressed?”.
Two examples which I’d be interested in your comments on:
1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy? (Credit for the example goes to Ramana Kumar).
2. Imagine that I have the goal of travelling as fast as possible. However, there is no set of states which you can point to as the “target states”, since whatever state I’m in, I’ll try to go even faster. This is another argument for, as I argue below, defining an optimising system in terms of increasing some utility function (rather than moving towards target states).
On the topic of the black hole...
There’s a way of viewing the world as a series of “forces”, each trying to control the future. Eukaryotic life is one. Black holes are another. We humans build many things, from chairs to planes to AIs. Of those three, turning on the AI feels the most like “a new force has entered the game”.
All these forces are fighting over the future, and while it’s odd to think of a black hole as an agent, sometimes when I look at it it does feel natural to think of physics as another optimisation force that’s playing the game with us.
Great examples! Thank you.
Yes this would qualify as an optimizing system by my definition. In fact just placing a large planet close to a bunch of smaller planets would qualify as an optimizing system if the eventual result is to collapse the mass of the smaller planets into the larger planet.
This seems to me to be a lot like a ball rolling down a hill: a black hole doesn’t seem alive or agentic, and it doesn’t really respond in any meaningful way to hurdles put in its way, but yes it does qualify as an optimizing system. For this reason my definition isn’t yet a very good definition of what agency is, or what post-agency concept we should adopt. I like Rohin’s comment on how we might view agency in this framework.
Yes it’s true that using a set of target states rather than an ordering over states means that we can’t handle cases where there is a direction of optimization but not a “destination”. But if we use an ordering over states then we run into the following problem: how can we say whether a system is robust to perturbations? Is it just that the system continues to climb the preference gradient despite perturbations? But now every system is an optimizing system, because we can always come up with some preference ordering that explains a system as an optimizing system. So then we can say “well it should be an ordering over states with a compact representation” or “it should be more compact than competing explanations”. This may be okay but it seems quite dicey to me.
It actually seems quite important to me that the definition point to systems that “get back on track” even when you push them around. It may be possible to do this with an ordering over states and I’d love to discuss this more.
Hmmm, I’m a little uncertain about whether this is the case. E.g. suppose you have a box with a rock in it, in an otherwise empty universe. Nothing happens. You perturb the system by moving the rock outside the box. Nothing else happens in response. How would you describe this as an optimising system? (I’m assuming that we’re ruling out the trivial case of a constant utility function; if not, we should analogously include the trivial case of all states being target states).
As a more general comment: I suspect that what starts to happen after you start digging into what “perturbation” means, and what counts as a small or big perturbation, is that you run into the problem that a *tiny* perturbation can transform a highly optimising system into a non-optimising system (e.g. flicking the switch to turn off the AGI). In order to quantify the size of perturbations in an interesting way, you need the pre-existing concept of which subsystems are doing the optimisation.
My preferred solution to this is just to stop trying to define optimisation in terms of *outcomes*, and start defining it in terms of *computation* done by systems. E.g. a first attempt might be: an agent is an optimiser if it does planning via abstraction towards some goal. Then we can zoom in on what all these words mean, or what else we might need to include/exclude (in this case, we’ve ruled out evolution, so we probably need to broaden it). The broad philosophy here is that it’s better to be vaguely right than precisely wrong. Unfortunately I haven’t written much about this approach publicly—I briefly defend it in a comment thread on this post though.
FYI: I think something got messed up with this link. The text of the link is a valid url, but it links to a mangled one (s.t. if you click it you get a 404 error).
That’s weird; thanks for the catch. Fixed.
Yes you’re right, this system would be described by a constant utility function, and yes this is analogous to the case where the target configuration set contains all configurations, and yes this should not be considered optimization. In the target set formulation, we can measure the degree of optimization by the size of the target set relative to the size of the basin of attraction. In your rock example, the sets have the same size, so it would make sense to say that the degree of optimization is zero.
This discussion is updating me in the direction that a preference ordering formulation is possible, but that we need some analogy for “degree of optimization” that captures how “tight” or “constrained” the system’s evolution is relative to the size of the basin of attraction. We need a way to say that a constant utility function corresponds to a degree of optimization equal to zero. We also need a way to handle the case where our utility function assigns utility proportional to entropy, so again we can describe all physical systems as optimizing systems and thermodynamics ensures that we are correct. This utility function would be extremely flat and wide, with most configurations receiving near-identical utility (since the high-entropy configurations constitute the vast majority of all possible configurations). I’m sure there is some way to quantify this—do you know of any appropriate measure?
The challenge here is that in order to actually deal with the case you mentioned originally—the goal of moving as fast as possible—we need a measure that is not based on the size or curvature of some local maxima of the utility function. If we are working with local maxima then we are really still working with systems that evolve towards a specific destination (although there still may be advantages to thinking this way rather than in terms of a binary set).
Nice—I’d love to hear more about this
Doesn’t the set-of-target-states version have just the same issue (or an analogous one)?
For whatever behavior the system exhibits, I can always say that the states it ends up in were part of its set of target states. So you have to count on compactness (or naturalness of description, which is basically the same thing) of the set of target states for this concept of an optimizing system to be meaningful. No?
Well, most systems don’t have a tendency to evolve towards any small set of target states despite perturbations. Most systems, if you perturb them, just go off in some different direction. For example, if you perturb most running computer programs by modifying some variable with a debugger, they do not self-correct. Same with the satellite and billiard balls example. Most systems just don’t have this “attractor” dynamic.
Hmm, I see what you’re saying, but there still seems to be an analogy to me here with arbitrary utility functions, where you need the set of target states to be small (as you do say). Otherwise I could just say that the set of target states is all the directions the system might fly off in if you perturb it.
So you might say that, for this version of optimization to be meaningful, the set of target states has to be small (however that’s quantified), and for the utility maximization version to be meaningful, you need the utility function to be simple (however that’s quantified).
EDIT: And actually, maybe the two concepts are sort of dual to each other. If you have an agent with a simple utility function, then you could consider all its local optima to be a (small) set of target states for an optimizing system. And if you have an optimizing system with a small set of target states, then you could easily convert that into a simple utility function with a gradient towards those states.
And if your utility function isn’t simple, maybe you wouldn’t get a small set of target states when you do the conversion, and vice versa?
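The suggested duality can be sketched on a one-dimensional configuration space of integer positions. Both conversions are assumptions for illustration: a target set becomes the utility "minus the distance to the nearest target", and a utility function becomes the set of its local maxima (a position counts if no neighbour has strictly higher utility). The function names are hypothetical.

```python
def target_set_to_utility(targets, n):
    """Utility over positions 0..n-1: minus the distance to the nearest target."""
    return [-min(abs(i - t) for t in targets) for i in range(n)]


def utility_to_target_set(utility):
    """Positions whose utility is at least that of each neighbour (plateaus included)."""
    n = len(utility)
    result = set()
    for i in range(n):
        neighbours = [utility[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if all(utility[i] >= u for u in neighbours):
            result.add(i)
    return result


u = target_set_to_utility({2, 7}, 10)
print(u)                               # [-2, -1, 0, -1, -2, -2, -1, 0, -1, -2]
print(utility_to_target_set(u))        # {2, 7}
print(utility_to_target_set([0] * 5))  # {0, 1, 2, 3, 4}
```

The last call shows why simplicity alone may not be the right criterion: the simplest utility function, a constant, converts back into a "target set" that is the whole space.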
I’d say the utility function needs to contain one or more local optima with large basins of attraction that contain the initial state, not that the utility function needs to be simple. The simplest possible utility function is a constant function, which allows the system to wander aimlessly and certainly not “correct” in any way for perturbations.
Ah, good points!
My biggest objection to this definition is that it inherently requires time. At a bare minimum, there needs to be an “initial state” and a “final state” within the same state space, so we can talk about the system going from outside the target set to inside the target set.
One class of cases which definitely seem like optimization but do not satisfy this property at all: one-shot non-iterative optimization. For instance, I could write a convex function optimizer which works by symbolically differentiating the objective function and then algebraically solving for a point at which the gradient is zero.
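A minimal sketch of that one-shot approach, restricted (as a simplifying assumption) to one-dimensional quadratics with a positive leading coefficient, represented as coefficient lists. The derivative is computed symbolically on the coefficients and the stationary point is found by solving a linear equation once; nothing is iterated. `differentiate` and `one_shot_minimize` are hypothetical names.

```python
def differentiate(coeffs):
    """Symbolic derivative of a polynomial; coeffs[k] is the coefficient of x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def one_shot_minimize(coeffs):
    """Minimize c0 + c1*x + c2*x**2 (c2 > 0) by solving f'(x) = 0 algebraically."""
    if len(coeffs) != 3 or coeffs[2] <= 0:
        raise ValueError("expected a convex quadratic [c0, c1, c2] with c2 > 0")
    d0, d1 = differentiate(coeffs)  # f'(x) = d0 + d1*x
    return -d0 / d1                 # the unique root of the linear derivative


print(one_shot_minimize([1, -4, 1]))  # 2.0, the minimizer of x**2 - 4x + 1
```

Every step runs exactly once in a fixed order, so there is no trajectory that gets pulled back toward a target after a disturbance; that is the property the objection turns on.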
Is there an argument that I should not consider this to be an optimizer?
Fascinating—but why is this an objection? Is it just the inelegance of not being able to look at a single time slice and answer the question of whether optimization is happening?
Yes this is a fascinating case! I’d like to write a whole post about it. Here are my thoughts:
First, just as a fun fact, note that it’s actually extremely rare to see any non-iterative optimization in practical usage. When we solve linear equations, we could use Gaussian elimination, but it’s so unstable that in practice we most likely use the SVD, which is iterative. When we solve a system of polynomial equations, we could use something like a Gröbner basis or the resultant, but it’s so unstable that in practice we use something like a companion matrix method, which comes down to an eigenvalue decomposition, which is again iterative.
Consider finding the roots of a simple quadratic equation (i.e. solving a cubic optimization problem). We can use the quadratic formula to do this. But ultimately this comes down to computing a square root, which is typically (though not necessarily) computed with an iterative method.
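To make that concrete, here is the quadratic formula with its square root computed by Newton's method (also known as Heron's method), one common iterative choice. This is a sketch: `newton_sqrt` and `quadratic_roots` are hypothetical helpers, real numerical libraries use their own square-root routines, and complex roots are not handled.

```python
def newton_sqrt(s, rel_tol=1e-15, max_iter=100):
    """Square root by Newton's iteration x <- (x + s/x) / 2."""
    if s < 0:
        raise ValueError("negative input")
    if s == 0:
        return 0.0
    x = float(max(s, 1.0))  # start at or above sqrt(s); the iterates then decrease toward it
    for _ in range(max_iter):
        nxt = 0.5 * (x + s / x)
        if abs(nxt - x) <= rel_tol * nxt:
            return nxt
        x = nxt
    return x


def quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 via the quadratic formula (a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    r = newton_sqrt(disc)  # the iterative step hiding inside the closed-form formula
    return ((-b - r) / (2 * a), (-b + r) / (2 * a))


print(quadratic_roots(1, -3, 2))  # (1.0, 2.0)
```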
That these methods (for solving linear systems, polynomial systems, and quadratic equations) have at their heart an iterative optimization algorithm is not accidental. The iterative methods involved are not some small or sideline part of what’s going on. In fact when you solve a system of polynomial equations using a companion matrix, you spend a lot of energy rearranging the system into a form where it can be solved via an eigenvalue decomposition, and then the eigenvalue decomposition itself is very much operating on the full problem. It’s not some unimportant side operation. I find this fascinating.
Nevertheless it is possible to solve linear systems, polynomial systems, etc. with non-iterative methods.
These methods are definitely considered “optimization” by any normal use of that term. So in this way my definition doesn’t quite line up with the common language use of the word “optimization”.
But these non-iterative methods actually do not have the core property that I described in the square-root-of-two example. If I reach in and flip a bit while a Gaussian elimination is running, the algorithm does not in any sense recover. Since the algorithm is just performing a linear sequence of steps, the error just grows and grows as the computation unfolds. This is the opposite of what happens if I reach in and flip a bit while an SVD is being computed: in this case the error will be driven back to zero by the iterative optimization algorithm.
You might say that my focus on error-correction simply doesn’t capture the common language use of the term optimization, as demonstrated by the fact that non-iterative optimization algorithms do not have this error-correcting property. You would be correct!
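The contrast can be sketched on a small linear system. Two assumptions are loud here: instead of flipping a bit, the sketch adds an error to one intermediate value, and instead of the SVD it uses Jacobi iteration, a simpler iterative solver swapped in because it fits in a few lines. Both solver functions are hypothetical helpers written for this illustration.

```python
def gaussian_solve(A, b, perturb=None):
    """Gaussian elimination with partial pivoting.

    perturb = (row, col, delta) adds delta to one entry of the augmented matrix
    after forward elimination, simulating a fault in the middle of the run.
    """
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    if perturb is not None:
        row, col, delta = perturb
        M[row][col] += delta  # the fault: nothing later in the algorithm checks for it
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def jacobi_solve(A, b, iters=200, perturb_at=None, delta=0.0):
    """Jacobi iteration; it converges here because A is strictly diagonally dominant.

    At iteration perturb_at, delta is added to every component of the current estimate.
    """
    n = len(A)
    x = [0.0] * n
    for k in range(iters):
        if k == perturb_at:
            x = [xi + delta for xi in x]  # the fault
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x


A = [[4, 1], [2, 3]]
b = [1, 2]  # exact solution: x = 0.1, y = 0.6
print(gaussian_solve(A, b))                          # approximately [0.1, 0.6]
print(gaussian_solve(A, b, perturb=(1, 2, 1.0)))     # [0.0, 1.0]: the error reaches the output
print(jacobi_solve(A, b, perturb_at=50, delta=5.0))  # approximately [0.1, 0.6] again
```

The Gaussian-elimination fault is carried straight through to the answer, while the Jacobi run absorbs a much larger fault because every subsequent step pulls the estimate back toward the solution.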
But perhaps my real response is that fundamentally I’m interested in these processes that somewhat mysteriously drive the state of the world towards a target configuration, and keep doing so despite perturbations. I think these are central to what AI and agency are. The term “optimizing system” might not be quite right, but it seems close enough to be compelling.
Thanks for the question—I clarified my own thinking while writing up this response.
Another big thing to note in examples like e.g. iteratively computing a square root for the quadratic formula or iteratively computing eigenvalues to solve a matrix: the optimization problems we’re solving are subproblems, not the original full problem. These crucially differ from most of the examples in the OP in that the system’s objective function (in your sense) does not match the objective function (in the usual intuitive sense). They’re iteratively optimizing a subproblem’s objective, not the “full” problem’s objective.
That’s potentially an issue for thinking about e.g. AI as an optimizer: if it’s using iterative optimization on subproblems, but using those results to perform some higher-level optimization in a non-iterative manner, then aligning the subproblem optimizers may not be synonymous with aligning the full AI. Indeed, I think a lot of reasoning works very much like this: we decompose a high-dimensional problem into coupled low-dimensional subproblems (i.e. “gears”), then apply iterative optimizers to the subproblems. That’s exactly how eigenvalue algorithms work, for instance: we decompose the full problem into a series of optimization subproblems in narrower and narrower subspaces, while the “high-level” part of the algorithm (i.e. outside the subproblems) doesn’t look like iterative optimization.
No, the issue is that the usual definition of an optimization problem (e.g. $\max_x f(x)$) has no built-in notion of time, and the intuitive notion of optimization (e.g. “the system makes Y big”) has no built-in notion of time (or at least linear time). It’s this really fundamental thing that isn’t present in the “original problem”, so to speak; it would be very surprising and interesting if time had to be involved when it’s not present from the start.
If I specifically try to brainstorm things-which-look-like-optimization-but-don’t-involve-objective-improvement-over-time, then it’s not hard to come up with examples:
Rather than a function value “improving” along linear time, I could think about a function value improving along some tree or DAG—e.g. in a heap data structure, we have a tree where the “function value” always “improves” as we move from any leaf toward the root. There, any path from a leaf to the root could be considered “time” (but the whole set of nodes at the “same level” can’t be considered a time slice, because we don’t have a meaningful way to compare whole sets of values; we could invent one, but it wouldn’t actually reflect the tree structure).
The example from the earlier comment: a one-shot non-iterative optimizer
A distributed optimizer: the system fans out, tests a whole bunch of possible choices in parallel, then selects the best of those.
Various flavors of constraint propagation, e.g. the simplex algorithm (and markets more generally)
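Two of the list items are easy to make concrete. A minimal sketch, assuming Python's heapq min-heap (so “improves” means “gets smaller toward the root”) and a thread pool as the stand-in for parallel fan-out; heap_paths_improve and fan_out_select are hypothetical names. Checking every parent-child pair is equivalent to checking that every leaf-to-root path is monotone.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def heap_paths_improve(heap):
    """In a min-heap, values never increase along any leaf-to-root path."""
    for i in range(1, len(heap)):
        if heap[(i - 1) // 2] > heap[i]:  # each parent must be <= its child
            return False
    return True


def fan_out_select(candidates, objective):
    """Distributed optimizer: score every candidate in parallel, keep the lowest."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(objective, candidates))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


rng = random.Random(0)
data = [rng.randint(0, 100) for _ in range(20)]
heapq.heapify(data)
print(heap_paths_improve(data))  # True
print(fan_out_select(list(range(-5, 6)), lambda x: (x - 2) ** 2))  # 2
```

Neither function has any notion of the objective improving over time: the heap orders values along tree paths, and the fan-out evaluates everything at once before a single selection step.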
I think this is covered in my view of optimization via selection, where “direct solution” is the third option. Any one-shot optimizer is implicitly relying on an internal model completely for decision making, rather than iterating, as I explain there. I think that is compatible with the model here, but it needs to be extended slightly to cover what I was trying to say there.
This model is explicitly requiring that you deal only with physical processes, so your convex function solver would require time to get from the starting state to the end state. If it is happening non-iteratively then it would cease to be an optimizing system after it has completed the function, since there is no longer a target configuration.
I’m not sure what you’re trying to say here. What’s the state space (in which both the start and end state of the optimizer live), what’s the basin of attraction (i.e. set of allowed initial conditions), and what’s the target region within the state space? And remember, the target region needs to be a subset of the allowed initial conditions.
The end state is the solution to the convex function being stored in some physical registers. The initial state is those registers containing arbitrary data to be overwritten. It’s not particularly interesting as optimization problems go (not a very large basin of attraction) but it fulfills the basic criteria.
The unique thing about your example is that it solves once and then it is done (relative to the examples in the post), so it ceases to be an optimizing system once it finishes computing the solution to your convex function.
With a slight modification, you could be repeating this algorithm in a loop so it constantly recalculates a new function. Now the initial state can be some value in the result and input registers, and the target region is the set of input equations and appropriate solution in the output registers. It widens the basin of attraction to both the input and output registers rather than just the output.
Ok, two problems with this:
There’s no reason why that target set would be smaller than the basin of attraction. Given one such optimization problem, there are no obvious perturbations we could make which would leave the result in the target region.
The target region is not a subset of the basin of attraction. The system doesn’t evolve from a larger region to a smaller subset (as in the Venn-diagram visuals in the OP), it just evolves from one set to another.
The first problem explicitly violates the OP’s definition of an optimizer, and the second problem violates one of the unspoken assumptions present in all of the OP’s examples.
I don’t believe that either of these points are true. In your original example, there is one correct solution for any convex function. I will assume there is a single hardcoded function for the following, but it can be extended to work for an arbitrary function.
The output register having the correct solution is the target set.
The output register having any state is the basin of attraction.
Clearly any specific number (or rather singleton of that number) is a subset of all numbers, so the target is a subset of the basin. And further, because “all numbers” has more than one element, the target set is smaller than the basin.
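The register argument can be checked with plain sets. This is a minimal sketch under stated assumptions: the 8-bit register, the hardcoded solution 42, and the name run_program are hypothetical stand-ins for the convex solver. Note that the check passes without run_program doing any actual work, which is why the same argument applies to any deterministic program that writes a fixed output.

```python
REGISTER_VALUES = range(256)  # hypothetical 8-bit output register


def run_program(_register):
    """One-shot program: overwrite the register with the hardcoded solution."""
    return 42  # hypothetical argmin of a fixed convex function, e.g. (x - 42)**2


basin = set(REGISTER_VALUES)  # any initial register contents
final_states = {run_program(r) for r in basin}
target = {42}

print(final_states == target, target < basin, len(basin), len(target))
# True True 256 1
```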
This argument applies to literally any deterministic program with non-empty output. Are you saying that every program is an optimizer?
Pretty much, yes, according to the definition given. Like I said, not a particularly interesting optimization, but an optimization nonetheless.
To extend on this, the basin of optimization is not any smaller than an iterative process acting on a single register (and if you loop the program, then the time horizon is the same). In both cases your basin is anything in that register and the target state is one particular number in that register. As far as I can tell the definition doesn’t have any way of saying that one is “more of an optimizer” than the other. If anything, the fixed output is more optimized because it arrives more quickly.
Ok, well, it seems like the one-shot non-iterative optimizer is an optimizer in a MUCH stronger sense than a random program, and I’d still expect a definition of optimization to say something about the sense in which that holds.
I think this is great.
I would want to relate it to a few key points which I tried to address in a few earlier posts. Principally, I discussed selection versus control, which is about the difference between what optimization does externally, and how it uses models and testing. This related strongly to your conception of an optimizing system, but focused on how much of the optimization process occurs in the system versus in the agent itself. This is principally important because of how it relates to misalignment and Goodharting of various types.
I had hoped to further apply that conceptual model to mesa-optimization, but I was a bit unsure how to think about it, and have been working on other projects. At this point, I think your discussion is probably a better conceptual model than the one I was trying to build there—it just needs to be slightly extended to cover the points I was trying to work out in those posts. I’d like to think about how it relates to mesa-optimization as well, but I’m unlikely to actually work on that.
Very good. A lot of potential there, I feel.
Planned summary for the Alignment Newsletter:
Planned opinion:
I’m not sure this is true, at least not in the sense that we usually think about “goal-directed agent systems”.
You make a case that there’s no distinct subsystem of the tree which is “doing the optimizing”, but this isn’t obviously relevant to whether the tree is agenty. For instance, the tree presumably still needs to model its environment to some extent, and “make decisions” to optimize its growth within the environment—e.g. new branches/leaves growing toward sunlight and roots growing toward water, or the tree “predicting” when the seasons are turning and growing/dropping leaves accordingly.
One way to think about whether “the set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems” is that it’s equivalent to Scott’s (open) question: does agent-like behavior imply agent-like architecture?
At first I particularly liked the idea of identifying systems with “an optimizer” as those which are robust to changes in the object of optimization, but brittle with respect to changes in the engine of optimization.
On reflection, it seems like a useful heuristic but not a reliable definition. A counterexample: suppose we do manage to build a robust AI which maximizes some utility function. One desirable property of such an AI is that it’s robust to e.g. one of its servers going down or corrupted data on a hard drive; the AI itself should be robust to as many interventions as possible. Ideally it would even be robust to minor bugs in its own source code. Yet it still seems like the AI is the “engine”, and it optimizes the rest of the world.
Yeah I agree that duality is not a good measure of whether a system contains something like an AI. There is one kind of AI that we can build that is highly dualistic. Most present-day AI systems are quite dualistic, because they are predicated on having some robust compute infrastructure that is separate from and mostly unperturbed by the world around it. But there is every reason to go beyond these dualistic designs, for precisely the reason you point to: such systems do tend to be somewhat brittle.
I think it’s quite feasible to build highly robust AI systems, although doing so will likely require more than just hardening (making it really unlikely for the system to be perturbed). What we really want is an AI system where the core AI itself tends to evolve back to a stable configuration despite perturbations to its core infrastructure. My sense is that this will actually require a significant shift in how we think about AI—specifically moving from the agent model to something that captures what is good and helpful in the agent model but discards the dualistic view of things.
This is excellent! Very well done, I would love to see more work like this.
I have a whole bunch of things to say along separate directions so I’ll break them into separate comments. This first one is just a couple minor notes:
For the universe section, the universe doesn’t push “toward” maxent, it just wanders around and usually ends up in maxent states because that’s most of the states. The basin of attraction includes all states.
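The “most of the states” point can be made concrete by counting microstates in a toy system. The sketch below assumes n independent coins as a stand-in for a physical system, treats heads counts within 10% of n/2 as the high-entropy macrostates, and uses the hypothetical name fraction_near_half; the tolerance is an arbitrary choice. A system that wanders uniformly over microstates lands in such states with this probability, without being pushed anywhere.

```python
from math import comb


def fraction_near_half(n, tolerance=0.1):
    """Fraction of the 2**n microstates of n coins whose heads count lies
    within tolerance * n of n / 2 (the near-maximum-entropy macrostates)."""
    lo, hi = n / 2 - tolerance * n, n / 2 + tolerance * n
    near = sum(comb(n, k) for k in range(n + 1) if lo <= k <= hi)
    return near / 2 ** n


for n in (10, 100, 1000):
    print(n, fraction_near_half(n))  # the fraction approaches 1 as n grows
```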
Regarding “whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions”, I believe there’s an old theorem that the long-term behavior of dynamical systems on a compact space is always ergodic on some manifold within the space. That manifold has a name which I don’t remember, which is probably what you want to look for.
Does “ergodic on some manifold” here mean it approaches every point within the manifold, as in the ergodicity assumption, or does it mean described by an ergodic function? I realize the latter implies the former, but what I am driving at is the behavior vs. the formalism.
Not sure.
This post reminds me of thinking from the 1950s, when people taking inspiration from Wiener’s work on cybernetics tried to operationalize “purposeful behavior” in terms of robust convergence to a goal state:
https://heinonline.org/HOL/Page?collection=journals&handle=hein.journals/josf29&id=48&men_tab=srchresults
> When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.
I appreciate the direct attention to this process as an important instance of optimization. The first talk I ever gave in the EECS department at UC Berkeley (to the full EECS faculty) included a diagram of Earth drifting out of the region of phase space where humans would exist. Needless to say, I’d like to see more explicit consideration of this type of scenario.
This seems like a good definition of optimization for algorithmic systems, but I don’t see how it works for physical systems. Going by the primary definition,
But in the physical world, there are literally zero closed systems with this property. Entropy always increases*, and the target configuration set will never be smaller than the basin of attraction. The dirt-plus-seed-plus-sunlight system has a vastly smaller configuration space than the dirt-plus-tree-plus-heat system. Perhaps one could object that one should discount the incoming sunlight and outgoing heat since the system isn’t really closed, but then consider a very similar system consisting of only dirt, air, and fungal spores. Surely if a growing tree is an optimizing system, then a growing mushroom in a closed system is an optimizer too. But the entropy increase in the latter case is unambiguous: the number of ways to arrange atoms into a fully grown mushroom is again vastly larger than the number of ways to configure atoms into dirt without mushrooms but with the nutrients to grow them.
It may be possible to get around this by redefining configuration spaces that better match our intuition (it does seem like a mushroom is more special than dirt), but I don’t see any way to do this rigorously.
*or, at least, entropy always tends to increase.
I agree that closed physical systems aren’t optimizing systems. It seems like the first patch given by the author works when worded more carefully: “We could stipulate that some [low-entropy] power source [and some entropy sink] is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source.”
Then an optimizing system with X bits of “optimization power” (which is log(target states / basin of attraction size) or something) has to sink at least X bits, and this seems like it works. Maybe it gets hard to rigorously define the exact form of the power source and entropy sink though? Disclaimer: I don’t know statistical mechanics.
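A worked version of the bit count, assuming equally weighted states. Since the ratio target/basin is below 1, the sketch takes the ratio the other way round so the result is a positive number of bits. landauer_min_heat adds Landauer's bound, X·k_B·T·ln 2, as the minimum heat that exporting X bits of entropy into an idealised sink at temperature T requires; both function names are hypothetical.

```python
from math import log, log2

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def optimization_power_bits(basin_size, target_size):
    """Bits of 'optimization power' for funnelling basin_size equally weighted
    states into target_size states: log2(basin / target)."""
    if not 0 < target_size <= basin_size:
        raise ValueError("target must be a non-empty subset of the basin")
    return log2(basin_size / target_size)


def landauer_min_heat(bits, temperature_kelvin):
    """Landauer lower bound (joules) on heat dumped into a sink at the given
    temperature when `bits` bits of entropy are exported to it."""
    return bits * BOLTZMANN * temperature_kelvin * log(2)


bits = optimization_power_bits(2 ** 20, 2 ** 4)
print(bits)  # 16.0
print(landauer_min_heat(bits, 300.0))  # roughly 4.6e-20 J
```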
FYI, it seems pretty clear to me that a liver should be considered an optimiser: as an organ in the human body, it performs various tasks mostly reliably, achieves homeostasis, etc. The question I was rhetorically asking was whether it is an optimiser of one’s income, and the answer (I claim) is ‘no’.
It always gives the same answer for the last digit?
Well we could always just set the last digit to 0 as a post-processing step to ensure perfect repeatability. But point taken, you’re right that most numerical algorithms are not quite as perfectly stable as I claimed.
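A small illustration of the repeatability issue: floating-point addition is not associative, so the same computation summed in a different order can disagree in the last digit. canonicalize is a hypothetical post-processing step in the spirit of zeroing the last digit.

```python
def canonicalize(x, digits=12):
    """Hypothetical post-processing: round away trailing digits so results that
    differ only by floating-point summation order compare equal."""
    return round(x, digits)


a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False: 0.6000000000000001 vs 0.6
print(canonicalize(a) == canonicalize(b))  # True
```

Rounding can still split two values that happen to straddle a rounding boundary, so this reduces rather than eliminates run-to-run differences.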
Great post.
I’m not keen on the requirement that the basin of attraction be strictly larger than the target configuration set. I don’t think this buys you much, and it seems to needlessly rule out goals based on narrow maintenance of some status quo. Switching to a utility function as suggested by others improves things, I think.
For example: a highly capable AI whose only goal is to maintain a chess set in a particular position for as long as possible, but not to care about it after it’s disturbed.
Here the target set is identical to the basin of attraction: states containing the chess set in the particular position (or histories where it’s remained undisturbed).
This doesn’t tell us anything about what the AI will do in pursuing this goal. It may not do much until something approaches the board; it may rearrange the galaxy to minimise the chances that a piece will be moved (but arbitrarily small environmental changes might have it take very different actions, so in general we can’t say it’s optimising for some particular configuration of the galaxy).
I want to say that this system is optimising to keep the chess set undisturbed.
With utility you can easily represent this goal, and all you need to do is compare unperturbed utility with the utility under various perturbations.
Something like: the system $S$ optimises $U$ $\delta$-robustly to perturbation $x$ if $\mathbb{E}[U(S)] - \mathbb{E}[U(x(S))] < \delta$.
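A minimal sketch of the δ-robustness test on the chess-set example. Everything here is a hypothetical toy: the guard, its bump and block probabilities, and the two perturbations are assumptions, a perturbation x is modelled as a function that rewrites the guard's block probability, and both expectations are Monte Carlo estimates with a fixed seed.

```python
import random


def chess_guard(rng, perturb=None, steps=10, bump_prob=0.3, block_prob=0.99):
    """Hypothetical toy system: each step a bump hits the board with probability
    bump_prob, and the guard blocks it with probability block_prob. A
    perturbation is a function that rewrites block_prob."""
    if perturb is not None:
        block_prob = perturb(block_prob)
    for _ in range(steps):
        if rng.random() < bump_prob and rng.random() > block_prob:
            return "disturbed"
    return "undisturbed"


def utility(state):
    # U is 1 only if the chess set was never disturbed
    return 1.0 if state == "undisturbed" else 0.0


def mean_utility(system, perturb, trials=2000, seed=0):
    """Monte Carlo estimate of E[U(x(S))]; perturb=None estimates E[U(S)]."""
    rng = random.Random(seed)
    return sum(utility(system(rng, perturb)) for _ in range(trials)) / trials


def is_delta_robust(system, perturbations, delta):
    """True if E[U(S)] - E[U(x(S))] < delta for every perturbation x."""
    baseline = mean_utility(system, None)
    return all(baseline - mean_utility(system, x) < delta for x in perturbations)


def mild(p):
    return p - 0.01  # slightly weaker guard


def severe(_p):
    return 0.0  # guard switched off


print(is_delta_robust(chess_guard, [mild], delta=0.1))  # True
print(is_delta_robust(chess_guard, [severe], delta=0.1))  # False
```

Unlike the basin/target formulation, this test needs no target set smaller than the basin: it only compares expected utility with and without each perturbation.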
Truly a joy to read! Thank you.
The information theoretic measure of individuality attempts to answer exactly this type of question.
From this view, a set of components (the system) is decomposed into two subsets (subsystem + environment). The proposed subsystem is assigned a degree of individuality by measuring the amount of information it shares with its future state, optionally conditioned on its environment. This leads to 2 types of individuality. The first type says that a proposed subsystem is individualistic to the degree that the subsystem is predictive of its future state after accounting for the information in the environment. The second type captures the notion of inseparability by assigning a high degree of individuality to subsystems that are strongly coupled with their environment in such a way that neither the subsystem nor environment alone are predictive of the next state of the subsystem.
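One way to read the first type of individuality is as a conditional mutual information, $I(S_t; S_{t+1} \mid E_t)$. The sketch below is a plug-in estimate from counted samples of (subsystem now, subsystem next, environment now); treating the measure this way is an assumption, the published measure may differ in detail, and only the first type is covered. The function name is hypothetical.

```python
import itertools
from collections import Counter
from math import log2


def conditional_mutual_information(triples):
    """Plug-in estimate of I(S_t; S_t+1 | E_t) in bits from
    (s_now, s_next, e_now) samples."""
    n = len(triples)
    c_sxe = Counter(triples)
    c_se = Counter((s, e) for s, _, e in triples)
    c_xe = Counter((x, e) for _, x, e in triples)
    c_e = Counter(e for _, _, e in triples)
    total = 0.0
    for (s, x, e), k in c_sxe.items():
        # p(s,x,e) * log2[ p(s,x,e) p(e) / (p(s,e) p(x,e)) ], with the n's cancelled
        total += (k / n) * log2(k * c_e[e] / (c_se[(s, e)] * c_xe[(x, e)]))
    return total


# Subsystem that carries its own state forward vs one copied from its environment
self_driven = [(s, s, e) for s, e in itertools.product((0, 1), repeat=2)]
env_driven = [(s, e, e) for s, e in itertools.product((0, 1), repeat=2)]
print(conditional_mutual_information(self_driven))  # 1.0
print(conditional_mutual_information(env_driven))  # 0.0
```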
For example, considering the set of atoms making up the space containing the robot-optimizer and vase, the set of robot-atoms retains the desired properties of an optimizer, and is also highly individualistic in the first sense since knowing the state of the robot atoms tells you a lot about their next state, but knowing about the set of non-robot atoms tells you very little about the state of the robot. On the other hand, considering the set of atoms making up the tree, the system as a whole is an optimizing system, but no individual subset of atoms accomplishes the target of the larger optimizing system.
Thank you for the pointer to this terminology. It seems relevant and I wasn’t aware of the terminology before.