Adaptation Executors and the Telos Margin

Thank you to Justis Mills for feedback on a draft of this post.

You’ll often hear this bit of wisdom: “Humans are not utility optimizers, but rather adaptation executors.” At first glance, it seems to be pretty self-explanatory. Humans are not effectively described by the optimization of some particular utility function—to the contrary, human behavior is the product of a slew of hot-fix adaptations, most easily understood in terms of how they function.

On a second look, though, there’s a little more here. What’s the difference between these two representations? For any given pattern of behavior, a utility function can be selected that values precise adherence to that exact pattern. On some level, then, an adaptation executor is a utility maximizer—at minimum, we can retrospectively say that its utility function was based on how well it did what the adaptations drove it toward. That’s not very satisfying, though, as there does seem to be a substantial difference. Looking for what sets them apart, the only real candidate is the operative qualifier: that the behavior is effectively described.

To effectively describe some behavior, it’s necessary to describe that behavior as simply and directly as possible. Notice how convoluted that boilerplate utility function is. “Actions that would be produced by behavioral model X” fits the form of a utility function, but it’s certainly not the form that comes to mind when thinking about utility functions in general. Taking this route to transform a pattern of behavior into a utility function is always going to increase the complexity. So while you can theoretically construct a utility function to explain human choices, there very well may not be a utility function which expresses human decisions with substantially less complexity than the behavior produced by a taxonomy of adaptations.
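
To see just how boilerplate that construction is, here’s a minimal sketch in Python. The names and types (`Transcript`, `BehaviorForm`, and so on) are mine, purely for illustration: the utility function awards full marks to a transcript exactly when its actions match what a given behavioral model would have done.

```python
from typing import Callable, List, Tuple

# A transcript records the exchange so far as (observation, action) pairs.
Transcript = List[Tuple[str, str]]

# A behavioral model maps the memory of past exchanges plus the current input to an output.
BehaviorForm = Callable[[Transcript, str], str]


def boilerplate_utility(behavior: BehaviorForm) -> Callable[[Transcript], float]:
    """Build the 'actions that would be produced by behavioral model X' utility."""

    def utility(transcript: Transcript) -> float:
        for step, (observation, action) in enumerate(transcript):
            # Replay the history and check the agent did what `behavior` would have done.
            if action != behavior(transcript[:step], observation):
                return 0.0  # any deviation from the scripted behavior scores nothing
        return 1.0  # perfect adherence to the behavioral pattern

    return utility
```

The returned program is short only because it smuggles the entire behavioral model inside itself; its length is the behavior’s length plus a constant wrapper, which is exactly the complexity problem described above.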

What I propose to call an agent’s telos is the difference between how complex that agent’s behavior is and how complex the simplest utility function corresponding to that behavior is. It’s the degree to which an agent is better expressed by purpose than procedural action. I’m going to formalize that in the next few sections, but if you just want to see how it looks in practice, you can skip to “Back to Reality.”

The Behavior Form

That’s a good start, but it’s not worth much to wave around algorithmic complexity without having phrased the problem using algorithms. So, we’re going to be looking at a computable agent, and a computable environment. Since they’re both programs, they’ll need to interact in discrete steps—the agent sends an output to the environment, and the environment gives it new input, and so on ad infinitum. They’ll both have memory of all past exchanges to draw on, allowing things to get a little more complicated than function composition. In this paradigm, the agent’s behavior form is just the program that links its input and its memory to its output—it’s the way it decides what action to take given its circumstances up to present. The behavior complexity, then, is simply the algorithmic complexity of this procedure—how simply can the way the agent acts be described, moving from stimulus to reaction?
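
As a rough sketch of that setup (all names here are mine, and the exact bookkeeping of who moves first doesn’t matter), the behavior form is the `act` program below, and the interaction is just a loop threading the shared memory through both programs:

```python
from typing import Callable, List, Tuple

# One exchange per step: the input the environment hands over, then the agent's reply.
Transcript = List[Tuple[str, str]]  # (observation, action) pairs, oldest first

# The behavior form: the agent's memory of past exchanges plus its newest input -> its output.
BehaviorForm = Callable[[Transcript, str], str]

# The environment is also just a program with memory of the whole exchange.
Environment = Callable[[Transcript], str]


def run_interaction(act: BehaviorForm, env: Environment, steps: int) -> Transcript:
    """Alternate the two programs (here: for `steps` rounds instead of forever)."""
    history: Transcript = []
    for _ in range(steps):
        observation = env(history)          # the environment produces the agent's next input
        action = act(history, observation)  # the behavior form decides what to do with it
        history.append((observation, action))
    return history


# Toy example: an environment that announces the round number, an agent that echoes it back.
if __name__ == "__main__":
    print(run_interaction(lambda history, obs: obs,
                          lambda history: str(len(history)),
                          steps=3))
```

The behavior complexity is then just the algorithmic complexity of `act`; the loop and the environment don’t count against the agent.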

The Utility Form

What I’ll call the utility form is a shift from looking at the behavior to the preference. What’s the simplest computable utility function that the behavior is already optimizing?

Starting from basic structure, a program filling the role of a utility function would simply take a history of inputs and outputs from an agent and assign it a weight. Basically, it would grade an agent’s choices, as a utility function tends to.

Meanwhile, it’s true that the environment is unknown to the agent, at least beyond the information it gets from its history of interactions. However, by virtue of it being a computable environment, we can make certain assumptions. In particular, from the lab of the mad computer scientist Ray Solomonoff, the concept of algorithmic probability lets us set expectations for any given output on the premise that the environment is a program with random code. We can narrow this down using Bayes’ theorem, in a (non-computable) algorithm called Solomonoff induction. With that approach, an environment with a particular history comes with quite definite expectations about what it will do next, which in turn fixes an optimal course of action for any given utility function.
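
In symbols, using the standard formulation of algorithmic probability rather than anything specific to this post: the prior weight of a history is the total weight of random programs that would produce it on a fixed universal machine $\mathcal{M}$, and Bayes’ theorem turns that into an expectation for what comes next.

$$M(x) \;=\; \sum_{p \,:\, \mathcal{M}(p) \,=\, x\ast} 2^{-\ell(p)}, \qquad\qquad M(y \mid x) \;=\; \frac{M(xy)}{M(x)}$$

Here $x\ast$ means “anything beginning with $x$,” and $\ell(p)$ is the length of the program $p$ in bits.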

So, smashing these two rocks together, we can describe the utility form of an agent as the minimal program which, when used to weight the expectations of various outcomes according to conditional algorithmic probability, makes the actions of the target behavioral program optimal. The utility complexity can then be just the length of this program. It comes down to finding the simplest way of representing an agent as optimizing a utility function.
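
The algorithmic prior itself isn’t computable, but the shape of this definition can be sketched with a small explicit prior standing in for it. Everything below is my own illustrative scaffolding, not machinery from the post: the prior is a finite dictionary of candidate environments, and “optimal” is approximated by comparing the agent against every fixed plan of a given length.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Transcript = List[Tuple[str, str]]               # (observation, action) pairs
BehaviorForm = Callable[[Transcript, str], str]  # memory + newest input -> output
Environment = Callable[[Transcript], str]        # memory -> the agent's next input
Utility = Callable[[Transcript], float]          # grades a finished transcript


def rollout(act: BehaviorForm, env: Environment, horizon: int) -> Transcript:
    """Run a behavior form against one particular environment for `horizon` steps."""
    history: Transcript = []
    for _ in range(horizon):
        observation = env(history)
        history.append((observation, act(history, observation)))
    return history


def plan_rollout(plan: List[str], env: Environment) -> Transcript:
    """Run a fixed, pre-committed sequence of actions against one environment."""
    history: Transcript = []
    for action in plan:
        history.append((env(history), action))
    return history


def expected_utility_of_plan(plan: List[str], prior: Dict[Environment, float],
                             utility: Utility) -> float:
    """Average utility of a fixed plan under the prior over candidate environments."""
    return sum(weight * utility(plan_rollout(plan, env))
               for env, weight in prior.items())


def is_optimizer(act: BehaviorForm, utility: Utility,
                 prior: Dict[Environment, float],
                 alphabet: List[str], horizon: int) -> bool:
    """Check that `act` does at least as well, in expectation, as every fixed plan.

    Both simplifications here (a finite prior instead of the algorithmic prior,
    fixed plans instead of arbitrary policies) are just to keep the check
    brute-forceable; the real condition uses conditional algorithmic probability.
    """
    agent_score = sum(weight * utility(rollout(act, env, horizon))
                      for env, weight in prior.items())
    best_plan_score = max(expected_utility_of_plan(list(plan), prior, utility)
                          for plan in product(alphabet, repeat=horizon))
    return agent_score >= best_plan_score - 1e-9
```

In this sketch, the agent’s utility form would be the shortest `utility` that passes such a check, and its length the utility complexity.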

In reality, most utility forms have more than one behavioral optimizer. A particularly concerning case is the function which always returns 0: extremely simple, and optimized for by all behaviors. This issue can be addressed by including the burden of specification in the utility complexity. That’s the last type of complexity I’ll introduce, and I’ll call it specification complexity—the number of bits, on average, needed to distinguish the behavior form in question from the space of other behavior forms that satisfy a utility form. This is just $-\log_2 P_U(B)$, where $P_U(B)$ is the algorithmic probability of the behavior form $B$ out of only those algorithms that optimize $U$. For example, the always-0 utility function doesn’t constrain algorithmic probability at all, so nearly the agent’s full behavior complexity must be added. By that token, the utility complexity is actually going to be the length of the minimal program + the specification complexity.[1]
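
Putting the pieces together in symbols (the notation is mine, and this is just one way to read that combination):

$$\text{utility complexity} \;=\; \min_{U}\Big[\, \ell(U) \;-\; \log_2 P_U(B) \,\Big],$$

where the minimum runs over computable utility functions $U$ that the behavior form $B$ does optimize, $\ell(U)$ is the length of $U$, and $-\log_2 P_U(B)$ is the specification complexity just defined.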

Properly Introducing Telos

Okay, so we have two ways of representing a given agent: we start by directly coding its behavior, but from there we can also represent it with a computable utility function for which it’s a true optimizer. In both of these forms, we can describe its complexity. However, the trick from before—choosing a utility function that just rewards the behavior established—means that the utility complexity can only exceed the behavior complexity by a constant amount. After all, we can represent this baseline utility function with any particular behavior form hardcoded. There’s no bound on how simple the utility function can be, though; a menagerie of complex conditionals can boil down to the pursuit of a goal that can be expressed in a couple lines.

What I’ve labeled telos is the difference between this baseline, the behavior complexity + C, and the actual utility complexity. As mentioned before, this is a measurement for the extent to which the behavior of an agent is better described by a categorical purpose than a procedure—in other words, how teleological its design is.
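
In symbols, with $C$ the constant cost of the hardcode-the-behavior trick from the previous paragraph:

$$\text{telos} \;=\; \big(\text{behavior complexity} + C\big) \;-\; \text{utility complexity}$$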

Back to Reality

To contrast “high-telos” and “low-telos” agents, let’s run through a couple scenarios.

Suppose we have a hypothetical AI which is constructed to maximize the production of paperclips, as an old tale describes. This is the spitting image of a high-telos agent: its behavior complexity is tremendous, as evidenced by its ability to respond dynamically to whatever challenges it encounters in paperclip-optimization, but its utility complexity is roughly (though not quite) as tiny as a program that checks known paperclips. The difference between these, its telos, is unthinkable.

On the other extreme, a pocket calculator is about as low-telos as they come. If you had to represent it as having a goal, it would be to spit out the appropriate calculation for its input, which is just as complex as its behavior (namely, spitting out the appropriate calculation for its input).
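
As a toy sketch of why (the code is purely illustrative), the most natural goal you could write down for the calculator has to carry around the very same evaluation routine that its behavior consists of, so describing it by its purpose saves nothing:

```python
def calculator_behavior(expression: str) -> str:
    """The behavior form: read an arithmetic expression, return the result."""
    return str(eval(expression))  # eval is a stand-in for a real parser/evaluator


def calculator_utility(expression: str, answer: str) -> float:
    """The 'goal': reward producing the appropriate calculation for the input.

    Specifying this goal means re-implementing essentially the whole behavior,
    so the utility form is no simpler than the behavior form: telos near zero.
    """
    return 1.0 if answer == str(eval(expression)) else 0.0
```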

Somewhere in the middle is a human, and I suspect that’s the significance of the contrast between people as “utility maximizers” and “adaptation executors.” The goals of humans are not categorical, and so even the simplest utility function is not massively better than a collection of evolutionary incidentals. This makes it much more plausible to understand people in terms of behavior, as opposed to goals.

What’s exciting to me is that it’s very likely intelligence and telos are mostly orthogonal properties. Humans are the quintessential example—abstract thinking happens primarily in the neocortex, which is devoted to sensory processing. Both motor control and reward-optimization, the latter being where any telos we have certainly comes from, are completely separate from this center. In other words, it’s feasible for us to imitate only the reasoning portion of our neurology and build low-telos intelligent systems—pure processors of information which don’t meaningfully “want” anything.

  1. ^

    It makes nearly no difference, but this coding of “expected bits needed” is actually also appropriate for behavior complexity and the unspecified utility complexity. Both of these actually correspond to the same form. Notice that this is usually going to be almost exactly the length of the shortest program, though, since adding length decreases the probabilities exponentially. I found it easier to communicate what this operation actually means through the idea of a minimum.

    Tacking the specification complexity on to each utility form, we get:

    $$\text{utility complexity} \;=\; -\log_2\!\left(\sum_{U} 2^{-\ell(U)}\, P_U(B)\right),$$

    where $\ell(U)$ is the length of the utility program $U$. All sums here are just over all individual behavior/utility forms that generate the agent’s actions.

    At a slightly deeper level, that’s what telos is: the difference between the average number of bits needed for behavior + the cost of conversion, and the average number of bits needed for both utility and specification within that utility.