Optimisation: Selective versus Predictive

Looking over my favourite posts, I notice that many of them make specific versions of a more general claim, which is essentially: don’t mistake selective processes for predictive processes.

Here, I’m going to try to make that more general claim, rehash some examples in light of it, and end with a few ambient confusions I think this framework can help with, for the reader to ponder.


When you encounter an entity that is very good at achieving some outcome, there are two very different processes that could be going on under the hood:

  1. The entity’s behaviour could be guided by predictions about how to achieve the outcome[1]

  2. The entity’s behaviour could be selected to achieve that outcome

It’s not a perfect binary, and often what you see is a mix of the two. In particular, all predictive optimisers have emerged from selective optimisation, and often retain some fingerprint of that origin.

| Selective | Predictive | Weird Mix |
|---|---|---|
| Bacteria developing antibiotic resistance | Hacker finding a way to penetrate a secure system | Humans evolving to be good at lying |
| Gradient descent on Atari games | Tree searching Connect Four | AlphaZero training a policy on its own rollouts |
| Flowers co-evolving with their pollinators | Humans genetically modifying crops | Humans selectively breeding dogs |
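To make the dichotomy concrete, here is a minimal sketch in Python (the task and all names are invented for illustration): both optimisers reliably reach the same outcome, but one gets there by blind mutate-and-select and the other by consulting a model of the task.

```python
import random

# Toy outcome: produce a bit-string matching a hidden target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def score(candidate):
    """How well the outcome is achieved: number of matching positions."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def selective_optimiser(generations=2000):
    """Selection: mutate at random, keep whatever scores at least as well.
    No model of the task, no foresight -- just differential survival."""
    best = [random.randint(0, 1) for _ in TARGET]
    for _ in range(generations):
        mutant = list(best)
        mutant[random.randrange(len(mutant))] ^= 1  # flip one random bit
        if score(mutant) >= score(best):
            best = mutant
    return best

def predictive_optimiser(world_model):
    """Prediction: consult a model of the task and act on what it implies.
    Here the 'model' is simply knowledge of the target."""
    return list(world_model)

print(score(selective_optimiser()))         # climbs to 8, given enough generations
print(score(predictive_optimiser(TARGET)))  # 8 immediately, zero trial and error
```

The selective loop burns thousands of evaluations and produces an artefact that only fits this one target; the predictive optimiser derives the answer directly, and would do so just as quickly for any other target its model covers.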

Human brains seem to be hardwired to reason about intent, in the same way that we see faces everywhere. The problem is that selective processes behave rather differently from predictive ones. For example:

  • Predictive optimisers generalise to achieve their goal (within some world model); selective optimisation creates things which generalise pretty poorly out of distribution

  • Predictive optimisers intend to achieve an outcome; selective optimisation creates things which needn’t intend to achieve the outcomes they consistently reach

  • If you miss the presence of selective optimisation, you might underrate the amount of computation that has gone into searching for solutions
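The first of these differences can be sketched concretely. In this toy gridworld (everything here is invented for illustration), a lookup-table policy is evolved by mutate-and-keep-the-better to reach one goal, while a breadth-first planner reaches goals by searching a world model. Move the goal, and only the planner adapts:

```python
import random
from collections import deque

SIZE = 5  # 5x5 gridworld, agent always starts at (0, 0)
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step(pos, move):
    nxt = (pos[0] + move[0], pos[1] + move[1])
    return nxt if 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE else pos

def plan(start, goal):
    """Predictive: breadth-first search over a world model.
    Hand it any goal and it derives a fresh route."""
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        pos = frontier.popleft()
        if pos == goal:
            path = []
            while pos is not None:
                path.append(pos)
                pos = parent[pos]
            return path[::-1]
        for move in MOVES:
            nxt = step(pos, move)
            if nxt not in parent:
                parent[nxt] = pos
                frontier.append(nxt)

def run_policy(policy, goal, max_steps=4 * SIZE):
    pos = (0, 0)
    for _ in range(max_steps):
        if pos == goal:
            return True
        pos = step(pos, policy[pos])
    return pos == goal

def evolve_policy(goal, generations=5000):
    """Selective: a lookup-table policy shaped by mutate-and-keep-the-better.
    Only one goal ever appears during selection, so it gets baked in."""
    def fitness(p):
        pos = (0, 0)
        for _ in range(4 * SIZE):
            if pos == goal:
                return 0
            pos = step(pos, p[pos])
        return -(abs(pos[0] - goal[0]) + abs(pos[1] - goal[1]))
    policy = {(x, y): random.choice(MOVES)
              for x in range(SIZE) for y in range(SIZE)}
    for _ in range(generations):
        mutant = dict(policy)
        mutant[random.choice(list(mutant))] = random.choice(MOVES)
        if fitness(mutant) >= fitness(policy):
            policy = mutant
    return policy

train_goal, shifted_goal = (4, 4), (0, 4)
policy = evolve_policy(train_goal)
print(run_policy(policy, train_goal))    # typically True after selection
print(run_policy(policy, shifted_goal))  # often False: the old routes are baked in
print(plan((0, 0), shifted_goal))        # the planner just re-derives a route
```

The evolved table still executes the same moves when the goal shifts, but those moves were selected for the old goal; nothing in the table ever represented a goal at all.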

So if you interpret a system as purely predictive when it’s at least partly selective, you might mistakenly assume that it generalises far more cleanly than it actually does, that its behaviour is in some meaningful sense intended, or that it can’t be that optimised because you can’t see much computation lying around. These can sometimes be dangerous mistakes.

(The last one in particular gives you a slightly more precise variant of Chesterton’s Fence: before scrapping a tradition where nobody can articulate why it’s useful, at least ballpark how much optimisation has probably gone into it.)

That’s the whole point. The rest of this post will just be spelling out examples, but feel free to stop if you’ve already got the gist.

This picture is basically right but not quite — can you spot why?

Classic Examples and Confusions

  • Adaptation Executors vs Fitness Maximisers — Even though evolution selects for inclusive genetic fitness, humans don’t maximise it: our ancestral drives and behaviours were selected, but they now generalise in ways that don’t promote fitness, like using contraceptives or eating so much ice cream that you get heart problems.

  • Unconscious Economics — Although economics is usually explained in terms of people following their incentives, you recover a lot of the same dynamics by thinking about selection pressures and behaviours being unconsciously reinforced, which notably lets you avoid ascribing as much intent.[2]

  • Reward is not the optimisation target — Even though we talk about “training” AIs to maximise “reward”, the actual process is much more like shaping the AI into the kind of thing which takes actions that achieve high reward, without it actually wanting reward per se.

  • Culture more broadly — From contemplative practices to elaborate food preparation, often there’s some subtle benefit that even the practitioners don’t understand but which is nonetheless important, hence a great deal of ill-advised fence-removal.

  • The watchmaker argument — For a long time, one of the major arguments for the existence of God was that the complexity of (human) life could only have come from an intelligent designer.

  • Gradual Disempowerment — Alignment would be a lot simpler if we just needed to align AI with “humanity” and stop it from spontaneously deciding to do harmful things. Unfortunately I think our civilisation is already majorly driven by selective pressures, and they are meaningfully misaligned with us, and capable of co-opting AIs in a way that could be catastrophic.[3]

  • Political extremists often say things like “the system is designed to stop you from ever having a house / true love / … so that you’re easier to control” — I think these people are often correctly identifying that things are suspiciously well set up to screw them, and then overestimating how much design there is. This makes them overestimate how much human malice there is, or mistakenly infer that there is some group that could decide to change things.

Now that I’ve given the general claim and a few instantiations, I’m going to close with some other cases where I think this distinction is relevant, for the reader to ponder.

  • What will happen if we get perfect AI lie detection? Will this basically solve cooperation for us?

  • To what extent are sycophancy, eval awareness, sandbagging, and alignment faking more like predictive phenomena, and to what extent are they more like selective phenomena?

  • How much should we expect AIs to eventually tend towards being like locusts that only want to consume, and what does that depend on?

  • Why do people cooperate in the one-shot Prisoner’s Dilemma even though it’s irrational under standard economic logic?

  • How does one cooperate or trade with an egregore?

  • How much should punishments and rewards depend on intent?

  • When considering humanity’s prospects around the development of AGI, how should we weigh the observation that we as a species have already made it this far?

  1. ^

    Either by making predictions itself or by being crafted by a predicting entity

  2. ^

    The author of this post claims that this is specifically because people find these explanations less intuitive.

  3. ^

    One of the most common counterarguments is roughly “nobody wants this so it won’t happen”. I think it has some bite, but not much, and I am still not sure how to bridge that inferential gap — that’s part of what motivated me to write this post up.