Optimisation: Selective versus Predictive
Looking over my favourite posts, I notice that many of them are making specific versions of a more general claim, which is essentially: don’t confuse selective processes for predictive processes.
Here, I’m going to try to make that more general claim, rehash some examples in light of it, and end with a few ambient confusions I think this framework can help with, for the reader to ponder.
When you encounter an entity that is very good at achieving some outcome, there are two very different processes that could be going on under the hood:
The entity’s behaviour could be guided by predictions about how to achieve the outcome[1]
The entity’s behaviour could be selected to achieve that outcome
It’s not a perfect binary, and often what you see is a mix of the two. In particular, all predictive optimisers have emerged from selective optimisation and often retain some fingerprint.
| Selective | Predictive | Weird Mix |
| --- | --- | --- |
| Bacteria developing antibiotic resistance | Hacker finding a way to penetrate a secure system | Humans evolving to be good at lying |
| Gradient descent on Atari games | Tree searching Connect Four | AlphaZero training a policy on its own rollouts |
| Flowers co-evolving with their pollinators | Humans genetically modifying crops | Humans selectively breeding dogs |
Human brains seem to be hardwired to reason about intent, in the same way that we see faces everywhere, so we tend to read any well-optimised system as intending its outcomes. The problem is, selective processes behave a bit differently. For example:
Predictive optimisers generalise to achieve their goal (within some world model); selective optimisation creates things which generalise pretty poorly out of distribution
Predictive optimisers intend to achieve an outcome; selective optimisation creates things which needn’t intend to achieve the outcomes they consistently reach
If you miss the presence of selective optimisation, you might underrate the amount of computation that has gone into searching for solutions
So when you try to interpret a system as purely predictive when it’s at least partly selective, you might mistakenly assume it generalises a lot more cleanly than it actually does, that its behaviour is in some meaningful sense intended, or that it can’t be that optimised because you can’t see much computation lying around. These can sometimes be dangerous mistakes.
(The last one in particular gives you a slightly more precise variant of Chesterton’s Fence: before scrapping a tradition where nobody can articulate why it’s useful, at least ballpark how much optimisation has probably gone into it.)
That’s the whole point. The rest of this post will just be spelling out examples, but feel free to stop if you’ve already got the gist.
This picture is basically right but not quite — can you spot why?
Classic Examples and Confusions
Adaptation Executors vs Fitness Maximisers — Even though evolution maximises inclusive genetic fitness, humans don’t: our ancestral drives and behaviours were selected, but they now generalise in ways that don’t promote fitness, like using contraceptives or eating so much ice cream that you get heart problems.
Unconscious Economics — Although economics is usually explained in terms of people following their incentives, you recover a lot of the same dynamics by thinking about selection pressures and behaviours being unconsciously reinforced, which notably lets you avoid ascribing as much intent.[2]
Reward is not the optimisation target — Even though we talk about “training” AIs to maximise “reward”, the actual process is much more like shaping the AI into the kind of thing which takes actions that achieve high reward, without it actually wanting reward per se.
Culture more broadly — From contemplative practices to elaborate food preparation, often there’s some subtle benefit that even the practitioners don’t understand but which is nonetheless important, hence a great deal of fence-crossing.
The watchmaker argument — For a long time, one of the major arguments for the existence of God was that the complexity of (human) life could only have come from an intelligent designer.
Gradual Disempowerment — Alignment would be a lot simpler if we just needed to align AI with “humanity” and stop it from spontaneously deciding to do harmful things. Unfortunately I think our civilisation is already majorly driven by selective pressures, and they are meaningfully misaligned with us, and capable of co-opting AIs in a way that could be catastrophic.[3]
Political extremists often say things like “the system is designed to stop you from ever having a house / true love / … so that you’re easier to control” — I think these people are often correctly identifying that things are suspiciously well set up to screw them, and then overestimating how much design there is. This makes them overestimate how much human malice there is, or mistakenly infer that there is some group that could decide to change things.
Now that I’ve given the general claim and a few instantiations, I’m going to close with some other cases where I think this distinction is relevant, for the reader to ponder.
What will happen if we get perfect AI lie detection? Will this basically solve cooperation for us?
To what extent are sycophancy, eval awareness, sandbagging, and alignment faking more like optimisation phenomena, and to what extent are they more like selection phenomena?
How much should we expect AIs to eventually tend towards being like locusts that only want to consume, and what does that depend on?
Why do people cooperate in the one-shot Prisoner’s Dilemma even though it’s irrational under standard economic logic?
How does one cooperate or trade with an egregore?
How much should punishments and rewards depend on intent?
When considering humanity’s prospects around the development of AGI, how should we weigh the observation that we as a species have already made it this far?
1. ^ Either by making predictions itself or by being crafted by a predicting entity.
2. ^ The author of that post claims that this is specifically because people find these explanations less intuitive.
3. ^ One of the most common counterarguments is roughly “nobody wants this so it won’t happen”. I think it has some bite, but not much, and I am still not sure how to bridge that inferential gap — that’s part of what motivated me to write this post up.
I think the really interesting interaction between these two frames is when selection pressures lead to predictive capacities. When does this happen? A first guess might be: when the training (selecting) environment is so complicated, and there is so much local variance, that the selective loop finds it easiest to instill a predictive agent and let that take care of the local adaptation.
A lot of stuff works like this: you can have generic chess/math heuristics but you need to be able to do local calculations to not fall flat on your face; evolution more or less works like this in mammals and obviously humans, maybe much more; presumably LLMs work like this; our central nervous system/mind works like this wrt individual cells in the body.
Are there other factors that mediate how a selective process can give rise to local predictive agents? What consequences does this transition have? Cancer/parasites/fraud are three instances of one example, what else?
Selective optimization finding predictive optimizers (with a different objective) is the main idea of “Risks from Learned Optimisation” and indeed they have Section 2: Conditions for mesa-optimization
I mean, “selection pressure creates artefacts with learning/predictive capabilities” is also just how evolution works. It’s selective optimisation creating predictive optimisers all the way down: even cells and nematodes have learning capabilities. What we humans see as our unique intelligence can be considered as belonging to a long and storied genre of pathfinding and future-simulating behaviours, only now carried out in higher and higher dimensional action spaces—and at each step selection is used to grow and develop these capabilities. Given that, it seems reasonable to say that if evolution can be described as a coherent phenomenon at all, it will be a phenomenon that acts on intelligent goal-driven systems. (This comment comes from some notes I wrote down while reading the Moloch essay)
I feel a bit confused about gradient descent being described as a selective process, and thus about this binary. Is gradient descent a selective process? It doesn’t seem like it.
All the other examples of selective processes involve… variation and selection: you have a population with variation, the population gets culled, the remaining population has more of some quality, repeat. But gradient descent does not feature this, at least not in a straightforward way. There’s no pool of candidates, no acceptance / rejection, no competition, really.
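To make the contrast concrete, here's a rough sketch (in Python, with placeholder fitness/loss functions and hyperparameters) of the two update rules as I understand them:

```python
import numpy as np

def selection_step(population, fitness, n_survivors, mutation_scale=0.1):
    """One round of variation-and-selection: score the pool, cull it, and mutate the survivors."""
    scores = np.array([fitness(x) for x in population])
    survivors = population[np.argsort(scores)[-n_survivors:]]              # acceptance/rejection
    offspring = np.repeat(survivors, len(population) // n_survivors, axis=0)
    return offspring + mutation_scale * np.random.randn(*offspring.shape)  # fresh variation

def gradient_step(params, grad_loss, lr=0.1):
    """One gradient-descent step: a single lineage, no pool of candidates, no culling."""
    return params - lr * grad_loss(params)
```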
(This might have consequences, for instance in how gradient descent can work differently from more selective / evolutionary processes. Evolutionary Strategies At Scale, for example, finds that “Evolutionary Strategies” behaves differently from gradient descent when used to train an LLM. See also.)
But generally this binary feels pretty fuzzy to me; the MECE-ness of it, or the membership criteria, seems unclear.
I wrote something about this a while back: in short, with a squint gradient descent and natural selection are the same.
From my point of view, one thing that’s particularly relevant is that they’re both operating locally, with little/no foresight, over a high-dimensional design space. You could look at GD as selecting among all the possible local steps, and ‘competing’ them based on the heuristic of their local loss gradient (as approximated by the (sampled) dataset-derived estimator).
Some key practical differences between varying instantiations of GD/NS will be in the effective ‘proposal’/generating procedures and ‘promotion’/selection heuristics.
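To illustrate the squint (a rough sketch, not from any particular post): an evolution-strategies-style update literally proposes a pool of candidate local steps, ‘competes’ them on the loss, and moves towards the winners; in expectation this follows the (smoothed) negative gradient.

```python
import numpy as np

def es_step(params, loss, n_candidates=64, sigma=0.05, lr=0.1):
    """Propose random local steps, 'compete' them on the loss, and move towards the better ones.
    In expectation this tracks the (Gaussian-smoothed) negative gradient of the loss."""
    noise = np.random.randn(n_candidates, params.size)
    scores = np.array([loss(params + sigma * eps) for eps in noise])
    advantages = scores.mean() - scores                  # lower loss => larger weight
    step = (advantages[:, None] * noise).mean(axis=0) / sigma
    return params + lr * step
```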
This confusion comes about because natural selection has no mechanism to maintain variation. Equivalently, gradient descent can only work with the data provided; in other words, it has no “proposal” step like Gibbs sampling or MCMC. So the idea that gradient descent and natural selection are the same feels intuitive to me.
It is also known that some models of evolutionary game theory recover Fisher’s fundamental theorem of natural selection as a consequence of the replicator equation (a model of natural selection) being a gradient flow, see this arxiv paper. [Might have bungled the explanation on this one, so take with some salt.]
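For reference, here is a minimal sketch of the standard setup (the constant-fitness case, not necessarily the exact formulation in the linked paper):

```latex
% Replicator dynamics on frequencies x_i with fixed fitnesses f_i:
\dot{x}_i = x_i \left( f_i - \bar{f} \right), \qquad \bar{f} = \sum_j x_j f_j
% Fisher's fundamental theorem in this setting: mean fitness grows at the variance of fitness,
\frac{d\bar{f}}{dt} = \sum_i x_i \left( f_i - \bar{f} \right)^2 = \operatorname{Var}(f),
% and under the Shahshahani metric g_{ij} = \delta_{ij} / x_i the dynamics is the gradient flow of \bar{f}.
```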
I think it’s possible that gradient descent works by applying a selection pressure to preexisting circuits in the initial randomization with some finetuning. This would explain why most weights are zero after training as well as stuff like the lottery ticket hypothesis.
As far as I know this is just false, though?
Daniel Dennett has called what you call selective optimization “competence without comprehension”.
Possibly see also Meditations on Moloch
In ‘Emergent misaligned outcomes’ I was trying to gesture at something like the selection effects on non-human, made-of-humans-for-now systems, and how they might incorporate AI capabilities (even non-agentic or corrigible agentic ones) and become ‘more than merely/mainly selected’ at some point.
I think this is something like a garbled proto gradual disempowerment perspective, and I appreciate the crisper articulation here and in your other writing.