Demons in Imperfect Search
One day, a gradient descent algorithm ball was happily rolling down a high-dimensional surface hill. All it wanted was to roll as far down as possible. Unbeknownst to the ball, just off to the side was a steep drop-off—but there was a small bump between the ball and the drop-off. No matter; there was enough random noise on the ball that it would jump the bump sooner or later.
But the ball was headed into unfriendly territory.
As the ball rolled along, the bump became taller. The farther it rolled, the taller the bump grew, until no hope remained of finding the big drop anytime before the stars burned out. Then the road began to narrow, and to twist and turn, and to become flatter. Soon the ball rolled down only the slightest slope, with tall walls on both sides constraining its path. The ball had entered the territory of a demon, and now that demon was steering the ball according to its own nefarious ends.
This wasn’t the first time the ball had entered the territory of a demon. In early times, the demons had just been bumps which happened to grow alongside the ball’s path, for a time—chance events, nothing more. But every now and then, two bumps in close proximity would push the ball in different directions. The ball would roll on, oblivious, and end up going in one direction or the other. Whichever bump had “won” would continue to steer the ball’s trajectory—and so a selection process occurred. The ball tended to roll alongside bumps which more effectively controlled its trajectory—bumps which were taller, bumps which steered it away from competing bumps. And so, over time, bumps gave way to barriers, and barriers gave way to demons—twisty paths with high walls to keep the ball contained and avoid competing walls, slowing the ball’s descent to a crawl, conserving its potential energy in case a sharp drop were needed to avoid a competitor’s wall.
The ball’s downhill progress slowed and slowed. Even though the rich, high-dimensional space was filled with lower points to explore, the highly effective demons had built tall walls to carefully contain the ball within their own territory, drawing out its travels indefinitely.
The Pattern
This tale visualizes a pattern:
There is some optimization process—in this case, some variant of gradient descent.
The optimizing search is imperfect: gradient descent only looks at local information, so it doesn’t “know” if there’s a steep drop beyond a nearby bump.
Exploiting the imperfect search mechanism: in this case, the steep drop is hidden by raising high walls.
Demon: in a rich enough search space, a feedback loop can appear, inducing more-and-more-perfect exploitation of the imperfect search mechanism. A whole new optimization process appears, with goals quite different from the original.
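A minimal numerical sketch of the pattern above (entirely my own illustrative construction: the landscape, parameters, and code are assumptions, not a model taken from any particular system): noisy gradient descent where one coordinate, x[0], is the direction the optimizer "wants" to descend, and every other coordinate carries a random wavy potential whose influence grows with progress along x[0]. Whichever wiggles happen to capture the side coordinates early get amplified later, and can act as walls against further descent.

```python
import numpy as np

# Toy landscape (assumed): descending means increasing x[0]. Each side
# coordinate x[k] sees a smooth random potential f_k >= 0 whose weight in
# the loss grows as x[0]**2, so "bumps" that capture a side coordinate
# early exert more and more influence as the ball makes progress.
rng = np.random.default_rng(0)
n_dims, n_waves, eps = 8, 8, 0.01
freqs = rng.uniform(0.5, 3.0, size=(n_dims, n_waves))
phases = rng.uniform(0.0, 2 * np.pi, size=(n_dims, n_waves))

def loss(x):
    # f_k lies in [0, 2]: its local minima are generally > 0, so trapped side
    # coordinates leave a residual "wall" term against progress in x[0].
    wavy = np.mean(1.0 + np.sin(freqs * x[1:, None] + phases), axis=1)
    return -x[0] + eps * x[0] ** 2 * np.sum(wavy)

def grad(x, h=1e-5):
    # Finite-difference gradient keeps the sketch short and landscape-agnostic.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (loss(x + d) - loss(x - d)) / (2 * h)
    return g

x = np.zeros(n_dims + 1)
lr, noise = 1e-3, 3e-3
for step in range(20001):
    x -= lr * grad(x) + noise * rng.standard_normal(x.size)
    if step % 2000 == 0:
        print(f"step {step:6d}   progress x[0] = {x[0]:8.3f}")
```

Whether progress along x[0] merely slows or plateaus outright depends on the seed and parameters; the qualitative point is that the side coordinates' influence scales with progress, so the local wiggles that capture them early end up steering later descent.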
Does this actually happen? Let’s look at a few real-world examples...
Metabolic reactions
Optimization process: free energy minimization in a chemical system. Search operates by random small changes to the system state, then keeping changes with lower free energy (very roughly speaking).
Search is imperfect: the system does not immediately jump to the global minimum. It’s searching locally, based on random samples.
Exploiting the imperfect search mechanism: there’s often a free energy barrier between low-free-energy states. Biological systems manipulate the height of the barriers, raising or lowering the activation energies required to cross them, in order to steer the local-free-energy-minimization process toward some states and away from others.
Demon: in primordial times, some chemicals happened to raise/lower barriers to steer the process in such a way that it made more copies of the chemicals. This kicked off an unstable feedback loop, producing more and more such chemicals. The rest is natural history.
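A toy numerical illustration of barrier-raising (again my own construction; the potential and parameters are assumptions chosen for illustration): a Metropolis-style walker doing noisy local free-energy minimization on a double well. The right-hand well is much deeper, but a bump of tunable height sits between the wells, and raising it keeps the walker trapped in the shallow well.

```python
import numpy as np

def free_energy(x, barrier):
    # Tilted double well (assumed): shallow minimum near x = -1, deep
    # minimum near x = +1, plus a Gaussian bump of adjustable height
    # between them -- the "activation energy" a demon can manipulate.
    return x**4 - 2 * x**2 - 0.5 * x + barrier * np.exp(-x**2 / 0.05)

def metropolis(barrier, steps=20000, temp=0.3, seed=0):
    # Random small changes to the state, kept with Boltzmann probability:
    # a crude stand-in for noisy local free-energy minimization.
    rng = np.random.default_rng(seed)
    x = -1.0  # start in the shallow well
    for _ in range(steps):
        x_new = x + rng.normal(0.0, 0.1)
        dF = free_energy(x_new, barrier) - free_energy(x, barrier)
        if dF <= 0 or rng.random() < np.exp(-dF / temp):
            x = x_new
    return x

for barrier in [0.5, 2.0, 8.0]:
    deep = sum(metropolis(barrier, seed=s) > 0 for s in range(10))
    print(f"barrier height {barrier:3.1f}: {deep}/10 walkers reached the deep well")
```

With a low barrier most walkers find the deep well; with a high one essentially none do (exact counts vary with seed and temperature). The walker never "sees" the deep well directly, so whatever controls the barrier height controls where the minimization ends up.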
Greedy genes
Optimization process: evolution, specifically selection pressure at the level of an organism. Search operates by making random small changes to the genome, then seeing how much the organism reproduces.
Search is imperfect: the system does not immediately jump to the global optimum. It’s searching locally, based on random samples, with the samples themselves chosen by a physical mechanism.
Exploiting the imperfect search mechanism: some genes can bias the random sampling, making some random changes more or less likely than others. For instance, in sexual organisms, the choice of which variant of a gene to retain is made at random during fertilization—but some gene variants can bias that choice in favor of themselves.
Demon: sometimes, a gene can bias the random sampling to make itself more likely to be retained. This can kick off an unstable feedback loop, e.g. a gene which biases toward male children can result in a more and more male-skewed population until the species dies out.
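A toy deterministic model of that runaway (my own illustrative construction, with made-up parameters): a Y-linked "driver" gene whose carriers sire mostly sons, every one of whom inherits the driver. Its frequency rises generation after generation while the supply of females collapses.

```python
# All parameters below are illustrative assumptions, not empirical values.
p_sons = 0.8               # fraction of sons among a driver-carrier's offspring
brood = 2.2                # offspring per female
females, q = 1000.0, 0.01  # starting females; driver frequency among males

for gen in range(40):
    kids = females * brood
    sons_driver = kids * q * p_sons        # driver fathers: mostly sons,
    sons_normal = kids * (1 - q) * 0.5     # every one carrying the driver
    daughters = kids * (q * (1 - p_sons) + (1 - q) * 0.5)
    q = sons_driver / (sons_driver + sons_normal)  # next generation's male pool
    females = daughters
    if gen % 4 == 0:
        print(f"gen {gen:2d}: driver frequency {q:5.3f}, females {females:9.1f}")
    if females < 1.0:
        print(f"gen {gen:2d}: fewer than one female left -- population collapse")
        break
```

In this toy version the driver sweeps to near-fixation within a couple dozen generations, after which the female count shrinks by a constant factor per generation until collapse. Y-linkage matters here; as noted in the comments below, this runaway can arise when the male-biasing gene is itself on the Y chromosome, so daughters never carry it.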
Managers
Optimization process: profit maximization. Search operates by people in the company suggesting and trying things, and seeing what makes/saves money.
Search is imperfect: the company does not immediately jump to perfect profit-maximizing behavior. Its actions are chosen based on what sounds appealing to managers, which in turn depends on the managers’ own knowledge, incentives, and personal tics.
Exploiting the imperfect search mechanism: actions which would actually maximize profit are not necessarily actions which look good on paper, or which reward the managers deciding whether to take them. Managers will take actions which make them look good, rather than actions which maximize profit.
Demon: some actions which make managers look good will further decouple looking-good from profit-maximization—e.g. changing evaluation mechanisms. This kicks off an unstable feedback loop, eventually decoupling action-choice from profit-maximization.
I’d be interested to hear other examples people can think of.
The big question is: when does this happen? There are enough real-world examples to show that it does happen, and not just in one narrow case. But it also seems like it requires a fairly rich search space with some structure to it in order to kick off a full demonic feedback loop. Can that instability be quantified? What are the relevant parameters?
Pedagogical note: something that feels like it’s missing from the fable is a “realistic” sense of how demons get created and how they can manipulate the hill.
Fortunately your subsequent real-world examples all have this, and, like, I did know what you meant. But it felt sort of arbitrary to have this combo of: “Well, there’s a very concrete, visceral example of the ball rolling downhill – I know what that means. But then there are some entities that can arbitrarily shape the hill.” Why are the demons weak at the beginning and stronger the more you fold into demon space? What are the mechanics there?
It’s not the worst thing, and I don’t have any ideas to tighten it. Overall I do think the post did a good job of communicating the idea it was aiming at.
Updated the long paragraph in the fable a bit, hopefully that will help somewhat. It’s hard to make it really concrete when I don’t have a good mathematical description of how these things pop up; I’m not sure which aspects of the environment make it happen, so I don’t know what to emphasize.
Cool!
Another cute example is the accidental “viruses” found when training EURISKO:
Do you see yourself as extending the concept of Demon to apply to things which are not necessarily even close to intelligent? (e.g. your first two examples) Or did the concept always mean that and I was just mistaken about what it meant?
The example with the ball rolling downhill seemed to imply that the demons were pretty damn smart, and getting smarter over time via competition with each other. But only your third example with managers seems like a real-world case of this. At least, that’s my current claim. For example, I’d bet that if Lenat had let EURISKO run forever, it wouldn’t have eventually been taken over by a superintelligence. Rather, it probably would have been stuck in that “insert my own name as the creator of other useful heuristics” optimum forever, or something mundane like that at any rate. For that matter, can you say more about the difference between demons and mere local optima?
I love the example, I’d never heard of that project before.
I’m agnostic on demonic intelligence. I think the key point is not the demons themselves but the process which produces them. Somehow, an imperfect optimizing search process induces a secondary optimizer, and it’s that secondary optimizer which produces the demons. For instance, in the metabolism example, evolution is the secondary optimizer, and its goals are (often) directly opposed to the original optimizer—it wants to conserve free energy, in order to “trade” with the free energy optimizer later. The demons themselves (i.e. cells/enzymes in the metabolism example) are inner optimizers of the secondary optimizer; I expect that Risks From Learned Optimization already describes the secondary optimizer <-> demon relationship fairly well, including when the demons will be more/less intelligent.
The interesting/scary point is that the secondary optimizer is consistently opposed to the original optimizer; the two are basically playing a game where the secondary tries to hide information from the original.
Hmmm, this doesn’t work to distinguish the two for me. Couldn’t you say a local minimum involves a secondary optimizing search process that has that minimum as its objective? To use your ball analogy, what exactly is the difference between these twisty demon hills and a simple crater-shaped pit? (Or, what is the difference between a search process that is vulnerable to twisty demon hills and one which is vulnerable to pits?)
In the ball example, it’s the selection process that’s interesting—the ball ending up rolling alongside one bump or another, and bumps “competing” in the sense that the ball will eventually end up rolling along at most one of them (assuming they run in different directions).
Only if such a search process is actually taking place. That’s why it’s key to look at the process, rather than the bumps and valleys themselves.
There isn’t inherently any important difference between those two. That said, there are some environments in which “bumps” which effectively steer a ball will tend to continue to do so in the future, and other environments in which the whole surface is just noise with low spatial correlation. The latter would not give rise to demons (I think), while the former would. This is part of what I’m still confused about—what, quantitatively, are the properties of the environment necessary for demons to show up?
Does that help clarify, or should I take another stab at it?
Ah, that does help, thanks. In my words: a search process that is vulnerable to local minima doesn’t necessarily contain a secondary search process, because it might not be systematically comparing local minima and choosing between them according to some criteria. It just goes for the first one it falls into, or, slightly more nuanced, the first sufficiently big one it falls into.
By contrast, in the ball rolling example you gave, the walls/ridges were competing with each other, such that the “best” one (or something like that) would be systematically selected by the ball, rather than just the first one or the first-sufficiently-big one.
So in that case, looking over your list again...
OK, I think I see how organic life arising from chemistry is an example of a secondary search process. It’s not just a local minimum that chemistry found itself in, it’s a big competition between different kinds of local minima. And now I think I see how this would go in the other examples too. As I originally said in my top-level comment, I’m not sure this applies to the example I brought up, actually. Would the “insert my name as the author of all useful heuristics” heuristic be outcompeted by something else eventually, or not? I bet not, which indicates that it’s a “mere” local minimum and not one that is part of a broader secondary search process.
+1, creating a self-reinforcing feedback loop =/= being an optimiser, and so I think any explanation of demons needs to focus on them making deliberate choices to reinforce themselves.
?
Oops, forgot to delete that bit. Thanks for pointing it out.
Another example might be democratic politics. Optimization is meant to produce a government and policies representing a majority view while protecting minority rights. Search is via voting, a procedure defined in a difficult-to-change constitution; politicians who are elected have an incentive to preserve the system that got them elected. Exploitation happens because actions that would better represent majority views and protect minority rights are not necessarily the actions that get politicians elected; in fact, there are actions politicians can take to further decouple representation and rights-protection from voting.
Addiction might be another example. It starts with pursuing a feeling of relief. Search is imperfect, focusing on reward system responses in the brain rather than the feeling of relief originally sought. Drug makers and addicts focus on stimulating that reward center, rather than on creating/consuming drugs that might produce relief. Some actions that stimulate the reward system further decouple brain stimulus from relief, like self isolation or theft to get money for drugs.
Excellent example. Your politics example is great too.
I’m suspicious of this mechanism; I’d think that as the number of males increases, there’s increasing selection pressure against this gene. Do you have a reference?
Related:
Oh actually, I now see the explanation, from the same post, that this can arise when the gene causing male bias is itself on the Y-chromosome.
Being stuck in local minima or in a long shallow valley happens in optimization problems all the time. Isn’t this what simulated annealing and similar techniques are designed to correct? I’ve seen this a lot in maximum-likelihood Markov chain discovery problems.
I expect this problem would show up in any less-than-perfect optimizer, including SA variants. Heck, the metabolic example is basically the physical system which SA was based on in the first place. But it would look different with different optimizers, mainly depending on what the optimizer “sees” and what’s needed to “hide” information from it.
The toy example and the non-agentic real-life examples don’t have the coupling/symbiosis of walls siphoning work from the ball to maintain the walls. Walls might be built by restricting the dimensions along which the ball tends to move or look ahead, so that it treats saddle points as cul-de-sacs. Lowering the ball’s momentum/energy in general means the walls don’t need to be built as high.
It seems that there is a fundamental difference between a physical agent that participates in an arrow-of-time versus an algorithm exploring a Platonic realm off-line, for example trying to find the best way to compress a dataset. The algorithm can be tricked by red herrings in the data into wasting CPU time chasing after mirages, but it can always restore from a checkpoint, do a random restart, spawn multiple threads, etc.—it can always press “undo” and cannot be trapped forever. Most importantly, it can’t be stolen from, only tricked into wasting its time. But a physical agent interacting with the world can have its resources stolen, further fueling its attacker, perhaps starting some sort of Red Queen dynamics.