Demons in Imperfect Search

One day, a ball (really a gradient descent algorithm) was happily rolling down a hill (really a high-dimensional surface). All it wanted was to roll as far down as possible. Unbeknownst to the ball, just off to the side was a steep drop-off, but there was a small bump between the ball and the drop-off. No matter; there was enough random noise on the ball that it would jump the bump sooner or later.

But the ball was headed into un­friendly ter­ri­tory.

As the ball rol­led along, the bump be­came taller. The farther it rol­led, the taller the bump grew, un­til no hope re­mained of find­ing the big drop any­time be­fore the stars burned out. Then the road be­gan to nar­row, and to twist and turn, and to be­come flat­ter. Soon the ball rol­led down only the slight­est slope, with tall walls on both sides con­strain­ing its path. The ball had en­tered the ter­ri­tory of a de­mon, and now that de­mon was steer­ing the ball ac­cord­ing to its own ne­far­i­ous ends.

This wasn’t the first time the ball had en­tered the ter­ri­tory of a de­mon. In early times, the demons had just been bumps which hap­pened to grow alongside the ball’s path, for a time—chance events, noth­ing more. But ev­ery now and then, two bumps in close prox­im­ity would push the ball in differ­ent di­rec­tions. The ball would roll on, oblivi­ous, and end up go­ing in one di­rec­tion or the other. Whichever bump had “won” would con­tinue to steer the ball’s tra­jec­tory—and so a se­lec­tion pro­cess oc­curred. The ball tended to roll alongside bumps which more effec­tively con­trol­led its tra­jec­tory—bumps which were taller, bumps which steered it away from com­pet­ing bumps. And so, over time, bumps gave way to bar­ri­ers, and bar­ri­ers gave way to demons—twisty paths with high walls to keep the ball con­tained and avoid com­pet­ing walls, slow­ing the ball’s de­scent to a crawl, con­serv­ing its po­ten­tial en­ergy in case a sharp drop were needed to avoid a com­peti­tor’s wall.

The ball’s down­hill progress slowed and slowed. Even though the rich, high-di­men­sional space was filled with lower points to ex­plore, the highly effec­tive demons had built tall walls to care­fully con­tain the ball within their own ter­ri­tory, draw­ing out its trav­els in­definitely.

The Pattern

This tale vi­su­al­izes a pat­tern:

  • There is some op­ti­miza­tion pro­cess—in this case, some var­i­ant of gra­di­ent de­scent.

  • The op­ti­miz­ing search is im­perfect: gra­di­ent de­scent only looks at lo­cal in­for­ma­tion, so it doesn’t “know” if there’s a steep drop be­yond a nearby bump.

  • Ex­ploit­ing the im­perfect search mechanism: in this case, the steep drop is hid­den by rais­ing high walls.

  • De­mon: in a rich enough search space, a feed­back loop can ap­pear, in­duc­ing more-and-more-perfect ex­ploita­tion of the im­perfect search mechanism. A whole new op­ti­miza­tion pro­cess ap­pears, with goals quite differ­ent from the origi­nal.
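
The first three bullets can be made concrete with a toy simulation. This is only a minimal sketch under assumptions that are not in the tale itself: a one-dimensional landscape f(x) = h(x² − 1)² − tilt·x, Gaussian noise added to each gradient step, and hypothetical names such as `steps_to_escape`. It illustrates how a taller bump delays the discovery of the drop-off; it does not model the feedback loop that produces a demon.

```python
import random

def grad(x, bump_height, tilt=0.2):
    # Gradient of the assumed toy landscape
    # f(x) = bump_height * (x^2 - 1)^2 - tilt * x:
    # a shallow well near x = -1, a bump near x = 0, a lower well near x = +1.
    return 4.0 * bump_height * x * (x * x - 1.0) - tilt

def steps_to_escape(bump_height, rng, lr=0.01, noise=1.0, max_steps=200_000):
    # Noisy gradient descent starting in the shallow well.
    # Returns the number of steps taken before the ball gets past the bump.
    x = -1.0
    for step in range(1, max_steps + 1):
        x -= lr * grad(x, bump_height)                  # local, downhill-only move
        x += noise * (lr ** 0.5) * rng.gauss(0.0, 1.0)  # random jostling
        if x > 0.5:                                     # past the bump, heading for the drop
            return step
    return max_steps  # never found the drop within the step budget

def mean_escape_steps(bump_height, trials=40, seed=0):
    # Average escape time over several seeded runs.
    rng = random.Random(seed)
    total = sum(steps_to_escape(bump_height, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    for h in (0.5, 1.0, 2.0):
        print(f"bump height {h}: mean steps to escape = {mean_escape_steps(h):.0f}")
```

In this sketch the search never "sees" the lower well; it only stumbles into it when noise happens to carry it over the bump, so anything that raises the bump controls where the ball ends up.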

Does this ac­tu­ally hap­pen? Let’s look at a few real-world ex­am­ples...

Metabolic reactions

  • Op­ti­miza­tion pro­cess: free en­ergy min­i­miza­tion in a chem­i­cal sys­tem. Search op­er­ates by ran­dom small changes to the sys­tem state, then keep­ing changes with lower free en­ergy (very roughly speak­ing).

  • Search is imperfect: the system does not immediately jump to the global minimum. It’s searching locally, based on random samples.

  • Ex­ploit­ing the im­perfect search mechanism: there’s of­ten a free en­ergy bar­rier be­tween low-free-en­ergy states. Biolog­i­cal sys­tems ma­nipu­late the height of the bar­ri­ers, rais­ing or low­er­ing the ac­ti­va­tion en­er­gies re­quired to cross them, in or­der to steer the lo­cal-free-en­ergy-min­i­miza­tion pro­cess to­ward some states and away from oth­ers.

  • De­mon: in pri­mor­dial times, some chem­i­cals hap­pened to raise/​lower bar­ri­ers to steer the pro­cess in such a way that it made more copies of the chem­i­cals. This kicked off an un­sta­ble feed­back loop, pro­duc­ing more and more such chem­i­cals. The rest is nat­u­ral his­tory.
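
To get a feel for how much leverage barrier height gives, here is a rough sketch using the standard Arrhenius approximation, in which a reaction rate scales as exp(−Ea / RT). The Arrhenius form is a textbook approximation brought in for illustration, not something the examples above rely on, and the 30 kJ/mol reduction and 310 K temperature are assumed numbers, not measurements of any particular enzyme.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def arrhenius_rate(prefactor, activation_energy, temperature):
    # Arrhenius approximation: rate = A * exp(-Ea / (R * T)).
    # activation_energy in J/mol, temperature in kelvin.
    return prefactor * math.exp(-activation_energy / (R * temperature))

def speedup_from_lowering_barrier(delta_energy, temperature):
    # Factor by which the rate grows when the barrier drops by delta_energy,
    # with the prefactor and temperature held fixed.
    return math.exp(delta_energy / (R * temperature))

if __name__ == "__main__":
    # Illustrative assumption: a catalyst lowers the barrier by 30 kJ/mol at 310 K.
    factor = speedup_from_lowering_barrier(30_000.0, 310.0)
    print(f"rate increases by a factor of about {factor:.3g}")
```

Because the dependence is exponential, modest changes in barrier height are enough to make one pathway effectively open and a competing one effectively closed, which is what lets barrier manipulation steer the process.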

Greedy genes

  • Op­ti­miza­tion pro­cess: evolu­tion, speci­fi­cally se­lec­tion pres­sure at the level of an or­ganism. Search op­er­ates by mak­ing ran­dom small changes to the genome, then see­ing how much the or­ganism re­pro­duces.

  • Search is im­perfect: the sys­tem does not im­me­di­ately jump to the global op­ti­mum. It’s search­ing lo­cally, based on ran­dom sam­ples, with the sam­ples them­selves cho­sen by a phys­i­cal mechanism.

  • Ex­ploit­ing the im­perfect search mechanism: some genes can bias the ran­dom sam­pling, mak­ing some ran­dom changes more or less likely than oth­ers. For in­stance, in sex­ual or­ganisms, the choice of which var­i­ant of a gene to re­tain is made at ran­dom dur­ing fer­til­iza­tion—but some gene var­i­ants can bias that choice in fa­vor of them­selves.

  • De­mon: some­times, a gene can bias the ran­dom sam­pling to make it­self more likely to be re­tained. This can kick off an un­sta­ble feed­back loop, e.g. a gene which bi­ases to­ward male chil­dren can re­sult in a more and more male-skewed pop­u­la­tion un­til the species dies out.
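
The biased-sampling feedback loop can be sketched with a one-locus recursion. This is a simplified model under stated assumptions: random mating, no fitness differences between genotypes, and a hypothetical transmission parameter `k` giving the probability that a heterozygote passes on the driving variant (0.5 is a fair coin). It does not model sex ratios or extinction; it only shows the variant spreading even though it does nothing for the organism.

```python
def next_frequency(p, k):
    # p: current frequency of the driving variant in the gene pool.
    # Homozygous carriers (p^2) always pass it on; heterozygotes (2p(1-p))
    # pass it on with probability k instead of the fair 0.5.
    return min(1.0, p * p + 2.0 * p * (1.0 - p) * k)

def simulate(p0, k, generations):
    # Returns the variant's frequency in every generation, starting at p0.
    history = [p0]
    for _ in range(generations):
        history.append(next_frequency(history[-1], k))
    return history

if __name__ == "__main__":
    fair = simulate(0.01, 0.5, 100)
    biased = simulate(0.01, 0.9, 100)
    print(f"fair sampling,   final frequency: {fair[-1]:.4f}")
    print(f"biased sampling, final frequency: {biased[-1]:.4f}")
```

With fair sampling the frequency stays put in this model, since nothing selects for or against the variant. With any `k` above 0.5 the frequency climbs each generation, so the bias feeds on itself.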


Managers

  • Optimization process: profit maximization. Search operates by people in the company suggesting and trying things, and seeing what makes/saves money.

  • Search is im­perfect: the com­pany does not im­me­di­ately jump to perfect profit-max­i­miz­ing be­hav­ior. Its ac­tions are cho­sen based on what sounds ap­peal­ing to man­agers, which in turn de­pends on the man­agers’ own knowl­edge, in­cen­tives, and per­sonal tics.

  • Exploiting the imperfect search mechanism: actions which would actually maximize profit are not necessarily actions which look good on paper, or which reward the managers deciding whether to take them. Managers will tend to take actions which make them look good, rather than actions which maximize profit.

  • De­mon: some ac­tions which make man­agers look good will fur­ther de­cou­ple look­ing-good from profit-max­i­miza­tion—e.g. chang­ing eval­u­a­tion mechanisms. This kicks off an un­sta­ble feed­back loop, even­tu­ally de­cou­pling ac­tion-choice from profit-max­i­miza­tion.

I’d be in­ter­ested to hear other ex­am­ples peo­ple can think of.

The big ques­tion is: when does this hap­pen? There are enough real-world ex­am­ples to show that it does hap­pen, and not just in one nar­row case. But it also seems like it re­quires a fairly rich search space with some struc­ture to it in or­der to kick off a full de­monic feed­back loop. Can that in­sta­bil­ity be quan­tified? What are the rele­vant pa­ram­e­ters?