Gears-Level Models are Capital Investments


The usual method to solve a maze is some variant of babble-and-prune: try a path, if it seems to get closer to the exit then keep going, if it hits a dead end then go back and try another path. It’s a black-box method that works reasonably well on most mazes.
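As a concrete sketch, babble-and-prune is essentially depth-first search with backtracking. The toy maze below is illustrative, not any particular maze from this post:

```python
# Babble-and-prune as depth-first search: try a path, back up at dead ends.
def solve(maze, start, goal):
    """Return a path from start to goal as a list of cells, or None.
    maze: set of open (row, col) cells."""
    stack = [(start, [start])]
    seen = {start}
    while stack:
        (r, c), path = stack.pop()
        if (r, c) == goal:
            return path
        for nxt in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if nxt in maze and nxt not in seen:  # babble: try this path
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None  # pruned everything: no path exists

open_cells = {(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)}
print(solve(open_cells, (0, 0), (2, 2)))
# → [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
```

Note that nothing here depends on understanding the maze's structure; the search treats it as a black box of open and closed cells.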

However, there are other methods. For instance, you could start by looking for a chain of walls with only one opening, like this:

This chain of walls is a gears-level insight into the maze—a piece of the internal structure which lets us better understand “how the maze works” on a low level. It’s not specific to any particular path, or to any particular start/end points—it’s a property of the maze itself. Every shortest path between two points in the maze either starts and ends on the same side of that line, or passes through the gap.

If we only need to solve the maze once, then looking for a chain of walls is not very useful—it could easily take as long as solving the maze! But if we need to solve the same maze more than once, with different start and end points… then we can spend the time finding that chain of walls just once, and re-use our knowledge over and over again. It’s a capital investment: we do some extra work up-front, and it pays out in lower costs every time we look for a path through the maze in the future.
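In code, that investment might look like the following sketch: flood-fill the maze once with the gap blocked, labeling which side of the wall chain each cell lands on. Afterward, answering “must a path from a to b pass through the gap?” is a dictionary lookup. The maze layout and gap location are illustrative:

```python
from collections import deque

def label_sides(open_cells, gap):
    """Flood-fill with the gap blocked; cells in different regions
    can only reach each other through the gap."""
    blocked = open_cells - {gap}
    label, n = {}, 0
    for cell in blocked:
        if cell in label:
            continue
        n += 1                      # start a new region
        label[cell] = n
        queue = deque([cell])
        while queue:
            r, c = queue.popleft()
            for nxt in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
                if nxt in blocked and nxt not in label:
                    label[nxt] = n
                    queue.append(nxt)
    return label

open_cells = {(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)}
side = label_sides(open_cells, gap=(1, 1))  # paid once, up front
# Re-used for every future start/end pair:
print(side[(0, 0)] != side[(2, 2)])  # must this path use the gap? → True
```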

This is a general feature of gears-level models: figuring out a system’s gears takes extra work up-front, but yields dividends forever. The alternative, typically, is a black-box strategy: use a method which works without needing to understand the internals of the system. The black-box approach is cheaper for one-off tasks, but usually doesn’t yield any insights which will generalize to new tasks using the same system—it’s context-dependent.


Suppose we work with the marketing team at an online car loan refinance company, and we’re tasked with optimizing the company’s marketing to maximize the number of car loans the company refinances. Here are two different approaches we might take:

  • We a/b test hundreds of different ad spend strategies, marketing copy permutations, banner images, landing page layouts, etc. Ideally, we find that a particular combination works especially well.

  • We obtain some anonymized data from a credit agency on people with car loans. Ideally, we learn something about the market—e.g. maybe subprime borrowers usually either declare bankruptcy or dramatically increase their credit score within two years of taking a loan.

The first strategy is black-box: we don’t need to know anything about who our potential customers are, what they want, the psychology of clicking on ads, etc. We can treat our marketing pipeline as a black box and fiddle with its inputs to see what works. The second strategy is gears-level, the exact opposite of black-box: the whole point is to learn who our potential customers are, breaking open the black box and looking at the internal gears.
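The inner loop of the black-box strategy is just a statistical comparison. A minimal sketch using a two-proportion z-test; the conversion counts are made up:

```python
import math

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate really higher?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)                # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error
    return (p_b - p_a) / se

z = z_score(conv_a=120, n_a=10_000, conv_b=165, n_b=10_000)
print(z > 1.96)  # variant B wins at the usual 5% threshold → True
```

Notice that the test tells us which variant converts better without telling us anything about why.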

These aren’t mutually exclusive, and they have different relative advantages. Some upsides of black-box:

  • Black-box is usually cheaper and easier, since the code involved is pretty standard and we don’t need to track down external data. Gears-level strategies require more custom work and finding particular data.

  • Black-box yields direct benefits when it works, whereas gears-level requires an extra step to translate whatever insights we find into actual improvements.

On the other hand:

  • Gears-level insights can highlight ideas we wouldn’t even have thought to try, whereas black-box just tests the things we think to test.

  • When some tests are expensive (e.g. integrating with a new ad channel), gears-level knowledge can tell us which tests are most likely to be worthwhile.

  • Black-box optimization is subject to Goodhart, while gears-level insights usually are not (at least in and of themselves).

  • Gears-level insights are less vulnerable to distribution shift. For instance, if we change ad channels, then the distribution of people seeing our ads will shift. Different ad copy will perform well, and we’d need to restart our black-box a/b testing, whereas general insights about subprime borrowers are more likely to remain valid.

  • Conversely, black-box optimizations depreciate over time. Audiences and ad channels evolve, and ads need to change with them, requiring constant re-optimization to check that old choices are still optimal.

  • By extension, gears-level insights tend to be permanent and broadly applicable, and have the potential for compound returns, whereas black-box improvements are much more context-specific and likely to shift with time.

In short, the black-box approach is easier, cheaper, and more directly useful—but its benefits are ephemeral and it can’t find unknown unknowns. Gears-level understanding is more difficult, expensive, and risky, but it offers permanent, generalizable insights and can suggest new questions we wouldn’t have thought to ask.

With this in mind, consider the world through the eyes of an ancient lich or thousand-year-old vampire. It’s a worldview in which ephemeral gains are irrelevant. All that matters is permanent, generalizable knowledge—everything else will fade in time, and usually not even very much time. In this worldview, gears-level understanding is everything.

On the other end of the spectrum, consider the world through the eyes of a startup with six months of runway which needs to show rapid growth in order to close another round of funding. For them, black-box optimization is everything—they want fast, cheap results which don’t need to last forever.

Wheel with Weights

There’s a neat ex­per­i­ment where peo­ple are given a wheel with some weights on it, each of which can be shifted closer to/​fur­ther from the cen­ter. Groups of sub­jects have to co­op­er­a­tively find set­tings for the weights which min­i­mize the time for the wheel to roll down a ramp.

Given the opportunity to test things out, subjects would often iterate their way to optimal settings—but they didn’t iterate their way to correct theories. When asked to predict how hypothetical settings would perform, subjects’ predictions didn’t improve much as they iterated. This is black-box optimization: optimization was achieved, but insight into the system was not.

If the problem had changed significantly—e.g. changing weight ratios/angles, ramp length/angle, etc—the optimal settings could easily change enough that subjects would need to re-optimize from scratch. On the other hand, the system is simple enough that just doing all the math is tractable—and that math would remain essentially the same if weights, angles, and lengths changed. A gears-level understanding is possible, and would reduce the cost of optimizing for new system parameters. It’s a capital investment: it only makes sense to make the investment in gears-level understanding if it will pay off on many different future problems.
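“All the math” here is standard rigid-body mechanics: for rolling without slipping, descent time is set by the wheel’s moment of inertia, which the weight positions control. A sketch that treats the bare wheel as a hoop and the weights as point masses (all numbers illustrative):

```python
import math

def roll_time(ramp_len, incline, wheel_mass, radius, weights):
    """Time to roll down a ramp without slipping.
    weights: list of (mass, distance_from_center) pairs."""
    total_mass = wheel_mass + sum(m for m, _ in weights)
    # Hoop approximation for the wheel; each point-mass weight adds m * d^2.
    inertia = wheel_mass * radius**2 + sum(m * d**2 for m, d in weights)
    g = 9.81
    accel = g * math.sin(incline) / (1 + inertia / (total_mass * radius**2))
    return math.sqrt(2 * ramp_len / accel)

inner = roll_time(2.0, 0.3, 1.0, 0.2, [(0.5, 0.05)] * 4)
outer = roll_time(2.0, 0.3, 1.0, 0.2, [(0.5, 0.20)] * 4)
print(inner < outer)  # weights near the hub roll faster → True
```

Unlike trial-and-error settings, this model keeps working when the ramp length, incline, or weight masses change: re-optimizing is just re-running the function with new parameters.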

In the experiment, subjects were under no pressure to achieve gears-level understanding—they only needed to optimize for one set of parameters. I’d predict that people would be more likely to gain understanding if they needed to find optimal weight-settings quickly for many different wheel/ramp parameters. (A close analogy is evolution of modularity: changing objectives incentivize learning general structure.)


Let’s bring in the man­ioc ex­am­ple:

There’s this plant, man­ioc, that grows eas­ily in some places and has a lot of calories in it, so it was a sta­ple for some in­dige­nous South Amer­i­cans since be­fore the Euro­peans showed up. Tra­di­tional han­dling of the man­ioc in­volved some elab­o­rate time-con­sum­ing steps that had no ap­par­ent pur­pose, so when the Por­tuguese in­tro­duced it to Africa, they didn’t bother with those steps—just, grow it, cook it, eat it.
The prob­lem is that man­ioc’s got cyanide in it, so if you eat too much too of­ten over a life­time, you get sick, in a way that’s not eas­ily trace­able to the plant. Some­how, over prob­a­bly hun­dreds of years, the peo­ple liv­ing in man­ioc’s origi­nal range figured out a way to leach out the poi­son, with­out un­der­stand­ing the un­der­ly­ing chem­istry—so if you asked them why they did it that way, they wouldn’t nec­es­sar­ily have a good an­swer.

The techniques for processing manioc are a stock example of metis: traditional knowledge accumulated over generations, which doesn’t seem like it has any basis in reason or any reason to be useful. It’s black-box knowledge, where the black-box optimizer is cultural transmission and evolution. Manioc is a cautionary tale about the dangers of throwing away or ignoring black-box knowledge just because it doesn’t contain any gears.

In this case, building a gears-level model was very expensive—people had to get sick on a large scale in order to figure out that any knowledge was missing at all, and even after that it presumably took a while for scientists to come along and link the problem to cyanide content. On the other hand, now that we have that gears-level model in hand, we can quickly and easily test new cooking methods to see whether they eliminate the cyanide—our gears-level model provides generalizable insights. We can even check whether any particular dish of manioc is safe before eating it, or breed new manioc strains which contain less cyanide. Metic knowledge would have no way to do any of that—it doesn’t generalize.

More Examples

(Note: in each of these examples, there are many other ways to formulate a black-box/gears-level approach. I just provide one possible approach for each.)


  • Black box approach: run a high-throughput assay to test the effects of thousands of chemicals against low-level markers of some disease.

  • Gears-level approach: comb the literature for factors related to some disease. Run experiments holding various subsets of the factors constant while varying others, to figure out which factors mediate the effect of which others, and ultimately build up a causal graph of their interactions.

The black-box approach is a lot cheaper and faster, but it’s subject to Goodhart problems, won’t suggest compounds that nobody thought to test, and won’t provide any knowledge which generalizes to related diseases. If none of the chemicals tested are effective, then the black-box approach leaves no foundation to build on. The gears-level approach is much slower and more expensive, but eventually yields reliable, generalizable knowledge.

Financial Trading

  • Black box approach: build a very thorough backtester, then try out every algorithm or indicator we can think of to see if any of them achieve statistically significant improvement over market performance.

  • Gears-level approach: research the trading algorithms and indicators actually used by others, then simulate markets with traders using those algorithms/indicators. Compare results against real price behavior and whatever side data can be found in order to identify missing pieces.

The gears-level approach is far more work, and likely won’t produce anything profitable until very late in development. On the other hand, the gears-level approach will likely generalize far better to new markets, new market conditions, etc.
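The Goodhart risk in the black-box version is worth spelling out: try enough arbitrary indicators and one of them will “beat” a market that is pure noise. A toy demonstration with simulated zero-edge returns:

```python
import random

random.seed(0)
returns = [random.gauss(0, 0.01) for _ in range(250)]  # pure noise, no edge

def backtest(signal):
    # Strategy: hold the asset only on days the indicator says "long".
    return sum(r for r, s in zip(returns, signal) if s)

# A thousand arbitrary coin-flip "indicators".
best = max(
    backtest([random.random() < 0.5 for _ in range(250)])
    for _ in range(1000)
)
print(best > 0)  # the best random indicator shows a "profit" → True
```

A strategy that works because we understand who is trading and why is far less likely to be a fluke of this kind of search.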

Data Science

  • Black box approach: train a neural network, random forest, support vector machine, or whatever generic black-box learning algorithm you like.

  • Gears-level approach: build a probabilistic graphical model. Research the subject matter to hypothesize model structure, and statistically compare different model structures to see which match the data best. Look for side information to confirm that the structure is correct.

The black box approach is subject to Goodhart and often fails to generalize. The gears-level approach is far more work, requiring domain expertise and side data and probably lots of custom code (although the recent surge of probabilistic programming languages helps a lot in that department), but gears-level models ultimately give us human-understandable explanations of how the system actually works. Their internal parameters have physical meaning.


Building gears-level models is expensive—often prohibitively expensive. Black-box approaches are usually much cheaper and faster. But black-box approaches rarely generalize—they’re subject to Goodhart, need to be rebuilt when conditions change, don’t identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.

The upfront cost of gears-level knowledge makes it an investment, and the payoff of that investment is the ability to re-use the model many times in the future.