The Valley of Bad Theory

An interesting experiment: researchers set up a wheel on a ramp with adjustable weights. Participants in the experiment then adjust the weights to try to make the wheel roll down the ramp as quickly as possible. The participants go one after the other, each with a limited number of attempts, each passing their data and/or theories to the next person in line, with the goal of maximizing the speed at the end. As information accumulates “from generation to generation”, wheel speeds improve.

This produced not just one but two interesting results.

First, after each participant’s turn, the researchers asked the participant to predict how fast various configurations would roll. Even though wheel speed increased from person to person as data accumulated, participants’ ability to predict how different configurations behave did not improve. In other words, performance was improving, but understanding was not.

Second, participants in some groups were allowed to pass along both data and theories to their successors, while participants in other groups were only allowed to pass along data. Turns out, the data-only groups performed better. Why? The authors answer:

… participants who inherited an inertia- or energy-related theory showed skewed understanding patterns. Inheriting an inertia-related theory increased their understanding of inertia, but decreased their understanding of energy…


… participants’ understanding may also result from different exploration patterns. For instance, participants who received an inertia-related theory mainly produced balanced wheels (Fig. 3F), which could have prevented them from observing the effect of varying the position of the wheel’s center of mass.

So, two lessons:

  1. Iterative optimization does not result in understanding, even if the optimization is successful.

  2. Passing along theories can actually make both understanding and performance worse.

So… we should iteratively optimize and forget about theorizing? Fox not hedgehog, and all that?

Well… not necessarily. We’re talking here about a wheel, with weights on it, rolling down a ramp. Mathematically, this system just isn’t all that complicated. Anyone with an undergrad-level understanding of mechanics can just crank the math, in all of its glory. Take no shortcuts, double-check any approximations, do it right. It’d be tedious, but certainly not intractable. And then… then you’d understand the system.
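To give a taste of what cranking the math looks like, here is a minimal sketch of the easiest piece only: a balanced wheel (center of mass on the axle) rolling without slipping. Energy conservation gives a = g·sin(θ) / (1 + I/(mR²)), so moving weights toward the axle lowers I and speeds the wheel up. The function names, the point-mass treatment of the weights, and every number below are my own illustrative assumptions, not values from the experiment. An off-center mass makes the acceleration depend on the wheel’s angle, which this sketch does not handle.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def rolling_acceleration(total_mass, moment_of_inertia, wheel_radius, incline_deg):
    """Acceleration of a BALANCED wheel rolling without slipping down an incline.

    From m*g*h = (1/2)*m*v**2 + (1/2)*I*omega**2 with v = omega*R:
        a = g*sin(theta) / (1 + I / (m*R**2))
    """
    theta = math.radians(incline_deg)
    return G * math.sin(theta) / (1.0 + moment_of_inertia / (total_mass * wheel_radius**2))


def wheel_with_weights(frame_mass, frame_inertia, weight_mass, weight_radii):
    """Total mass and moment of inertia, treating each weight as a point mass.

    Assumption: the radii are arranged so the wheel stays balanced.
    """
    total_mass = frame_mass + weight_mass * len(weight_radii)
    inertia = frame_inertia + sum(weight_mass * r**2 for r in weight_radii)
    return total_mass, inertia


def descent_time(acceleration, track_length):
    """Time to cover track_length from rest at constant acceleration."""
    return math.sqrt(2.0 * track_length / acceleration)


# Hypothetical wheel: 0.2 kg frame, 0.1 m radius, four 0.05 kg weights.
for label, radii in (("near axle", [0.02] * 4), ("near rim", [0.09] * 4)):
    mass, inertia = wheel_with_weights(0.2, 0.001, 0.05, radii)
    a = rolling_acceleration(mass, inertia, 0.1, incline_deg=10.0)
    print(f"weights {label}: a = {a:.3f} m/s^2, t = {descent_time(a, 1.0):.3f} s")
```

Even this partial model already says something iteration alone never states outright: for a balanced wheel, only the ratio I/(mR²) matters.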

What benefit would all this theory yield? Well, you could predict how different configurations would perform. You could say for sure whether you had found the best solution, or whether better configurations were still out there. You could avoid getting stuck in local optima. Y’know, all the usual benefits of actually understanding a system.
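The local-optima point can be made concrete with a toy. This is a sketch on a made-up two-peaked “speed” curve, not the real wheel: `toy_speed`, `hill_climb`, and `grid_search` are hypothetical names of my own. The hill climber stands in for trial-and-error tweaking, and the grid search stands in for having a model you can evaluate anywhere without spending attempts.

```python
def toy_speed(x):
    """Hypothetical two-peaked 'speed' curve on 0 <= x <= 10 (NOT the real wheel)."""
    return max(3.0 - (x - 2.0) ** 2, 5.0 - (x - 8.0) ** 2)


def hill_climb(score, start, step=0.5, low=0.0, high=10.0):
    """Move to the better neighbour until neither neighbour improves the score."""
    x = start
    while True:
        neighbours = [n for n in (x - step, x + step) if low <= n <= high]
        best = max(neighbours, key=score)
        if score(best) <= score(x):
            return x  # no neighbour is better: a (possibly local) optimum
        x = best


def grid_search(score, step=0.5, low=0.0, high=10.0):
    """Evaluate a known model at every grid point and keep the best one."""
    count = int(round((high - low) / step))
    return max((low + i * step for i in range(count + 1)), key=score)


stuck = hill_climb(toy_speed, start=1.0)    # climbs the smaller peak and stops
lucky = hill_climb(toy_speed, start=6.0)    # happens to start near the taller peak
known = grid_search(toy_speed)              # the model reveals the taller peak directly
print(stuck, lucky, known)
```

The climber that stops on the smaller peak has no way to tell that a taller one exists. That is exactly the “is there something better out there?” question a real model answers.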

But clearly, the large majority of participants in the experiment did not crank all that math. They passed along ad-hoc, incomplete theories which didn’t account for all the important aspects of the system.

This suggests a valley of bad theory. People with no theory, who just iteratively optimize, can do all right. They may not really understand the system, and they may have to start from scratch if it changes in some important way, but they can optimize reasonably well within the confines of the problem. On the other end, if you crank all the math, you can go straight to the optimal solution, and you can predict in advance how changes will affect the system.

But in between those extremes, there’s a whole lot of people who are really bad at physics and/or math and/or theorizing. Those people would be better off just abandoning their theories, and sticking with dumb iterative optimization.