Selling Nonapples

Previously in series: Worse Than Random

A tale of two architectures...

Once upon a time there was a man named Rodney Brooks, who could justly be called the King of Scruffy Robotics. (Sample paper titles: “Fast, Cheap, and Out of Control”, “Intelligence Without Reason”). Brooks invented the “subsumption architecture”—robotics based on many small modules, communicating asynchronously and without a central world-model or central planning, acting by reflex, responding to interrupts. The archetypal example is the insect-inspired robot that lifts its leg higher when the leg encounters an obstacle—it doesn’t model the obstacle, or plan how to go around it; it just lifts its leg higher.
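For concreteness, here is a toy sketch of that reflex-module flavor in Python. This is not Brooks’s behavior language, and the module and message names are invented for illustration; the point is only that the module reacts to an interrupt-like event without any world model or plan:

```python
import threading, queue

class LiftLegHigher(threading.Thread):
    """A toy subsumption-style module (illustrative only, not Brooks's
    behavior language): no world model, no plan, just a reflex that
    reacts to interrupt-like messages and emits its own."""

    def __init__(self, interrupts, outbox):
        super().__init__(daemon=True)
        self.interrupts = interrupts   # asynchronous inbox for this module
        self.outbox = outbox           # messages sent on to other modules

    def run(self):
        while True:
            event = self.interrupts.get()      # block until an interrupt arrives
            if event == "leg_blocked":
                self.outbox.put("raise_leg")   # reflex: no modeling, no planning

interrupts, outbox = queue.Queue(), queue.Queue()
LiftLegHigher(interrupts, outbox).start()
interrupts.put("leg_blocked")
print(outbox.get())  # -> "raise_leg"
```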

In Brooks’s paradigm—which he labeled nouvelle AI—intelligence emerges from “situatedness”. One speaks not of an intelligent system, but rather the intelligence that emerges from the interaction of the system and the environment.

And Brooks wrote a programming language, the behavior language, to help roboticists build systems in his paradigmatic subsumption architecture—a language that includes facilities for asynchronous communication in networks of reflexive components, and for programming finite state machines.

My understanding is that, while there are still people in the world who speak with reverence of Brooks’s subsumption architecture, it’s not used much in commercial systems on account of being nearly impossible to program.

Once you start stacking all these modules together, it becomes more and more difficult for the programmer to decide that, yes, an asynchronous local module which raises the robotic leg higher when it detects a block, and meanwhile sends asynchronous signal X to module Y, will indeed produce effective behavior as the outcome of the whole intertwined system whereby intelligence emerges from interaction with the environment...

Asynchronous parallel decentralized programs are harder to write. And it’s not that they’re a better, higher form of sorcery that only a few exceptional magi can use. It’s more like the difference between the two business plans, “sell apples” and “sell nonapples”.

One noteworthy critic of Brooks’s paradigm in general, and subsumption architecture in particular, is a fellow by the name of Sebastian Thrun.

You may recall the 2005 DARPA Grand Challenge for driverless cars. How many ways was this a fair challenge according to the tenets of Scruffydom? Let us count the ways:

  • The challenge took place in the real world, where sensors are imperfect, random factors intervene, and macroscopic physics is only approximately lawful.

  • The challenge took place outside the laboratory—not even on paved roads, but over 212 km of desert.

  • The challenge took place in real time—continuous perception, continuous action, using only computing power that would fit on a car.

  • The teams weren’t told the specific race course until 2 hours before the race.

  • You could write the code any way you pleased, so long as it worked.

  • The challenge was competitive: The prize went to the fastest team that completed the race. Any team which, for ideological reasons, preferred elegance to speed—any team which refused to milk every bit of performance out of their systems—would surely lose to a less principled competitor.

And the winning team was Stanley, the Stanford robot, built by a team led by Sebastian Thrun.

How did he do it? If I recall correctly, Thrun said that the key was being able to integrate probabilistic information from many different sensors, using a common representation of uncertainty. This is likely code for “we used Bayesian methods”, at least if “Bayesian methods” is taken to include algorithms like particle filtering.
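To make the idea concrete, here is a minimal one-dimensional particle filter in Python. It is a generic textbook sketch, not Stanley’s code, and the noise parameters are invented; the point is that the particle cloud is a single shared representation of uncertainty that any sensor can update:

```python
import math, random

def particle_filter_step(particles, control, measurement,
                         motion_noise=0.1, sensor_noise=0.5):
    """One predict-weight-resample cycle of a 1-D particle filter."""
    # Predict: move every particle by the control input, plus motion noise.
    particles = [p + control + random.gauss(0, motion_noise) for p in particles]
    # Weight: score each particle by the Gaussian likelihood of the measurement.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * sensor_noise ** 2))
               for p in particles]
    # Resample in proportion to the weights (random.choices normalizes them).
    return random.choices(particles, weights=weights, k=len(particles))

# The particle cloud *is* the common representation of uncertainty:
# each new sensor just reweights the same cloud with its own likelihood.
particles = [random.uniform(0, 10) for _ in range(1000)]
particles = particle_filter_step(particles, control=1.0, measurement=3.2)
print(sum(particles) / len(particles))  # posterior mean position estimate
```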

And to heavily paraphrase and summarize some of Thrun’s criticisms of Brooks’s subsumption architecture:

Robotics becomes pointlessly difficult if, for some odd reason, you insist that there be no central model and no central planning.

Integrating data from multiple uncertain sensors is a lot easier if you have a common probabilistic representation. Likewise, there are many potential tasks in robotics—in situations as simple as navigating a hallway—where you can end up in two possible situations that look highly similar and have to be distinguished by reasoning about the history of the trajectory.
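As a minimal illustration of what a common probabilistic representation buys you, assuming Gaussian beliefs (a simplification; the sensors and numbers here are invented):

```python
def fuse_gaussians(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of the same quantity.

    The result is the precision-weighted average: the standard Bayesian
    update for Gaussian beliefs, always at least as certain as either
    input alone.
    """
    precision = 1 / var_a + 1 / var_b
    mean = (mean_a / var_a + mean_b / var_b) / precision
    return mean, 1 / precision

# A sonar and a camera disagree about the distance to a wall (meters);
# because both report a mean *and* a variance, fusing them is one line.
print(fuse_gaussians(2.0, 0.25, 2.6, 1.0))  # -> (2.12, 0.2)
```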

To be fair, it’s not as if the subsumption architecture has never made money. Rodney Brooks is the founder of iRobot, and I understand that the Roomba uses the subsumption architecture. The Roomba has no doubt made more money than was won in the DARPA Grand Challenge... though the Roomba might not seem quite as impressive...

But that’s not quite today’s point.

Earlier in his career, Sebastian Thrun also wrote a programming language for roboticists. Thrun’s language was named CES, which stands for C++ for Embedded Systems.

CES is a language extension for C++. Its types include probability distributions, which makes it easy for programmers to manipulate and combine multiple sources of uncertain information. And for differentiable variables—including probabilities—the language enables automatic optimization using techniques like gradient descent. Programmers can declare ‘gaps’ in the code to be filled in by training cases: “Write me this function.”
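CES itself is a C++ extension and I won’t reproduce its syntax here; the following is a rough Python analogue of the ‘gap’ idea, with an invented linear model and invented training cases, showing how gradient descent can fill in a declared function from examples:

```python
def train_gap(examples, lr=0.05, steps=5000):
    """Fill a declared 'gap' (here, a 1-D linear map y = w*x + b)
    by gradient descent on (input, output) training cases."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in examples:
            err = (w * x + b) - y        # signed prediction error
            grad_w += 2 * err * x / n    # d(mean squared error)/dw
            grad_b += 2 * err / n        # d(mean squared error)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return lambda x: w * x + b

# "Write me this function": the programmer supplies training cases, not code.
volts_to_meters = train_gap([(1.0, 0.5), (2.0, 1.0), (4.0, 2.0)])
print(round(volts_to_meters(3.0), 2))  # -> 1.5
```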

As a result, Thrun was able to write a small, corridor-navigating mail-delivery robot using 137 lines of code, and this robot required less than 2 hours of training. As Thrun notes, “Comparable systems usually require at least two orders of magnitude more code and are considerably more difficult to implement.” Similarly, a 5,000-line robot localization algorithm was reimplemented in 52 lines.

Why can’t you get that kind of productivity with the subsumption architecture? Scruffies, ideologically speaking, are supposed to believe in learning—it’s only those evil logical Neats who try to program everything into their AIs in advance. Then why does the subsumption architecture require so much sweat and tears from its programmers?

Suppose that you’re trying to build a wagon out of wood, and unfortunately, the wagon has a problem, which is that it keeps catching on fire. Suddenly, one of the wagon-workers drops his wooden beam. His face lights up. “I have it!” he says. “We need to build this wagon from nonwood materials!”

You stare at him for a bit, trying to get over the shock of the new idea; finally you ask, “What kind of nonwood materials?”

The wagoneer hardly hears you. “Of course!” he shouts. “It’s all so obvious in retrospect! Wood is simply the wrong material for building wagons! This is the dawn of a new era—the nonwood era—of wheels, axles, carts all made from nonwood! Not only that, instead of taking apples to market, we’ll take nonapples! There’s a huge market for nonapples—people buy far more nonapples than apples—we should have no trouble selling them! It will be the era of the nouvelle wagon!”

The set “apples” is much narrower than the set “not apples”. Apples form a compact cluster in thingspace, but nonapples vary much more widely in price, and size, and use. When you say to build a wagon using “wood”, you’re giving much more concrete advice than when you say “not wood”. There are different kinds of wood, of course—but even so, when you say “wood”, you’ve narrowed down the range of possible building materials a whole lot more than when you say “not wood”.

In the same fashion, “asynchronous”—literally “not synchronous”—is a much larger design space than “synchronous”. If one considers the space of all communicating processes, then synchrony is a very strong constraint on those processes. If you toss out synchrony, then you have to pick some other method for preventing communicating processes from stepping on each other—synchrony is one way of doing that, a specific answer to the question.
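To make “pick some other method” concrete: a message queue is one specific alternative, one particular point in that large design space. The sketch below is illustrative only, with invented readings and thresholds; it shows producer and consumer never touching shared state directly:

```python
import threading, queue

# One specific answer to "how do processes avoid stepping on each other
# without synchrony": pass messages through a queue instead of sharing state.
messages = queue.Queue()

def sensor():
    for reading in [0.3, 0.9, 0.1]:
        messages.put(reading)  # producer never blocks waiting on the consumer
    messages.put(None)         # sentinel: no more readings

def actuator():
    while (reading := messages.get()) is not None:
        print("lift leg" if reading > 0.5 else "step normally")

t1, t2 = threading.Thread(target=sensor), threading.Thread(target=actuator)
t1.start(); t2.start(); t1.join(); t2.join()
```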

Likewise “parallel processing” is a much huger design space than “serial processing”, because serial processing is just a special case of parallel processing where the number of processors happens to be equal to 1. “Parallel processing” reopens all sorts of design choices that are premade in serial processing. When you say “parallel”, it’s like stepping out of a small cottage, into a vast and echoing country. You have to stand someplace specific, in that country—you can’t stand in the whole place, in the noncottage.

So when you stand up and shout: “Aha! I’ve got it! We’ve got to solve this problem using asynchronous processes!”, it’s like shouting, “Aha! I’ve got it! We need to build this wagon out of nonwood! Let’s go down to the market and buy a ton of nonwood from the nonwood shop!” You’ve got to choose some specific alternative to synchrony.

Now it may well be that there are other building materials in the universe than wood. It may well be that wood is not the best building material. But you still have to come up with some specific thing to use in its place, like iron. “Nonwood” is not a building material, “sell nonapples” is not a business strategy, and “asynchronous” is not a programming architecture.

And this is strongly reminiscent of—arguably a special case of—the dilemma of inductive bias. There’s a tradeoff between the strength of the assumptions you make, and how fast you learn. If you make stronger assumptions, you can learn faster when the environment matches those assumptions well, but you’ll learn correspondingly more slowly if the environment matches those assumptions poorly. If you make an assumption that lets you learn faster in one environment, it must always perform more poorly in some other environment. Such laws are known as the “no-free-lunch” theorems, and the reason they don’t prohibit intelligence entirely is that the real universe is a low-entropy special case.
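A toy worked example of that tradeoff, with invented numbers: estimating a coin’s bias under a weak prior, a strong prior that happens to be right, and a strong prior that happens to be wrong.

```python
# Estimate a coin's heads-probability with a Beta prior. A strong prior
# is a strong assumption: it helps when it's right, hurts when it's wrong.

def posterior_mean(heads, tails, prior_heads, prior_tails):
    """Mean of the Beta posterior after observing the flips."""
    return (heads + prior_heads) / (heads + tails + prior_heads + prior_tails)

flips = dict(heads=7, tails=3)  # data from a coin whose true bias is ~0.7

weak = posterior_mean(**flips, prior_heads=1, prior_tails=1)
good_bias = posterior_mean(**flips, prior_heads=35, prior_tails=15)  # believes ~0.7
bad_bias = posterior_mean(**flips, prior_heads=5, prior_tails=45)    # believes ~0.1

print(round(weak, 2), round(good_bias, 2), round(bad_bias, 2))
# -> 0.67 0.7 0.2: the strong-but-right prior wins on this data, while
#    the strong-but-wrong prior learns far more slowly from the same flips.
```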

Programmers have a phrase called the “Turing Tarpit”; it describes a situation where everything is possible, but nothing is easy. A Universal Turing Machine can simulate any possible computer, but only at an immense expense in time and memory. If you program in a high-level language like Python, then—while most programming tasks become much simpler—you may occasionally find yourself banging up against the walls imposed by the programming language; sometimes Python won’t let you do certain things. If you program directly in machine language, raw 1s and 0s, there are no constraints; you can do anything that can possibly be done by the computer chip; and it will probably take you around a thousand times as much time to get anything done. You have to do, all by yourself, everything that a compiler would normally do on your behalf.

Usually, when you adopt a program architecture, that choice takes work off your hands. If I use a standard container library—lists and arrays and hashtables—then I don’t need to decide how to implement a hashtable, because that choice has already been made for me.

Adopting the subsumption paradigm means losing order, instead of gaining it. The subsumption architecture is not-synchronous, not-serial, and not-centralized. It’s also not-knowledge-modelling and not-planning.

This absence of solution implies an immense design space, and it requires a correspondingly immense amount of work by the programmers to reimpose order. Under the subsumption architecture, it’s the programmer who decides to add an asynchronous local module which detects whether a robotic leg is blocked, and raises it higher. It’s the programmer who has to make sure that this behavior plus other module behaviors all add up to an (ideologically correct) emergent intelligence. The lost structure is not replaced. You just get tossed into the Turing Tarpit, the space of all other possible programs.

On the other hand, CES creates order; it adds the structure of probability distributions and gradient optimization. This narrowing of the design space takes so much work off your hands that you can write a learning robot in 137 lines (at least if you happen to be Sebastian Thrun).

The moral:

Quite a few AI architectures aren’t.

If you want to generalize, quite a lot of policies aren’t.

They aren’t choices. They’re just protests.

Added: Robin Hanson says, “Economists have to face this in spades. So many people say standard econ has failed and the solution is to do the opposite—non-equilibrium instead of equilibrium, non-selfish instead of selfish, non-individual instead of individual, etc.” It seems that selling nonapples is a full-blown Standard Iconoclast Failure Mode.