Cascades, Cycles, Insight...

Followup to: Surprised by Brains

Five sources of discontinuity: 1, 2, and 3...

Cascades are when one thing leads to another. Human brains are effectively discontinuous with chimpanzee brains due to a whole bag of design improvements, even though they and we share 95% genetic material and only a few million years have elapsed since the branch. Why this whole series of improvements in us, relative to chimpanzees? Why haven’t some of the same improvements occurred in other primates?

Well, this is not a question on which one may speak with authority (so far as I know). But I would venture an unoriginal guess that, in the hominid line, one thing led to another.

The chimp-level task of modeling others, in the hominid line, led to improved self-modeling which supported recursion which enabled language which birthed politics that increased the selection pressure for outwitting which led to sexual selection on wittiness...

...or something. It’s hard to tell by looking at the fossil record what happened in what order and why. The point being that it wasn’t one optimization that pushed humans ahead of chimps, but rather a cascade of optimizations that, in Pan, never got started.

We fell up the stairs, you might say. It’s not that the first stair ends the world, but if you fall up one stair, you’re more likely to fall up the second, the third, the fourth...

I will concede that farming was a watershed invention in the history of the human species, though it intrigues me for a different reason than Robin. Robin, presumably, is interested because the economy grew by two orders of magnitude, or something like that. But did having a hundred times as many humans lead to a hundred times as much thought-optimization accumulating per unit time? It doesn’t seem likely, especially in the age before writing and telephones. But farming, because of its sedentary and repeatable nature, led to repeatable trade, which led to debt records. Aha! - now we have writing. There’s a significant invention, from the perspective of cumulative optimization by brains. Farming isn’t writing but it cascaded to writing.

Farming also cascaded (by way of surpluses and cities) to support professional specialization. I suspect that having someone spend their whole life thinking about topic X, instead of a hundred farmers occasionally pondering it, is a more significant jump in cumulative optimization than the gap between a hundred farmers and one hunter-gatherer pondering something.

Farming is not the same trick as professional specialization or writing, but it cascaded to professional specialization and writing, and so the pace of human history picked up enormously after agriculture. Thus I would interpret the story.

From a zoomed-out perspective, cascades can lead to what look like discontinuities in the historical record, even given a steady optimization pressure in the background. It’s not that natural selection sped up during hominid evolution. But the search neighborhood contained a low-hanging fruit of high slope… that led to another fruit… which led to another fruit… and so, walking at a constant rate, we fell up the stairs. If you see what I’m saying.

Predicting what sort of things are likely to cascade seems like a very difficult sort of problem.

But I will venture the observation that—with a sample size of one, and an optimization process very different from human thought—there was a cascade in the region of the transition from primate to human intelligence.

Cycles happen when you connect the output pipe to the input pipe in a repeatable transformation. You might think of them as a special case of cascades with very high regularity. (From which you’ll note that in the cases above, I talked about cascades through differing events: farming → writing.)

The notion of cycles as a source of discontinuity might seem counterintuitive, since it’s so regular. But consider this important lesson of history:

Once upon a time, in a squash court beneath Stagg Field at the University of Chicago, physicists were building a shape like a giant doorknob out of alternate layers of graphite and uranium...

The key number for the “pile” is the effective neutron multiplication factor. When a uranium atom splits, it releases neutrons—some right away, some after delay while byproducts decay further. Some neutrons escape the pile, some neutrons strike another uranium atom and cause an additional fission. The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission. At k less than 1, the pile is “subcritical”. At k >= 1, the pile is “critical”. Fermi calculates that the pile will reach k=1 between layers 56 and 57.
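The arithmetic of k can be put in a toy model (a hedged sketch; the generation counts and starting values are invented for illustration, not Fermi’s actual data): each fission generation multiplies the neutron population by k, so the expected population after g generations is the starting count times k to the g.

```python
def neutron_population(k, generations, n0=1.0):
    """Expected neutron count after a number of fission generations,
    where each fission causes k further fissions on average."""
    pop = n0
    for _ in range(generations):
        pop *= k  # each generation multiplies the population by k
    return pop

print(neutron_population(0.9, 50))  # subcritical: dies away (~0.005)
print(neutron_population(1.0, 50))  # critical: exactly self-sustaining (1.0)
print(neutron_population(1.1, 50))  # supercritical: grows without bound (~117)
```

The qualitative regimes depend only on which side of 1 the number k falls, not on its distance from 1.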

On December 2nd in 1942, with layer 57 completed, Fermi orders the final experiment to begin. All but one of the control rods (strips of wood covered with neutron-absorbing cadmium foil) are withdrawn. At 10:37am, Fermi orders the final control rod withdrawn about half-way. The Geiger counters click faster, and a graph pen moves upward. “This is not it,” says Fermi, “the trace will go to this point and level off,” indicating a spot on the graph. In a few minutes the graph pen comes to the indicated point, and does not go above it. Seven minutes later, Fermi orders the rod pulled out another foot. Again the radiation rises, then levels off. The rod is pulled out another six inches, then another, then another.

At 11:30, the slow rise of the graph pen is punctuated by an enormous CRASH—an emergency control rod, triggered by an ionization chamber, activates and shuts down the pile, which is still short of criticality.

Fermi orders the team to break for lunch.

At 2pm the team reconvenes, withdraws and locks the emergency control rod, and moves the control rod to its last setting. Fermi makes some measurements and calculations, then again begins the process of withdrawing the rod in slow increments. At 3:25pm, Fermi orders the rod withdrawn another twelve inches. “This is going to do it,” Fermi says. “Now it will become self-sustaining. The trace will climb and continue to climb. It will not level off.”

Herbert Anderson recounted (as told in Rhodes’s The Making of the Atomic Bomb):

“At first you could hear the sound of the neutron counter, clickety-clack, clickety-clack. Then the clicks came more and more rapidly, and after a while they began to merge into a roar; the counter couldn’t follow anymore. That was the moment to switch to the chart recorder. But when the switch was made, everyone watched in the sudden silence the mounting deflection of the recorder’s pen. It was an awesome silence. Everyone realized the significance of that switch; we were in the high intensity regime and the counters were unable to cope with the situation anymore. Again and again, the scale of the recorder had to be changed to accommodate the neutron intensity which was increasing more and more rapidly. Suddenly Fermi raised his hand. ‘The pile has gone critical,’ he announced. No one present had any doubt about it.”

Fermi kept the pile running for twenty-eight minutes, with the neutron intensity doubling every two minutes.

That first critical reaction had k of 1.0006.

It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior. In one sense it does. But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there’s one hell of a big difference between k of 0.9994 and k of 1.0006.
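To put numbers on that difference, a quick sketch of the per-generation arithmetic (the generation count here is arbitrary, chosen only to show the divergence; fission generations are fast, so many elapse in a short time):

```python
def intensity(k, generations, n0=1.0):
    """Neutron intensity after the given number of fission generations,
    at effective multiplication factor k."""
    return n0 * k ** generations

g = 10_000
sub = intensity(0.9994, g)  # ~0.0025: the reaction has died away
sup = intensity(1.0006, g)  # ~400: the reaction is climbing fast
print(sup / sub)            # the two piles differ by a factor of ~160,000
```

And taking the figures above at face value, Fermi’s twenty-eight-minute run at a doubling time of two minutes amounts to fourteen doublings: a 2**14, roughly 16,000-fold, rise in neutron intensity from a k only 0.0006 above criticality.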

If, rather than being able to calculate, rather than foreseeing and taking cautions, Fermi had just reasoned that 57 layers ought not to behave all that differently from 56 layers—well, it wouldn’t have been a good year to be a student at the University of Chicago.

The inexact analogy to the domain of self-improving AI is left as an exercise for the reader, at least for now.

Economists like to measure cycles because they happen repeatedly. You take a potato and an hour of labor and make a potato clock which you sell for two potatoes; and you do this over and over and over again, so an economist can come by and watch how you do it.

As I noted here at some length, economists are much less likely to go around measuring how many scientific discoveries it takes to produce a new scientific discovery. All the discoveries are individually dissimilar and it’s hard to come up with a common currency for them. The analogous problem will prevent a self-improving AI from being directly analogous to a uranium heap, with almost perfectly smooth exponential increase at a calculable rate. You can’t apply the same software improvement to the same line of code over and over again; you’ve got to invent a new improvement each time. But if self-improvements are triggering more self-improvements with great regularity, you might stand a long way back from the AI, blur your eyes a bit, and ask: What is the AI’s average neutron multiplication factor?
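As a hedged sketch of what that blurred-eyes question would mean (the numbers are invented for illustration): if each self-improvement triggers, on average, k further self-improvements, the total cascade from one seed improvement behaves like a geometric series: finite for k below 1, divergent at or above it.

```python
def total_improvements(k, generations=1000):
    """Total self-improvements set off by one seed improvement, if each
    improvement triggers k further improvements on average."""
    total, wave = 0.0, 1.0
    for _ in range(generations):
        total += wave  # improvements arriving in this "generation"
        wave *= k      # each one triggers k more on average
    return total

print(total_improvements(0.5))                   # ~2.0: the cascade peters out at 1/(1-k)
print(total_improvements(1.1, generations=100))  # already enormous, and still growing
```

The same threshold behavior as the pile: below k = 1, each round of improvement buys less than the last; at or above it, the process sustains itself.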

Economics seems to me to be largely the study of production cycles—highly regular, repeatable value-adding actions. This doesn’t seem to me like a very deep abstraction so far as the study of optimization goes, because it leaves out the creation of novel knowledge and novel designs—further informational optimizations. Or rather, it treats productivity improvements as a mostly exogenous factor produced by black-box engineers and scientists. (If I underestimate your power and merely parody your field, by all means inform me what kind of economic study has been done of such things.) (Answered: This literature goes by the name “endogenous growth”. See comments starting here.) So far as I can tell, economists do not venture into asking where discoveries come from, leaving the mysteries of the brain to cognitive scientists.

(Nor do I object to this division of labor—it just means that you may have to drag in some extra concepts from outside economics if you want an account of self-improving Artificial Intelligence. Would most economists even object to that statement? But if you think you can do the whole analysis using standard econ concepts, then I’m willing to see it...)

Insight is that mysterious thing humans do by grokking the search space, wherein one piece of highly abstract knowledge (e.g. Newton’s calculus) provides the master key to a huge set of problems. Since humans deal in the compressibility of compressible search spaces (at least the part we can compress), we can bite off huge chunks in one go. This is not mere cascading, where one solution leads to another:

Rather, an “insight” is a chunk of knowledge which, if you possess it, decreases the cost of solving a whole range of governed problems.

There’s a parable I once wrote—I forget what for, I think ev-bio—which dealt with creatures who’d evolved addition in response to some kind of environmental problem, and not with overly sophisticated brains—so they started with the ability to add 5 to things (which was a significant fitness advantage because it let them solve some of their problems), then accreted another adaptation to add 6 to odd numbers. Until, some time later, there wasn’t a reproductive advantage to “general addition”, because the set of special cases covered almost everything found in the environment.

There may even be a real-world example of this. If you glance at a set, you should be able to instantly distinguish the numbers one, two, three, four, and five, but seven objects in an arbitrary (non-canonical) pattern will take at least one noticeable instant to count. IIRC, it’s been suggested that we have hardwired numerosity-detectors but only up to five.

I say all this, to note the difference between evolution nibbling bits off the immediate search neighborhood, versus the human ability to do things in one fell swoop.

Our compression of the search space is also responsible for ideas cascading much more easily than adaptations. We actively examine good ideas, looking for neighbors.

But an insight is higher-level than this; it consists of understanding what’s “good” about an idea in a way that divorces it from any single point in the search space. In this way you can crack whole volumes of the solution space in one swell foop. The insight of calculus apart from gravity is again a good example, or the insight of mathematical physics apart from calculus, or the insight of math apart from mathematical physics.

Evolution is not completely barred from making “discoveries” that decrease the cost of a very wide range of further discoveries. Consider e.g. the ribosome, which was capable of manufacturing a far wider range of proteins than whatever it was actually making at the time of its adaptation: this is a general cost-decreaser for a wide range of adaptations. It likewise seems likely that various types of neuron have reasonably-general learning paradigms built into them (gradient descent, Hebbian learning, more sophisticated optimizers) that have been reused for many more problems than they were originally invented for.

A ribosome is something like insight: an item of “knowledge” that tremendously decreases the cost of inventing a wide range of solutions. But even evolution’s best “insights” are not quite like the human kind. A sufficiently powerful human insight often approaches a closed form—it doesn’t feel like you’re exploring even a compressed search space. You just apply the insight-knowledge to whatever your problem is, and out pops the now-obvious solution.

Insights have often cascaded, in human history—even major insights. But they don’t quite cycle—you can’t repeat the identical pattern Newton used originally to get a new kind of calculus that’s twice and then three times as powerful.

Human AI programmers who have insights into intelligence may acquire discontinuous advantages over others who lack those insights. AIs themselves will experience discontinuities in their growth trajectory associated with becoming able to do AI theory itself—a watershed moment in the FOOM.