Complexity and Intelligence

Followup to: Building Something Smarter, Say Not “Complexity”, That Alien Message

One of the Gödel-inspired challenges to the idea of self-improving minds is based on the notion of “complexity”.

Now “complexity”, as I’ve previously mentioned, is a dangerous sort of word. “Complexity” conjures up the image of a huge machine with incomprehensibly many gears inside—an impressive sort of image. Thanks to this impressiveness, “complexity” sounds like it could be explaining all sorts of things—that all sorts of phenomena could be happening because of “complexity”.

It so happens that “complexity” also names another meaning, strict and mathematical: the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.
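
In symbols (the standard formalization, stated here in my own notation rather than quoted from anywhere): fix a universal Turing machine $U$ with unlimited working tape; then the Kolmogorov complexity of a pattern $x$ is

$$K(x) \;=\; \min\{\, |p| \;:\; U(p) = x \,\},$$

the length in bits of the shortest program $p$ that makes $U$ print $x$.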

I immediately note that this mathematical meaning is not the same as that intuitive image that comes to mind when you say “complexity”. The vast impressive-looking collection of wheels and gears? That’s not what the math term means.

Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe—not just the region of space we see around us, but all regions of space and all quantum branches. (There are strong indications our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)

Then the “Kolmogorov complexity” of that entire universe—throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...

...would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

The Kolmogorov complexity of just a single planet, like Earth, would of course be much higher than the “complexity” of the entire universe that contains it.

“Eh?” you say. “What’s this?” you say. “How can a single planet contain more wheels and gears, more complexity, than the whole vast turning universe that embeds it? Wouldn’t one planet contain fewer books, fewer brains, fewer species?”

But the new meaning that certain mathematicians have formulated and attached to the English word “complexity” is not like the visually impressive complexity of a confusing system of wheels and gears.

It is, rather, the size of the smallest Turing machine that unfolds into a certain pattern.

If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.

If you want to print out just Earth by itself, then it’s not enough to specify the laws of physics. You also have to point to just Earth within the universe. You have to narrow it down somehow. And, in a big universe, that’s going to take a lot of information. It’s like the difference between giving directions to a city, and giving directions for finding a single grain of sand on one beach of one city. Indeed, with all those quantum branches existing side-by-side, it’s going to take around as much information to find Earth in the universe, as to just directly specify the positions of all the atoms.

Kolmogorov complexity is the sense in which we zoom into the endless swirls of the Mandelbrot fractal and think, not “How complicated”, but rather, “How simple”, because the Mandelbrot set is defined using a very simple equation. But if you wanted to find a subset of the Mandelbrot set, one particular swirl, you would have to specify where to look, and that would take more bits.
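
To make “defined using a very simple equation” concrete, here is a minimal sketch (my own illustration, with made-up function names, not something from the original post): the whole infinite filigree of the set unfolds from iterating z ← z² + c and asking whether the iterates stay bounded.

```python
# The entire intricacy of the Mandelbrot set unfolds from one short rule:
# iterate z <- z^2 + c and check whether the orbit stays bounded.

def in_mandelbrot(c: complex, max_iter: int = 1000) -> bool:
    """Return True if c appears to lie in the Mandelbrot set
    (i.e. the iteration has not escaped after max_iter steps)."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:           # escaped: definitely not in the set
            return False
    return True                   # did not escape within the bound

print(in_mandelbrot(-1))          # True: the orbit of -1 stays bounded
print(in_mandelbrot(1))           # False: the orbit of 1 escapes quickly
```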

That’s why we use the Kolmogorov complexity in Occam’s Razor to determine how “complicated” or “simple” something is. So that, when we think of the entire universe, all the stars we can see and all the implied stars we can’t see, and hypothesize laws of physics standing behind it, we will think “what a simple universe” and not “what a complicated universe”—just like looking into the Mandelbrot fractal and thinking “how simple”. We could never accept a theory of physics as probably true, or even remotely possible, if it got an Occam penalty the size of the universe.
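
One standard way to make the “Occam penalty” quantitative is a Solomonoff-style prior, which I take to be the formalization gestured at here (my gloss, not a quote):

$$P(H) \;\propto\; 2^{-K(H)},$$

so each extra bit of description costs a hypothesis a factor of two in prior probability, and a hypothesis whose shortest description were the size of the universe would be penalized beyond any hope of recovery by evidence.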

As a logical consequence of the way that Kolmogorov complexity has been defined, no closed system can ever increase in Kolmogorov complexity. (At least, no closed system without a ‘true randomness’ source.) A program can pattern ever more interacting wheels and gears into its RAM, but nothing it does from within itself can increase “the size of the smallest computer program that unfolds into it”, by definition.
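
A sketch of why, in the notation above (my own restatement of the standard argument): if a closed, deterministic program $p$ runs from a fixed starting state with no input, then whatever pattern $x$ it has produced after $t$ steps is computable from $p$ together with $t$, so

$$K(x) \;\le\; |p| + O(\log t),$$

and if we just mean “whatever $p$ eventually outputs”, then $K(x) \le |p| + O(1)$. Nothing the program does internally can push its own description length above the program that generated it, plus at most a few bits saying when to look.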

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more “complex” than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can’t be any more “complex” than that.
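
Here is a toy version of that setup (entirely my own illustration; the fitness criterion and parameters are made up). Every step below is deterministic, so the winning “organism” is pinned down by the program text itself, and its Kolmogorov complexity cannot exceed the length of this program plus a constant, no matter how intricate the winner looks.

```python
# Toy deterministic "evolutionary tournament": the winner is implicit in
# this very program, so K(winner) <= len(program) + O(1).

import random

def run_tournament(generations: int = 100, pop_size: int = 50, genome_len: int = 64):
    rng = random.Random(0)                    # fixed seed: no outside randomness
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    fitness = sum                             # made-up criterion: count of 1-bits
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]      # "keep what works"
        children = [[bit ^ (rng.random() < 0.01) for bit in parent]
                    for parent in survivors]  # mutated copies of the survivors
        pop = survivors + children
    return max(pop, key=fitness)              # print out the highest scorer

print(run_tournament())                       # identical output on every run
```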

But if, on the other hand, you reached into the biology and made a few million random changes here and there, the Kolmogorov complexity of the whole system would shoot way up: anyone wanting to specify it exactly would have to describe the random changes that you made.

I specify “random” changes, because if you made the changes with beneficence aforethought—according to some criterion of goodness—then I could just talk about the compact criterion you used to make the changes. Only random information is incompressible on average, so you have to make purposeless changes if you want to increase the Kolmogorov complexity as fast as possible.
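
The counting argument behind “only random information is incompressible on average” (a standard fact, restated in my own words): there are $2^n$ strings of $n$ bits but fewer than $2^{n-c}$ programs shorter than $n - c$ bits, so for a uniformly random $x \in \{0,1\}^n$,

$$\Pr\big[K(x) < n - c\big] \;<\; 2^{-c}.$$

Fewer than one random string in a thousand can be compressed by even ten bits; purposeful, criterion-driven changes, by contrast, are exactly the kind that compress.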

So! As you’ve probably guessed, the argument against self-improvement is that since closed systems cannot increase their “complexity”, the AI must look out upon the world, demanding a high-bandwidth sensory feed, in order to grow up.

If, that is, you believe that “increasing Kolmogorov complexity” is prerequisite to increasing real-world effectiveness.

(We will dispense with the idea that if system A builds system B, then system A must be “by definition” as smart as system B. This is the “Einstein’s mother must have been one heck of a physicist” sophistry. Even if a future planetary ecology is in some sense “implicit” in a single self-replicating RNA strand in some tidal pool, the ecology is a lot more impressive in a real-world sense: in a given period of time it can exert larger optimization pressures and do more neat stuff.)

Now, how does one argue that “increasing Kolmogorov complexity” has something to do with increasing intelligence? Especially when small machines can unfold into whole universes, and the maximum Kolmogorov complexity is realized by random noise?

One of the other things that a closed computable system provably can’t do is solve the general halting problem—the problem of telling whether any Turing machine halts.

A similar proof shows that, if you take some given solver, and consider the maximum size bound such that the solver can solve the halting problem for all machines of that size or less, then this omniscience is bounded by the solver’s own complexity plus a small constant.
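
A sketch of that proof, in my own wording: suppose a program S correctly answers the halting question for every machine of size at most N. From S one can construct a “diagonal” machine D, only a constant number of states larger than S, which asks S whether D itself halts and then does the opposite. S must be wrong about D, so D lies outside S’s competence, which forces

$$N \;<\; |S| + c$$

for some small constant c: a solver’s guaranteed reach cannot exceed its own size by more than a constant.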

So… isn’t increasing your Kolmogorov complexity through outside sensory inputs the key to learning to solve the halting problem for ever-larger systems?

And doesn’t this show that no closed system can “self-improve”?

In a word, no.

I mean, if you were to try to write it out as logic, you’d find that one of the steps involves saying, “If you can solve all systems of complexity N, you must be of complexity greater than N (maybe minus a small constant, depending on the details of the proof). Therefore, by increasing your complexity, you increase the range of systems you can solve.” This is formally a non-sequitur.

It’s also a non-sequitur in practice.

I mean, sure, if we’re not dealing with a closed system, you can’t prove that it won’t solve the halting problem. You could be looking at an external bright light in the sky that flashes on or off to reveal the halting solution.

But unless you already have that kind of mathematical ability yourself, you won’t know just from looking at the light that it’s giving you true solutions to the halting problem. You must have just been constructed with faith in the light, and the light must just happen to work.

(And in any computable universe, any light in the sky that you see won’t happen to solve the halting problem.)

It’s not easy for “sensory information” to give you justified new mathematical knowledge that you could not in principle obtain with your eyes closed.

It’s not a matter, for example, of seeing written in the sky a brilliant proof that you would never have thought of on your own. A closed system with infinite RAM can close its eyes, and write out every possible sensory experience it could have, along with its own reactions to them, that could occur within some bounded number of steps. Doing this does not increase its Kolmogorov complexity.
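
A toy sketch of that closed-eyes enumeration (my own illustration; the reaction function is made up): a short program can list every possible percept up to some length, paired with the system’s own reaction to each, and the whole listing has no more Kolmogorov complexity than the little program that generates it.

```python
# Enumerate every possible sensory input of length <= n together with this
# system's deterministic reaction to it. The listing is generated by this
# short program, so writing it out adds no Kolmogorov complexity.

from itertools import product

def my_reaction(percept: tuple) -> int:
    # stand-in for "whatever the closed system would do on seeing this"
    return sum(percept) % 3

def all_experiences(n: int):
    for length in range(n + 1):
        for percept in product((0, 1), repeat=length):
            yield percept, my_reaction(percept)

for percept, reaction in all_experiences(2):
    print(percept, reaction)
```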

So the notion can’t be that the environment tells you something that you recognize as a proof, but didn’t think of on your own. Somehow, having that sensory experience in particular has to increase your mathematical ability even after you perfectly imagined that experience and your own reaction to it in advance.

Could it be the healing powers of having a larger universe to live in, or other people to talk to? But you can simulate incredibly large universes—vastly larger than anything we see in our telescopes, up-arrow large—within a closed system without increasing its Kolmogorov complexity. Within that simulation you could peek at people watching the stars, and peek at people interacting with each other, and plagiarize the books they wrote about the deep wisdom that comes from being embedded in a larger world.

What justified novel mathematical knowledge—about the halting problem in particular—could you gain from a sensory experience, that you could not gain from perfectly imagining that sensory experience and your own reaction to it, nor gain from peeking in on a simulated universe that included someone having that sensory experience?

Well, you can always suppose that you were born trusting the light in the sky, and the light in the sky always happens to tell the truth.

But what’s actually going on is that the non-sequitur is coming back to bite: Increasing your Kolmogorov complexity doesn’t necessarily increase your ability to solve math problems. Swallowing a bucket of random bits will increase your Kolmogorov complexity too.

You aren’t likely to learn any mathematics by gazing up at the external stars that you couldn’t learn from “navel-gazing” into an equally large simulated universe. Looking at the actual stars around you is good for figuring out where in the universe you are (the extra information that specifies your location) but not much good for learning new math that you couldn’t learn by navel-gazing into a simulation as large as our universe.

In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Saying that a 500-state Turing machine might be able to solve all problems up to at most 500 states plus a small constant is misleading. That’s an upper bound, not a lower bound, and it comes from having a constructive way to build a specific unsolvable Turing machine out of the solver. In reality, you’d expect a 500-state Turing machine to get nowhere near solving the halting problem up to 500 states. I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states. The vast majority of 500-state Turing machines that implement something that looks like a “halting problem solver” will get nowhere near the 500-state bound (but see this comment).

Suppose you write a relatively short Turing machine that, by virtue of its unlimited working memory, creates an entire universe containing googols or up-arrows of atoms...

...and within this universe, life emerges on some planet-equivalent, and evolves, and develops intelligence, and devises science to study its ecosphere and its universe, and builds computers, and goes out into space and investigates the various physical systems that have formed, and perhaps encounters other life-forms...

...and over the course of trillions or up-arrows of years, a transgalactic intrauniversal economy develops, with conduits conducting information from one end of the universe to another (because you didn’t pick a physics with inconvenient lightspeed limits), a superWeb of hyperintelligences all exchanging information...

...and finally—after a long, long time—your computer program blinks a giant message across the universe, containing a problem to be solved and a specification of how to answer back, and threatening to destroy their universe if the answer is wrong...

...then this intrauniversal civilization—and everything it’s ever learned by theory or experiment over the last up-arrow years—is said to contain 400 bits of complexity, or however long the original program was.

But where did it get its math powers, from inside the closed system?

If we trace back the origins of the hypergalactic civilization, then every belief it ever adopted about math came from updating on some observed event. That might be a star exploding, or it might be the output of a calculator, or it might be an observed event within some mind’s brain… but in all cases, the update will occur because of a logic that says, “If I see this, it means that”. Before you can learn, you must have the prior that structures learning. If you see something that makes you decide to change the way you learn, then you must believe that seeing X implies you should learn a different way Y. That’s how it would be for superintelligences, I expect.

If you keep tracing back through that simulated universe, you arrive at something before the dawn of superintelligence—the first intelligent beings, produced by evolution. These first minds are the ones who’ll look at Peano Arithmetic and think, “This has never produced a contradiction, so it probably never will—I’ll go ahead and program that into my AI.” These first minds are the ones who’ll think, “Induction seems like it works pretty well, but how do I formalize the notion of induction?” And these first minds are the ones who’ll think, “If I build a self-improving AI, how should it update itself—including changing its own updating process—from the results of observation?”

And how did the first minds get the ability to think those thoughts? From natural selection, which generated the adaptations that executed to think all those thoughts, using the simple evolutionary rule: “keep what works”.

And in turn, natural selection in this universe popped out of the laws of physics.

So everything that this hypergalactic civilization ever believes about math is really just induction in one form or another. All the evolved adaptations that do induction, produced by inductive natural selection; and all the generalizations made from experience, including generalizations about how to form better generalizations. It would all just unfold out of the inductive principle...

...running in a box sealed as tightly shut as our own universe appears to be.

And I don’t see how we, in our own closed universe, are going to do any better. Even if we have the ability to look up at the stars, it’s not going to give us the ability to go outside that inductive chain to obtain justified mathematical beliefs.

If you wrote that 400-bit simulated universe over the course of a couple of months using human-level intelligence and some mysterious source of unlimited computing power, then you are much more complex than that hypergalactic civilization. You take much more than 400 bits to find within the space of possibilities, because you are only one particular brain.

But y’know, I think that your mind, and the up-arrow mind of that inconceivable civilization, would still be worth distinguishing as Powers. Even if you can figure out how to ask them questions. And even if you’re asking them questions by running an internal simulation, which makes it all part of your own “complexity” as defined in the math.

To locate an up-arrow-sized mind within an up-arrow-sized civilization would require up-arrow bits—even if the entire civilization unfolded out of a 400-bit machine as compact as our own laws of physics. But which would be more powerful, that one “complex” mind, or the “simple” civilization it was part of?

None of this violates Gödelian limitations. You can transmit to the hypergalactic civilization a Turing machine similar to the one that built it, and ask it how that Turing machine behaves. If you can fold a hypergalactic civilization into a 400-bit Turing machine, then even a hypergalactic civilization can confront questions about the behavior of 400-bit Turing machines that are real stumpers.

And 400 bits is going to be an overestimate. I bet there’s at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that. If that seems unreasonable, you are not acquainted with the Busy-Beaver problem.

You can get a hell of a lot of mathematical ability out of small Turing machines that unfold into pangalactic hypercivilizations. But unfortunately, there are other small Turing machines that are hellishly difficult problems—perhaps unfolding into hypercivilizations themselves, or things even less comprehensible. So even the tremendous mathematical minds that can unfold out of small Turing machines won’t be able to solve all Turing machines up to a larger size bound. Hence no Gödelian contradiction.

(I wonder: If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon? 15? Maybe even as large as 20? Surely not 100; you could encode a civilization of similar origins and equivalent power into a smaller Turing machine than that.)

Olie Lamb said: “I don’t see anything good about complexity. There’s nothing artful about complexity. There’s nothing mystical about complexity. It’s just complex.” This is true even when you’re talking about wheels and gears, never mind Kolmogorov complexity. It’s simplicity, not complexity, that is the admirable virtue.

The real force behind this whole debate is that the word “complexity” sounds impressive and can act as an explanation for anything you don’t understand. Then the word gets captured by a mathematical property that’s spelled using the same letters, which happens to be provably constant for closed systems.

That, I think, is where the argument really comes from, as it rationalizes the even more primitive intuition of some blind helpless thing in a box.

This argument is promulgated even by some people who can write proofs about complexity—but frankly, I do not think they have picked up the habit of visualizing what their math symbols mean in real life. That thing Richard Feynman did, where he visualized two balls turning colors or growing hairs? I don’t think they do that. On the last step of interpretation, they just make a quick appeal to the sound of the words.

But I will agree that, if the laws we know are true, then a self-improving mind which lacks sensory inputs shouldn’t be able to improve its mathematical abilities beyond the level of an up-arrow-sized civilization—for example, it shouldn’t be able to solve Busy-Beaver(100).

It might perhaps be more limited than this in mere practice, if it’s just running on a laptop computer or something. But if theoretical mathematical arguments about closed systems show anything, that is what they show.