# Complexity and Intelligence

One of the Gödel-inspired challenges to the idea of self-improving minds is based on the notion of “complexity”.

Now “complexity”, as I’ve previously mentioned, is a dangerous sort of word. “Complexity” conjures up the image of a huge machine with incomprehensibly many gears inside—an impressive sort of image. Thanks to this impressiveness, “complexity” sounds like it could be explaining all sorts of things—that all sorts of phenomena could be happening because of “complexity”.

It so happens that “complexity” also names another meaning, strict and mathematical: the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.

I immediately note that this mathematical meaning is not the same as the intuitive image that comes to mind when you say “complexity”. The vast impressive-looking collection of wheels and gears? That’s not what the math term means.
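The gap between the two meanings can be felt directly: Kolmogorov complexity itself is uncomputable, but any ordinary compressor gives an upper bound on it. A sketch in Python (the particular strings and the use of zlib are my illustration, not part of the original argument):

```python
import os
import zlib

# Kolmogorov complexity is uncomputable, but any compressor yields an
# upper bound: K(x) <= len(compress(x)) + O(1).  A big patterned object
# can have a tiny description; random bytes, on average, cannot.
patterned = b"ab" * 50_000          # 100,000 bytes of sheer repetition
random_bytes = os.urandom(100_000)  # incompressible on average

print(len(zlib.compress(patterned, 9)))     # a few hundred bytes at most
print(len(zlib.compress(random_bytes, 9)))  # still ~100,000 bytes
```

The impressive-looking object (100,000 bytes of gears) has a short description; the unimpressive-looking noise does not.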

Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe—not just the region of space we see around us, but all regions of space and all quantum branches. (There are strong indications that our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)

Then the “Kolmogorov complexity” of that entire universe—throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...

...would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

The Kolmogorov complexity of just a single planet, like Earth, would of course be much higher than the “complexity” of the entire universe that contains it.

“Eh?” you say. “What’s this?” you say. “How can a single planet contain more wheels and gears, more complexity, than the whole vast turning universe that embeds it? Wouldn’t one planet contain fewer books, fewer brains, fewer species?”

But the new meaning that certain mathematicians have formulated and attached to the English word “complexity” is not like the visually impressive complexity of a confusing system of wheels and gears.

It is, rather, the size of the smallest Turing machine that unfolds into a certain pattern.

If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.

If you want to print out just Earth by itself, then it’s not enough to specify the laws of physics. You also have to point to just Earth within the universe. You have to narrow it down somehow. And, in a big universe, that’s going to take a lot of information. It’s like the difference between giving directions to a city, and giving directions for finding a single grain of sand on one beach of one city. Indeed, with all those quantum branches existing side-by-side, it’s going to take around as much information to find Earth in the universe as to just directly specify the positions of all the atoms.

Kolmogorov complexity is the sense in which we zoom into the endless swirls of the Mandelbrot fractal and think, not “How complicated”, but rather, “How simple”, because the Mandelbrot set is defined using a very simple equation. But if you wanted to find a subset of the Mandelbrot set, one particular swirl, you would have to specify where to look, and that would take more bits.
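The Mandelbrot point can be made concrete: the membership test below is essentially the entire definition of the set. This is the standard escape-time iteration, sketched in Python; the iteration cap of 100 is an arbitrary choice of mine:

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """The whole endless fractal unfolds from one line: z -> z*z + c."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:       # escaped to infinity: c is outside the set
            return False
    return True              # never escaped within the bound: inside (approximately)

print(in_mandelbrot(0j))      # True: the origin never escapes
print(in_mandelbrot(1 + 0j))  # False: 0, 1, 2, 5, ... escapes quickly
```

Specifying the generator takes a few lines; specifying *one particular swirl* would take the extra bits of a coordinate.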

That’s why we use Kolmogorov complexity in Occam’s Razor to determine how “complicated” or “simple” something is. So that, when we think of the entire universe, all the stars we can see and all the implied stars we can’t see, and hypothesize laws of physics standing behind it, we will think “what a simple universe” and not “what a complicated universe”—just like looking into the Mandelbrot fractal and thinking “how simple”. We could never accept a theory of physics as probably true, or even remotely possible, if it got an Occam penalty the size of the universe.

As a logical consequence of the way that Kolmogorov complexity has been defined, no closed system can ever increase in Kolmogorov complexity. (At least, no closed system without a ‘true randomness’ source.) A program can pattern ever more interacting wheels and gears into its RAM, but nothing it does from within itself can increase “the size of the smallest computer program that unfolds into it”, by definition.

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more “complex” than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can’t be any more “complex” than that.
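The tournament argument fits in a few lines of code. The sketch below is a toy stand-in (32-bit “genomes” and a popcount fitness rule are my inventions, not anything from the post); the point is only that the winner is a deterministic function of the short specification:

```python
import random

def popcount_score(genome: int) -> int:
    return bin(genome).count("1")   # arbitrary fitness criterion

def run_tournament(seed: int, generations: int = 200) -> int:
    # The entire evolutionary history unfolds deterministically from this
    # short specification plus the seed, so the winner's Kolmogorov
    # complexity is bounded by the size of this code: the "evolved"
    # organism is implicit in the tournament.
    rng = random.Random(seed)
    population = [rng.getrandbits(32) for _ in range(100)]
    for _ in range(generations):
        parents = sorted(population, key=popcount_score)[-10:]   # top 10 survive
        population = [p ^ (1 << rng.randrange(32))               # one random bit-flip each
                      for p in parents for _ in range(10)]
    return max(population, key=popcount_score)

# Re-running the specification reproduces the "evolved" winner exactly.
print(run_tournament(0) == run_tournament(0))   # True
```

Intuitively the winning genome looks more “evolved” each round, but anyone who has the specification already has the winner.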

But if, on the other hand, you reached into the biology and made a few million random changes here and there, the Kolmogorov complexity of the whole system would shoot way up: anyone wanting to specify it exactly would have to describe the random changes that you made.

I specify “random” changes, because if you made the changes with beneficence aforethought—according to some criterion of goodness—then I could just talk about the compact criterion you used to make the changes. Only random information is incompressible on average, so you have to make purposeless changes if you want to increase the Kolmogorov complexity as fast as possible.

So! As you’ve probably guessed, the argument against self-improvement is that since closed systems cannot increase their “complexity”, the AI must look out upon the world, demanding a high-bandwidth sensory feed, in order to grow up.

If, that is, you believe that “increasing Kolmogorov complexity” is a prerequisite to increasing real-world effectiveness.

(We will dispense with the idea that if system A builds system B, then system A must be “by definition” as smart as system B. This is the “Einstein’s mother must have been one heck of a physicist” sophistry. Even if a future planetary ecology is in some sense “implicit” in a single self-replicating RNA strand in some tidal pool, the ecology is a lot more impressive in a real-world sense: in a given period of time it can exert larger optimization pressures and do more neat stuff.)

Now, how does one argue that “increasing Kolmogorov complexity” has something to do with increasing intelligence? Especially when small machines can unfold into whole universes, and the maximum Kolmogorov complexity is realized by random noise?

One of the other things that a closed computable system provably can’t do is solve the general halting problem—the problem of telling whether any Turing machine halts.

A similar proof shows that, if you take some given solver, and consider the maximum size bound such that the solver can solve the halting problem for all machines of that size or less, then this omniscience is bounded by at most the solver’s own complexity plus a small constant.
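For reference, the proof behind this bound is the usual diagonalization: from any claimed solver you can build, with only constant overhead, a program the solver must answer incorrectly. A minimal Python sketch, with function objects standing in for Turing machines and `halts` a hypothetical solver rather than real code:

```python
def make_diagonal(halts):
    # Build, from any claimed halting solver, a program only a constant
    # larger that the solver necessarily gets wrong about itself.
    def diagonal():
        if halts(diagonal):
            while True:      # solver said "halts", so loop forever
                pass
        # solver said "doesn't halt", so halt immediately
    return diagonal

# Feed it a (wrong) solver that always answers "doesn't halt":
d = make_diagonal(lambda program: False)
d()   # returns immediately, so that solver was wrong about d
```

Whatever answer the solver gives about `diagonal`, `diagonal` does the opposite, which is why the solver’s reach is capped near its own size.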

So… isn’t increasing your Kolmogorov complexity through outside sensory inputs the key to learning to solve the halting problem for ever-larger systems?

And doesn’t this show that no closed system can “self-improve”?

In a word, no.

I mean, if you were to try to write it out as logic, you’d find that one of the steps involves saying, “If you can solve all systems of complexity N, you must be of complexity greater than N (maybe minus a small constant, depending on the details of the proof). Therefore, by increasing your complexity, you increase the range of systems you can solve.” This is formally a non-sequitur.

It’s also a non-sequitur in practice.

I mean, sure, if we’re not dealing with a closed system, you can’t prove that it won’t solve the halting problem. You could be looking at an external bright light in the sky that flashes on or off to reveal the halting solution.

But unless you already have that kind of mathematical ability yourself, you won’t know just from looking at the light that it’s giving you true solutions to the halting problem. You must have just been constructed with faith in the light, and the light must just happen to work.

(And in any computable universe, any light in the sky that you see won’t happen to solve the halting problem.)

It’s not easy for “sensory information” to give you justified new mathematical knowledge that you could not in principle obtain with your eyes closed.

It’s not a matter, for example, of seeing written in the sky a brilliant proof that you would never have thought of on your own. A closed system with infinite RAM can close its eyes, and write out every possible sensory experience it could have, along with its own reactions to them, that could occur within some bounded number of steps. Doing this does not increase its Kolmogorov complexity.

So the notion can’t be that the environment tells you something that you recognize as a proof, but didn’t think of on your own. Somehow, having that sensory experience in particular has to increase your mathematical ability even after you perfectly imagined that experience and your own reaction to it in advance.

Could it be the healing powers of having a larger universe to live in, or other people to talk to? But you can simulate incredibly large universes—vastly larger than anything we see in our telescopes, up-arrow large—within a closed system, without increasing its Kolmogorov complexity. Within that simulation you could peek at people watching the stars, and peek at people interacting with each other, and plagiarize the books they wrote about the deep wisdom that comes from being embedded in a larger world.

What justified novel mathematical knowledge—about the halting problem in particular—could you gain from a sensory experience that you could not gain from perfectly imagining that sensory experience and your own reaction to it, nor gain from peeking in on a simulated universe that included someone having that sensory experience?

Well, you can always suppose that you were born trusting the light in the sky, and the light in the sky always happens to tell the truth.

But what’s actually going on is that the non-sequitur is coming back to bite: increasing your Kolmogorov complexity doesn’t necessarily increase your ability to solve math problems. Swallowing a bucket of random bits will increase your Kolmogorov complexity too.

You aren’t likely to learn any mathematics by gazing up at the external stars that you couldn’t learn from “navel-gazing” into an equally large simulated universe. Looking at the actual stars around you is good for figuring out where in the universe you are (the extra information that specifies your location), but not much good for learning new math that you couldn’t learn by navel-gazing into a simulation as large as our universe.

In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems, in a way that “no closed system can do”, just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Saying that a 500-state Turing machine might be able to solve all problems up to at most 500 states plus a small constant is misleading. That’s an upper bound, not a lower bound, and it comes from having a constructive way to build a specific unsolvable Turing machine out of the solver. In reality, you’d expect a 500-state Turing machine to get nowhere near solving the halting problem up to 500. I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states. The vast majority of 500-state Turing machines that implement something that looks like a “halting problem solver” will go nowhere near 500 states (but see this comment).

Suppose you write a relatively short Turing machine that, by virtue of its unlimited working memory, creates an entire universe containing googols or up-arrows of atoms...

...and within this universe, life emerges on some planet-equivalent, and evolves, and develops intelligence, and devises science to study its ecosphere and its universe, and builds computers, and goes out into space and investigates the various physical systems that have formed, and perhaps encounters other life-forms...

...and over the course of trillions or up-arrows of years, a transgalactic intrauniversal economy develops, with conduits conducting information from one end of the universe to another (because you didn’t pick a physics with inconvenient lightspeed limits), a superWeb of hyperintelligences all exchanging information...

...and finally—after a long, long time—your computer program blinks a giant message across the universe, containing a problem to be solved and a specification of how to answer back, and threatening to destroy their universe if the answer is wrong...

...then this intrauniversal civilization—and everything it’s ever learned by theory or experiment over the last up-arrow years—is said to contain 400 bits of complexity, or however long the original program was.

But where did it get its math powers, from inside the closed system?

If we trace back the origins of the hypergalactic civilization, then every belief it ever adopted about math came from updating on some observed event. That might be a star exploding, or it might be the output of a calculator, or it might be an observed event within some mind’s brain… but in all cases, the update will occur because of a logic that says, “If I see this, it means that”. Before you can learn, you must have the prior that structures learning. If you see something that makes you decide to change the way you learn, then you must believe that seeing X implies you should learn a different way Y. That’s how it would be for superintelligences, I expect.

If you keep tracing back through that simulated universe, you arrive at something before the dawn of superintelligence—the first intelligent beings, produced by evolution. These first minds are the ones who’ll look at Peano Arithmetic and think, “This has never produced a contradiction, so it probably never will—I’ll go ahead and program that into my AI.” These first minds are the ones who’ll think, “Induction seems like it works pretty well, but how do I formalize the notion of induction?” And these first minds are the ones who’ll think, “If I build a self-improving AI, how should it update itself—including changing its own updating process—from the results of observation?”

And how did the first minds get the ability to think those thoughts? From natural selection, which generated the adaptations that executed to think all those thoughts, using the simple evolutionary rule: “keep what works”.

And in turn, natural selection in this universe popped out of the laws of physics.

So everything that this hypergalactic civilization ever believes about math is really just induction in one form or another. All the evolved adaptations that do induction, produced by inductive natural selection; and all the generalizations made from experience, including generalizations about how to form better generalizations. It would all just unfold out of the inductive principle...

...running in a box sealed as tightly shut as our own universe appears to be.

And I don’t see how we, in our own closed universe, are going to do any better. Even if we have the ability to look up at the stars, it’s not going to give us the ability to go outside that inductive chain to obtain justified mathematical beliefs.

If you wrote that 400-bit simulated universe over the course of a couple of months, using human-level intelligence and some mysterious source of unlimited computing power, then you are much more complex than that hypergalactic civilization. You take much more than 400 bits to find within the space of possibilities, because you are only one particular brain.

But y’know, I think that your mind, and the up-arrow mind of that inconceivable civilization, would still be worth distinguishing as Powers. Even if you can figure out how to ask them questions. And even if you’re asking them questions by running an internal simulation, which makes it all part of your own “complexity” as defined in the math.

To locate an up-arrow-sized mind within an up-arrow-sized civilization would require up-arrow bits—even if the entire civilization unfolded out of a 400-bit machine as compact as our own laws of physics. But which would be more powerful, that one “complex” mind, or the “simple” civilization it was part of?

None of this violates Gödelian limitations. You can transmit to the hypergalactic civilization a Turing machine similar to the one that built it, and ask it how that Turing machine behaves. If you can fold a hypergalactic civilization into a 400-bit Turing machine, then even a hypergalactic civilization can confront questions about the behavior of 400-bit Turing machines that are real stumpers.

And 400 bits is going to be an overestimate. I bet there’s at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that. If that seems unreasonable, you are not acquainted with the Busy Beaver problem.

You can get a hell of a lot of mathematical ability out of small Turing machines that unfold into pangalactic hypercivilizations. But unfortunately, there are other small Turing machines that are hellishly difficult problems—perhaps unfolding into hypercivilizations themselves, or things even less comprehensible. So even the tremendous mathematical minds that can unfold out of small Turing machines won’t be able to solve all Turing machines up to a larger size bound. Hence no Gödelian contradiction.

(I wonder: if humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon? 15? Maybe even as large as 20? Surely not 100: you could encode a civilization of similar origins and equivalent power into a smaller Turing machine than that.)

Olie Lamb said: “I don’t see anything good about complexity. There’s nothing artful about complexity. There’s nothing mystical about complexity. It’s just complex.” This is true even when you’re talking about wheels and gears, never mind Kolmogorov complexity. It’s simplicity, not complexity, that is the admirable virtue.

The real force behind this whole debate is that the word “complexity” sounds impressive and can act as an explanation for anything you don’t understand. Then the word gets captured by a mathematical property that’s spelled using the same letters, which happens to be provably constant for closed systems.

That, I think, is where the argument really comes from, as it rationalizes the even more primitive intuition of some blind helpless thing in a box.

This argument is promulgated even by some people who can write proofs about complexity—but frankly, I do not think they have picked up the habit of visualizing what their math symbols mean in real life. That thing Richard Feynman did, where he visualized two balls turning colors or growing hairs? I don’t think they do that. On the last step of interpretation, they just make a quick appeal to the sound of the words.

But I will agree that, if the laws we know are true, then a self-improving mind which lacks sensory inputs shouldn’t be able to improve its mathematical abilities beyond the level of an up-arrow-sized civilization—for example, it shouldn’t be able to solve Busy-Beaver(100).

It might perhaps be more limited than this in mere practice, if it’s just running on a laptop computer or something. But if theoretical mathematical arguments about closed systems show anything, that is what they show.

• *I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states.*

There exists a constant C such that if you want to solve the halting problem for all Turing machines up to N states, you can use a particular Turing machine with N+C states. Moreover, this constant C is small (definitely less than 100). In other words, there exists a 500-state Turing machine that solves the halting problem for all the Turing machines up to 400 states.

Here’s the algorithm: given as input a Turing machine M with less than 400 states,

1. Compute a number k greater than BB(400).

2. Simulate M for k steps.

3. If it halts by then, then output “M halts”. If it doesn’t, then output “M doesn’t halt”.

Correctness proof: If it halts in less than k steps, then obviously it halts. If it doesn’t, then by the definition of BB(400), it never will.

Analysis of number of states: This is possible with 400+C states because you use about 400 states for step 1, and a constant number of overhead states for simulating Turing machines. Because there exist universal Turing machines with less than 25 states, you can arrange for C to be less than 100.

There’s a small amount of subtlety in actually doing step 1. Let me know if you want more detail. However, I believe the overall result and method should be clear; moreover, I hope the result is unsurprising once you see the method. As such, please don’t drop dead of shock.
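The shape of the algorithm above can be sketched in code. Everything here is a toy stand-in: a true step bound k > BB(N) is what makes the method work, and obtaining such a k is exactly the subtle part of step 1; the sketch below just fixes an arbitrary bound to show the structure, with Python generators playing the role of Turing machines.

```python
def decides_halting(run_machine, step_bound):
    # Steps 2-3 of the algorithm above: simulate for step_bound steps;
    # if step_bound really exceeded BB(N), "still running at the bound"
    # would prove "never halts".
    def halts(machine):
        steps = run_machine(machine)
        for _ in range(step_bound):
            if next(steps, None) is None:   # simulation ended: M halted
                return True
        return False                        # still running past the bound
    return halts

# Toy stand-in machines: an integer n halts after n steps; None runs forever.
def run(machine):
    if machine is None:
        while True:
            yield 1
    else:
        for _ in range(machine):
            yield 1

halts = decides_halting(run, step_bound=1000)  # 1000 stands in for k > BB(N)
print(halts(5))     # True
print(halts(None))  # False
```

The whole difficulty is hidden in the bound: for real Turing machines, any k > BB(400) works, but no closed computable system can certify such a k for arbitrary N.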

Oh… well, at least there’s no reasonable way for a superintelligence that can perform any finite computation to obtain that particular 500-bit Turing machine, through any amount of internal thinking or induction from sensory experience, without using a reasoning method guaranteed at some point to make a mistake.

• Oops, I mis-stated the result. If we fix a universal Turing machine (equivalently, a programming language), then there exists a constant C such that deciding the halting problem for programs of length less than N can be done by a program of length less than N+C. Pengvado’s solution works here.

Unfortunately, as Richard observed, you can’t do this with Turing machines, essentially because Turing machines cannot inspect their own finite control, unlike the stored program on a von Neumann machine. In fact, it appears that my claim—in Richard’s notation, that BBB(N) is N+O(1)—is open. It is known that BBB(N) is O(N); moreover, a 2002 paper, “Improved Bounds for Functions Related to Busy Beavers” by Ben-Amram and Petersen, showed that BBB(N) is N+o(N). I haven’t found better results, but on the off-chance that I do, I’ll post another comment here.

Now, because my claim is open, we still have to check whether Eliezer’s original intuition (that 500-state Turing machines can’t solve the halting problem for 50-state ones) holds up. Luckily for me, it doesn’t. It’s known that BBB(N) is less than 3N+6 (that’s the O(N) result above), so at worst, there exists a 500-state Turing machine that solves the halting problem for 100-state ones. This is less impressive than what I originally claimed, because it’s off by a multiplicative constant rather than an additive one, but it still does the trick.

What’s the moral here? I think it’s that you really should define Kolmogorov complexity in terms of the number of bits in the shortest program doing whatever you’re analyzing, as is standard, rather than the number of states in a Turing machine. Indeed, this is how Eliezer defines it in this post; furthermore, Eliezer alludes to it when he responds to my post by mentioning a “500-*bit* Turing machine” [emphasis mine]. Only in the aside beginning “I would drop dead of shock” was the number of states actually mentioned.

• *In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.*

Eliezer, you’re making an important error here. I don’t think it affects the main argument you’re making in this article (that considerations of “complexity” don’t rule out self-improving minds), but this error may have grave consequences elsewhere. The error is that while the environment does have to be magic, you don’t need to have faith in this; you just need not to have faith that it’s impossible.

Suppose you get hold of a black box that seems to act as a halting-problem oracle. You’ve thrown thousands of problems at it, and have never seen an incorrect or inconsistent answer. What are the possibilities here? Either (A) the environment really is magic (i.e., there is uncomputable physics that enables implementation of actual halting-problem oracles), or (B) the box is just giving random answers that happen to be correct by chance, or (C) you’re part of a simulation where the box is giving all possible combinations of answers and you happen to be in the part of the simulation where the box is giving correct answers. As long as your prior probability for (A) is not zero, as you do more and more tests and keep getting correct answers, its posterior probability will eventually dominate (B) and (C).
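The updating argument is simple enough to compute directly. A toy sketch of the comment’s point, comparing only (A) against (B) (the prior of 10^-6 is my arbitrary choice, and (C) behaves like (B) for this purpose):

```python
from fractions import Fraction

# Hypothesis A: the box is a genuine oracle, so every answer is correct.
# Hypothesis B: the box answers at random, correct with probability 1/2.
prior_a = Fraction(1, 10**6)   # tiny but nonzero prior on "magic"
prior_b = 1 - prior_a

def posterior_a(n_correct: int) -> Fraction:
    like_a = Fraction(1)                   # oracle: n correct answers, always
    like_b = Fraction(1, 2) ** n_correct   # chance: each answer a coin flip
    return (prior_a * like_a) / (prior_a * like_a + prior_b * like_b)

print(float(posterior_a(10)))   # ~0.001: chance still dominates
print(float(posterior_a(100)))  # ~1.0: the nonzero prior on (A) has won
```

With a literally zero prior on (A), no amount of evidence moves the posterior, which is the Solomonoff worry raised below.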

Why is this so important? Well, in standard Solomonoff Induction, the prior for (A) is zero, and if we program that into an AI, it won’t do the right thing in this situation. This may have a large effect on expected utility (of us, people living today), because while the likelihood of our living in an uncomputable universe with halting-problem oracles is low, the utility we gain from correctly recognizing and exploiting such a universe could be huge.

• *In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.*

As Wei mentioned, P≠NP is basically the conjecture that this isn’t true: i.e., that you can exponentially increase your ability to solve math problems by your environment being magic and your not being born with faith in that fact. So for example, if your environment immediately inverted any one-way function, that would be evidence (no faith required) that your environment is not merely ‘slightly’ smarter than you are, but astoundingly smarter. In qualitative terms, I think it would be almost as astounding as if the environment solved the halting problem.

• I don’t recall if I’ve mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty, up to a constant. Eliezer, if what you say is true, then it shouldn’t be possible for anyone, using just a Turing machine, to beat Solomonoff Induction at a pure prediction game (by more than a constant), even if the input sequence is uncomputable. But here is a counterexample. Suppose the input sequence consists of the unary encodings of Busy Beaver numbers BB(1), BB(2), BB(3), …, that is, BB(1) 1s followed by a zero, then BB(2) 1s followed by a 0, and so on. Let’s ask the predictor, after seeing n input symbols, what is the probability that it will eventually see a 0 again, and call this p(n). With Solomonoff Induction, p(n) will approach arbitrarily close to 0 as you increase n. A human mathematician, on the other hand, will recognize that the input sequence may not be computable, and won’t let p(n) fall below some non-zero bound.

I don’t quite see this. With Solomonoff induction, as with a computable human mathematician, the probability of the next symbol being 0 will approach 0. I don’t see why a Solomonoff inductor using mixtures (that is, evaluating computable probability distributions rather than computable sequences) will ever assign a probability arbitrarily close to 0 of seeing another 0, ever.

Ask the human mathematician, over and over, what their probability of the next symbol being 0 is. They’re computable, so this distribution is in the mixture. What other distribution is it necessarily dominated by, in the Solomonoff mixture?

Or are we allowing the human mathematician to have an inconsistent probability distribution, where he says “You’ll see another 0 eventually, I’m sure of it, but I’m also pretty sure that no matter how large a number of 1s I pick, it won’t be high enough”? If so, to be fair, we should factor out the symbol for “see another 0 eventually” and just ask the Solomonoff inductor about that separately, via some input encoding, the same way we ask the human about it separately.

• Nick wrote: *Good point, but when the box says “doesn’t halt”, how do I know it’s correct?*

A halting-problem oracle can be used for all kinds of things besides just checking whether an individual Turing machine will halt or not. For example, you can use it to answer various mathematical questions and produce proofs of the answers, and then verify the proofs yourself. You should be able to obtain enough proofs to convince yourself that the black box is not just giving random answers or just being slightly smarter than you are.

If P≠NP, you should be able to convince yourself that the black box has at least exponentially more computational power than you do. So if you are an AI with, say, the computational resources of a solar system, you should be able to verify that the black box either contains exotic physics or has access to more resources than the rest of the universe put together.

Eliezer wrote: *So once again I say: it is really hard to improve your math abilities with eyes open in a way that you couldn’t theoretically do with eyes closed.*

It seems to me that an AI should (and can) never completely rule out the possibility that the universe contains physics that is mathematically more powerful than what it has already incorporated into itself, so it should always keep its eyes open. Even if it has absorbed the entire universe into itself, it might still be living inside a simulation, right?

• Sup­pose, for ex­am­ple, that you had a com­puter pro­gram that defined a syn­thetic biol­ogy and a syn­thetic gene sys­tem. And this com­puter pro­gram ran through an ar­tifi­cial evolu­tion that simu­lated 10^44 or­ganisms (which is one es­ti­mate for the num­ber of crea­tures who have ever lived on Earth), and sub­jected them to some kind of com­pe­ti­tion. And fi­nally, af­ter some pre­defined num­ber of gen­er­a­tions, it se­lected the high­est-scor­ing or­ganism, and printed it out. In an in­tu­itive sense, you would (ex­pect to) say that the best or­ganisms on each round were get­ting more com­pli­cated, their biol­ogy more in­tri­cate, as the ar­tifi­cial biosys­tem evolved. But from the stand­point of Kol­mogorov com­plex­ity, the fi­nal or­ganism, even af­ter a trillion years, is no more “com­plex” than the pro­gram code it takes to spec­ify the tour­na­ment and the crite­rion of com­pe­ti­tion. The or­ganism that wins is im­plicit in the speci­fi­ca­tion of the tour­na­ment, so it can’t be any more “com­plex” than that.

and later: …then this in­trau­ni­ver­sal civ­i­liza­tion—and ev­ery­thing it’s ever learned by the­ory or ex­per­i­ment over the last up-ar­row years—is said to con­tain 400 bits of com­plex­ity, or how­ever long the origi­nal pro­gram was.

“You say it as if it were a good thing.”

But this seems, on the con­trary, to in­di­cate that Kol­mogorov com­plex­ity might not be a good can­di­date for the for­mal­ized no­tion of in­tu­itive com­plex­ity, in­clud­ing the one used by Oc­cam’s Ra­zor. The fact that it of­ten can be anti-in­tu­itive doesn’t nec­es­sar­ily mean that it’s our in­tu­ition of what is sim­ple and what is com­pli­cated that must change; the mere fact that the no­tion of Kol­mogorov com­plex­ity is it­self very sim­ple and easy to play with math­e­mat­i­cally doesn’t by it­self en­ti­tle it to any­thing.

There exists a proof of Fermat’s Last Theorem with very small Kolmogorov complexity, much smaller than, say, the size of this post of yours in characters, or the Kolmogorov complexity of some well-known formal proof of a basic theorem in calculus. Does that mean that this proof of FLT is a much simpler object than these others? It does if you insist on interpreting “simple” to mean “small Kolmogorov complexity”, but why must you so insist? If you show this proof to a mathematician, he or she won’t say “oh, this is very simple, simpler than basic calculus proofs.” If you then tell the mathematician that in fact this very complex proof was obtained by running this very short Turing machine, they’ll just shrug in response—so what? As a piece of mathematics, the proof is still evidently extremely complicated. Maybe your notion of complexity just isn’t so good at capturing that.

• Good ques­tion, Eliezer. If the hu­man math­e­mat­i­cian is com­putable, why isn’t it already in­cor­po­rated into Solomonoff In­duc­tion? It seems to me that the hu­man math­e­mat­i­cian does not be­have like a Bayesian. Let H be the hy­poth­e­sis that the in­put se­quence is the unary en­cod­ings of Busy Beaver num­bers. The math­e­mat­i­cian will try to es­ti­mate, as best as he can, P(next sym­bol is 0|H). But when the next sym­bol turns out to be 1, he doesn’t do a Bayesian up­date and de­crease P(H), but in­stead says “Ok, so I was wrong. The next Busy Beaver num­ber is big­ger than I ex­pected.”

I’m not sure I un­der­stand what you wrote af­ter “to be fair”. If you think a Solomonoff in­duc­tor can du­pli­cate the above be­hav­ior with an al­ter­na­tive setup, can you elab­o­rate how?

• Um, except that we also don’t know whether there are computations that can be checked in N time but only performed in N^googol time. The situation is qualitatively the same as for N versus 2^N.

• Wei, es­sen­tially the black box has to be such that it seems more likely for it to be a halt­ing or­a­cle than some­thing only slightly smarter than you are—since when it says “doesn’t halt”, and you don’t know if it halts or not, it’s not like you can eas­ily check.

Sup­pose, though, that the black box has the ap­par­ent qual­ities of a closed timelike curve in a uni­verse that per­mits CTCs to be con­structed with perfec­tion (even if our own uni­verse per­mit­ted CTCs they would break down well be­fore up-ar­row time). Then it would seem pretty plau­si­ble, af­ter con­struct­ing the CTC and ver­ify­ing its ap­par­ent perfor­mance on the cases you knew how to test, that the CTC wasn’t a trick­ster SI in dis­guise.

But -

• in this case, you can just build a CTC in­side your­self, as a closed sys­tem, and ex­am­ine the re­sults with your eyes shut. Your “com­plex­ity” won’t change, it’ll just have to be defined out­side the Tur­ing for­mal­ism.

So once again I say: it is re­ally hard to im­prove your math abil­ities with eyes open in a way that you couldn’t the­o­ret­i­cally do with eyes closed.

• After fur­ther thought, I need to re­tract my last com­ment. Con­sider P(next sym­bol is 0|H) again, and sup­pose you’ve seen 100 0′s so far, so es­sen­tially you’re try­ing to pre­dict BB(101). The hu­man math­e­mat­i­cian knows that any non-zero num­ber he writes down for this prob­a­bil­ity would be way too big, un­less he re­sorts to non-con­struc­tive no­ta­tion like 1/​BB(101). If you force him to an­swer “over and over, what their prob­a­bil­ity of the next sym­bol be­ing 0 is” and don’t al­low him to use no­ta­tion like 1/​BB(101) then he’d be forced to write down an in­con­sis­tent prob­a­bil­ity dis­tri­bu­tion. But in fact the dis­tri­bu­tion he has in mind is not com­putable, and that ex­plains how he can beat Solomonoff In­duc­tion.

• Henry: “Both of those con­cepts seem com­pletely apt for de­scribing perfectly de­ter­minis­tic sys­tems. But, in de­scribing the “com­plex­ity” of the uni­verse even in some­thing as sim­ple as the ‘pat­tern of stars that ex­ists’ one would still have to take into ac­count po­ten­tial non-de­ter­minis­tic fac­tors such as hu­man be­hav­ior. [...] [A]re you say­ing that you are a strict de­ter­minist?”

I’ll take this one. Yes, we’re presuming determinism here, although the determinism of the Many Worlds Interpretation is a little different from the single-world determinism we’re used to thinking about. Also, I notice that in an earlier comment, you spoke of free will and determinism as if the concepts were opposed, but depending on exactly what you mean by free will, this is not necessarily the case. For Eliezer’s take, see, e.g., “Timeless Control,” or for the philosophical mainstream, google compatibilism.

• Other­wise, of course a larger en­vi­ron­ment can out­smart you math­e­mat­i­cally.

No, not of course. For ex­am­ple, sup­pose P were equal to PSPACE. Then even though a larger en­vi­ron­ment could fun­da­men­tally out­smart you math­e­mat­i­cally (say by solv­ing the halt­ing prob­lem), it couldn’t prove to you that it was do­ing so. In other words, the situ­a­tion with polyno­mial-time com­pu­ta­tion would be more-or-less the same as it is with un­limited com­pu­ta­tion: su­per­in­tel­li­gent ma­chines could only prove their su­per­in­tel­li­gence to other su­per­in­tel­li­gent ma­chines.

That the situ­a­tion with effi­cient com­pu­ta­tion ap­pears to be differ­ent—i.e., that it ap­pears su­per­in­tel­li­gent ma­chines can in­deed prove their su­per­in­tel­li­gence to fun­da­men­tally dumber ma­chines—is (if true) a profound fact about the world that seems worth call­ing at­ten­tion to. Sure, of course you can nul­lify it by as­sum­ing away all com­plex­ity con­sid­er­a­tions, but why? :-)

• @Toby: Why, yes, I was feel­ing rather grate­ful at that point that I hadn’t quan­tified the prob­a­bil­ity. It’s fair to as­sume that it would have been low enough that I couldn’t plau­si­bly re­cover from the cal­ibra­tion hit, like 1% or some­thing.

@Scott: This en­tire dis­cus­sion as­sumes un­limited finite in­ter­nal com­put­ing power, so P and NP cut no dice here. Other­wise, of course a larger en­vi­ron­ment can out­smart you math­e­mat­i­cally.

@Will: Math­e­mat­i­cal truths are about which ax­ioms im­ply which the­o­rems.

@Wei: A halt­ing or­a­cle is usu­ally said to out­put 1s or 0s, not proofs or halt­ing times, right?

Also @Wei: I don’t re­call if I’ve men­tioned this be­fore, but Solomonoff in­duc­tion in the mix­ture form makes no men­tion of the truth of its mod­els. It just says that any com­putable prob­a­bil­ity dis­tri­bu­tion is in the mix­ture some­where, so you can do as well as any com­putable form of cog­ni­tive un­cer­tainty up to a con­stant.

In other words, if there’s any com­putable re­ac­tion that you have to dis­cov­er­ing what looks like a black box halt­ing solver—any com­putable rea­son­ing that de­cides that “this looks like an un­com­putable halt­ing solver” and pro­duces new dis­tri­bu­tions over com­putably re­lated events as a re­sult—then that’s in the Solomonoff mix­ture.

Solomonoff is not re­ally as bad as it sounds.

But when it comes to mak­ing use of the re­sults to in­cor­po­rate the halt­ing or­a­cle via self-mod­ifi­ca­tion—then Solomonoff blows a fuse, of course, be­cause it was never de­signed for self-mod­ifi­ca­tion in the first place; it’s a Carte­sian for­mal­ism that puts the uni­verse ir­re­vo­ca­bly on the out­side.

• Eliezer, I’m not quite sure why you are obsessed with the ability to find mathematical truths. You get different sets of mathematical truths depending on whether you take the parallel postulate to be true or not. Which ones are applicable in the real world depends on where you are.

Unless you know where you are in the universe, you cannot know what is friendly for you to do (or even useful).

• Boris: There’s a small amount of sub­tlety in ac­tu­ally do­ing step 1.

Dou­glas: Isn’t it sim­ply im­pos­si­ble? That doesn’t in­terfere with your claim that such a Tur­ing ma­chine ex­ists, but step 1 claims that it’s com­putable.

It’s im­pos­si­ble to bound BB(n) com­putably in n, but that leaves open the fol­low­ing ques­tion, on which Boris’s ar­gu­ment de­pends.

Let BBB(n), the Busy Beaver Bound func­tion, be the size of the small­est Tur­ing ma­chine that, start­ing from a blank tape, prints out an up­per bound for BB(n). Boris’s step (1) claims that BBB(n) is bounded by n+c for some con­stant c. It is not ob­vi­ous to me ei­ther that this is true or that this is false.

• Boris: There’s a small amount of sub­tlety in ac­tu­ally do­ing step 1.

Isn’t it sim­ply im­pos­si­ble? That doesn’t in­terfere with your claim that such a Tur­ing ma­chine ex­ists, but step 1 claims that it’s com­putable.

• Will Pear­son: Show us sim­ple laws which pre­dict how many galax­ies worth of atoms there would be in a uni­verse (why not one?), and I will take 500 bits for gen­er­at­ing the uni­verse some­what se­ri­ously...

Been done. It didn’t work out, and Ed­ding­ton’s the­ory is now only his­tory, but the ex­am­ple shows that it is not out of the ques­tion that there be such laws.

• Tim: The ini­tial wave­func­tion may be com­pletely de­ter­mined by the laws, and un­der MWI that’s all you need.

Roko: I ac­tu­ally don’t no­tice much du­pli­ca­tion be­tween OB and SL4. It’s definitely worth read­ing.

Will: Why don’t you take it se­ri­ously? You know the Man­delbrot set is gen­er­ated from very few bits, and I don’t imag­ine you ex­pect there to be a sim­ple deriva­tion of its struc­tures. Taken liter­ally, your re­quest is highly un­rea­son­able be­cause it in­volves so many lev­els of poorly un­der­stood pro­cesses—in­fla­tion, baryo­ge­n­e­sis, galaxy for­ma­tion… and the an­swer is prob­a­bly in­finite any­way. Still, I wouldn’t be sur­prised if a bet­ter physi­cist could pro­duce a rea­son­able first-prin­ci­ples es­ti­mate of the num­ber of baryons in the ob­serv­able uni­verse.

Jed: Or many-wor­lds.

• Tim, I already speci­fied I was talk­ing about the whole uni­verse through­out time and space (and in­fla­tion and de­co­her­ence). The laws prob­a­bly don’t have any ini­tial con­di­tions un­der that speci­fi­ca­tion, even if differ­ent things hap­pen in differ­ent places.

Henry, fol­low the link to the Man­delbrot set, then look up how the Man­delbrot set is com­puted (a sin­gle equa­tion).

• You need to state and justify your assumptions, since it is possible to get just about any answer from 0 to infinity to the question “how much information does the universe contain”, depending on choice of theory.

• As­so­ci­at­ing Kol­mogorov com­plex­ity with in­tel­li­gence causes brain dam­age: Ex­am­ple.

• You cite the same rhetoric by Dawk­ins that I do. Dawk­ins is one of those whom I am ar­gu­ing against. The is­sue is sim­ple enough not to re­quire an ap­peal to au­thor­ity to re­solve. Nat­u­ral se­lec­tion does not in­volve pre­dic­tion if it acts on a sys­tem which does not make pre­dic­tions. It does in­volve pre­dic­tion when it acts on a sys­tem which does make pre­dic­tions. If the se­lect­ing agent is in­tel­li­gent, and thinks about the fu­ture, you can­not fully model the re­sult­ing pro­cess with­out mod­el­ling that pre­dic­tive pro­cess. Tree leaves do not make pre­dic­tions—and I never claimed that they did, so that ex­am­ple is ir­rele­vant. It is not plants but or­ganisms with com­plex brains that make pre­dic­tions.

As for my re­jec­tion of Pop­per mean­ing that I sup­port Kuhn: philos­o­phy of sci­ence has moved on a bit since the 1960s.

• A halt­ing or­a­cle is usu­ally said to out­put 1s or 0s, not proofs or halt­ing times, right?

It’s easy to use such an or­a­cle to pro­duce proofs and halt­ing times. The fol­low­ing as­sumes that the or­a­cle out­puts 1 if the in­put TM halts, and 0 oth­er­wise.

For proofs: Write a pro­gram p which on in­puts x and i, enu­mer­ates all proofs. If it finds a proof for x, and the i-th bit of that proof is 1, then it halts, oth­er­wise it loops for­ever. Now query the or­a­cle with (p,x,0), (p,x,1), …, and you get a proof for x if it has a proof.

Halt­ing times: Write a pro­gram p which on in­puts x and i, runs x for i steps. If x halts be­fore i steps, then it halts, oth­er­wise it loops for­ever. Now query the or­a­cle with (p,x,2), (p,x,4), (p,x,8), …, un­til you get an out­put of “1” and then use bi­nary search to get the ex­act halt­ing time.
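The doubling-then-binary-search step can be sketched as ordinary code. A real halting oracle is uncomputable, so the `oracle` argument below is a hypothetical stand-in: a function answering “does x halt within i steps?”, which is what the queries (p, x, i) above amount to.

```python
def find_halting_time(oracle):
    """Find the exact halting time of a program, given oracle(i) that
    answers whether the program halts within i steps (a stand-in for
    the oracle queries (p, x, i) described above)."""
    # Doubling phase: find some bound i within which the program halts.
    i = 1
    while not oracle(i):
        i *= 2
    # Binary search for the least t in (i // 2, i] where oracle(t) is true.
    lo, hi = i // 2, i
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return hi

# A toy program that (unknown to the searcher) halts at step 37:
assert find_halting_time(lambda i: i >= 37) == 37
```

The searcher makes only O(log t) oracle queries to pin down a halting time of t, which is the efficiency the comment is pointing at.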

• (the su­per­scripts didn’t show up: that was N^googol and 2^N)

• Scott, while I don’t mean to deflate in any sense the shat­ter­ing im­port of your wis­dom...

...there’s a pretty large seg­ment of the plane­tary pop­u­la­tion upon whom the dis­tinc­tion be­tween “polyno­mi­ally more com­put­ing power” and “ex­po­nen­tially more com­put­ing power” would be lost.

Th­ese un­en­light­ened fools, if they had com­put­ing power N, would be just as im­pressed by a su­per­in­tel­li­gence that had N^googol com­put­ing power as a su­per­in­tel­li­gence that had 2^N com­put­ing power. They’d just lump both of those to­gether as “the en­vi­ron­ment is smarter than me, and can prove things I can’t and then ex­hibit the proof”. Poor bas­tards.

• I would drop dead of shock

Eliezer, just as it was interesting to ask what probability estimate ‘Nuts!’ amounted to, I think it would be very useful for the forum of Overcoming Bias to ask what your implicit probability estimate was for a 500-state TM being able to solve the halting problem for all TMs of up to 50 states.

I imag­ine that ‘I would drop dead of shock’ was in­tended to con­vey a prob­a­bil­ity of less than 1 in 10,000, or maybe 1 in 1,000,000?

• Richard: Boris’s step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false. c is the size of a universal Turing machine which, in addition to simulating its argument, keeps track of how many steps it simulates. BB(n) is the maximum number of steps that any halting TM of size n performs, therefore there is some particular TM of size n that performs BB(n) steps. Give that TM as input to the aforementioned UTM, and it will compute BB(n). The uncomputability of BB prevents you from finding the right TM, and from proving it correct if it’s handed to you by a magic light in the sky. But the TM still exists even if you’ll never know which it is.

• If hu­man­ity un­folded into a fu­ture civ­i­liza­tion of in­finite space and in­finite time, cre­at­ing de­scen­dants and hy­per­de­scen­dants of un­limit­edly grow­ing size, what would be the largest Busy Beaver num­ber ever agreed upon?

Sup­pose they run a BB eval­u­a­tor for all of time. They would, in­deed, have no way at any point of be­ing cer­tain that the cur­rent cham­pion 100-bit pro­gram is the ac­tual cham­pion that pro­duces BB(100). How­ever, if they de­cide to an­throp­i­cally rea­son that “for any time t, I am prob­a­bly al­ive af­ter time t, even though I have no di­rect ev­i­dence one way or the other once t be­comes too large”, then they will be­lieve (with ar­bi­trar­ily high prob­a­bil­ity) that the cur­rent cham­pion pro­gram is the ac­tual cham­pion pro­gram, and an ar­bi­trar­ily high per­centage of them will be cor­rect in their be­lief.
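The “current champion” idea can be sketched with a toy model. Here each candidate program is represented simply by its halting time (or None for a non-halter), an assumption standing in for actually dovetailing real Turing machines; the point is that at any finite time t, a not-yet-halted true champion is indistinguishable from a non-halter.

```python
def champion_at(candidates, t):
    """Current busy beaver champion after running every candidate for t
    steps: the longest halting time observed so far.  `candidates` maps
    names to halting times, or None for programs that never halt (a toy
    stand-in for real dovetailed simulation)."""
    halted = [steps for steps in candidates.values()
              if steps is not None and steps <= t]
    return max(halted, default=0)

programs = {"a": 3, "b": 100, "loop": None}
assert champion_at(programs, 10) == 3      # "b" looks just like "loop" here
assert champion_at(programs, 1000) == 100  # the true champion, eventually
```

No finite t certifies that the champion is final, yet for any fixed set of candidates, observers at large enough t hold the correct belief, which is what the anthropic argument exploits.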

• The “500 bits” only works if you take a hid­den vari­able or Bohmian po­si­tion on quan­tum me­chan­ics. If (as the cur­rent con­sen­sus would say) non-lin­ear dy­nam­ics can am­plify quan­tum noise then enor­mous amounts of new in­for­ma­tion are be­ing “pro­duced” lo­cally ev­ery­where all the time. The cur­rent state of the uni­verse in­cor­po­rates much or all of that in­for­ma­tion. (Some­one who un­der­stands the de­bates about black holes and the holo­graphic prin­ci­ple should chime in with more pre­cise anal­y­sis.)

I couldn’t fol­low the whole ar­gu­ment so I’m not sure how this af­fects it, but given that Eliezer keeps refer­ring to this claim I guess it is im­por­tant.

• ″...the Kol­mogorov com­plex­ity of a pat­tern is the size of the pro­gram code of the short­est Tur­ing ma­chine that pro­duces the pat­tern as an out­put, given un­limited tape as work­ing mem­ory.”

I am con­fused. If each pat­tern gets its own short­est pro­gram to de­scribe it, why can’t com­plex­ity num­bers be com­puted (wikipe­dia says they can’t be)?

I had thought that the rea­son they could not be was that each string has its own best pro­gram as a pre­fix that min­i­mizes the num­ber of bits needed to ex­press the pre­fix plus the string, and there was no neu­tral way to com­pare be­tween similarly com­plex strings since each might be shorter un­der its per­sonal pro­gram than the other un­der that same pro­gram.

Thanks for the site!

• why can’t com­plex­ity num­bers be computed

If you could com­pute them, you could use that com­pu­ta­tion (in a com­pli­cated way) to pro­duce strings that could only be pro­duced by much longer pro­grams. This is a con­tra­dic­tion, so it’s not pos­si­ble to com­pute com­plex­ity.
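The contradiction can be made concrete. If a computable `K` existed, the short program below would output a string of complexity greater than n while itself being describable in about log(n) bits plus a constant, which is smaller than n for large n. The demonstration at the bottom uses a toy stand-in for K (string length), since the real K is exactly what cannot be computed.

```python
from itertools import product

def first_complex_string(K, n):
    """Enumerate binary strings in length-then-lexicographic order and
    return the first with K(s) > n.  If K were genuine Kolmogorov
    complexity, this short program (plus n) would itself be a short
    description of that string -- the contradiction described above."""
    length = 1
    while True:
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if K(s) > n:
                return s
        length += 1

# Toy stand-in: with K = len, the first string of "complexity" > 3 is
# simply the first string of length 4.
assert first_complex_string(len, 3) == "0000"
```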

• And I guess you can’t just enu­mer­ate all pro­grams and take the length of the short­est (=first) one that pro­duces the string be­cause of the halt­ing prob­lem?

I won­der if I could turn that around and spin a com­pu­ta­tion of com­plex­ity into a halt­ing or­a­cle...

• So, any string can be made by some pro­grams, each pro­gram has a math­e­mat­i­cal com­plex­ity of K, we say each string has a K of the pro­gram with the small­est K that makes that string. If know­ing a string gives us a pre­cise value for K, know­ing a pre­cise value for K lets us gen­er­ate some strings that are more com­plex than K but no more com­plex than any other strings higher than K. Some strings are ran­dom, and have Ks that are strings more com­plex than the strings they de­scribe. For some string the short­est pro­gram that makes it will be “The short­est string with K greater than K1”, and “The short­est string with K greater than K1″ is a pro­gram with a low K. Yet that string was cho­sen by the pro­gram be­cause it had a higher K than “the short­est string with com­plex­ity greater than K” does, so the string thus delineated “re­ally” has a very small K, not a high one, so it must be un­s­e­lected, and thus un­s­e­lected it “re­ally” has a high K again, falsify­ing that pro­gram’s se­lec­tion of any­thing else.

The prob­lem is that pro­grams can be made out of self refer­enc­ing words that se­lect their com­po­nents and tar­gets based on what other pro­grams are do­ing.

Can’t we still calcu­late K in a math­e­mat­i­cal uni­verse for­bid­ding self refer­enc­ing sen­tences? In our world we are com­fortable say­ing K1 is greater than K2 if and only if we know it is or­ders of mag­ni­tude greater. What would be wrong with say­ing K1 is greater than K2 be­cause K1=100 and K2=10, where K is com­plex­ity in the math­e­mat­i­cal world with the ex­tra ax­iom?

• Can’t we still calcu­late K* in a math­e­mat­i­cal uni­verse for­bid­ding self refer­enc­ing sen­tences?

I’m not math­e­mat­i­cian enough to say yes, but I think so. I don’t know how well/​sim­ple a no-self-refer­ence ver­sion of some the­o­ries stacks up against the self-refer­enc­ing ver­sions (I am in the mind of lisp’s re­cur­sion fea­tures; with­out that abil­ity lisp pro­grams would be … longer and more com­pli­cated).

• Is this right:

For ev­ery string there is a pro­gram that gives it its small­est k, s+p=K. For ev­ery s there is a k big­ger than s but no big­ger than any other k, yet “For some sn there is a Kn big­ger than sn but no big­ger than any other K” < pn?

So com­put­ing K for ev­ery s means there is a para­dox for some s.

Can we make our lives easy by pick­ing a s, ac­cept­ing it is a para­dox, and mov­ing on to com­pute all other Ks? Or more in­tensely, by ran­domly pick­ing a rep­re­sen­ta­tive sam­ple of strings to be para­doxes, calcu­late each (non-para­dox) string’s K in each para­dox world, and then av­er­age out the Ks?

It seems like Eliezer only af­fir­ma­tively says one K is big­ger than an­other when they are of vastly differ­ent or­ders of mag­ni­tude.

If this is ob­vi­ously wrong, sorry but I have no math back­ground!

• I hardly know where to be­gin—and I’m not sure I should bother, since we are not re­ally very on topic, and this has gone on long enough already. Very briefly:

1. No­body ever claimed that sex­ual se­lec­tion de­ter­mines which mu­ta­tions oc­cur. Sex­ual se­lec­tion pro­vides ex­am­ples of se­lec­tion by in­tel­li­gent agents af­fect­ing the course of evolu­tion;

2. Sex­ual se­lec­tion is not “hap­pen­stance”. Fe­males tend to pick traits that—among other things—re­flect male health, fit­ness, and re­sis­tance to par­a­site in­fes­ta­tions;

3. The cases where sex­ual se­lec­tion drives species ex­tinct does not prove that no pre­dic­tion is oc­cur­ring. Pre­dic­tions can fail. That doesn’t mean that they were not pre­dic­tions in the first place;

4. The Bald­win effect al­lows the fix­a­tion of char­ac­ter­is­tics which are re­peat­ably ac­quired in the genome. Traits may be re­peat­ably ac­quired as a re­sult of pre­dic­tion. For ex­am­ple an or­ganism may be­come able to pre­dict the on­set of win­ter—and so learn that bury­ing nuts is a good idea—and that could go on to in­fluence nu­clear evolu­tion;

5. Re­gard­ing progress—see my es­says Life’s Direc­tion and God’s Utility Func­tion.

• To me, this is all as silly as the Chi­nese Room. Naive in­tu­ition says that the Chi­nese Room doesn’t un­der­stand chi­nese in the way we do… be­cause it doesn’t, run­ning as it does on ex­po­nen­tially slower timescales. Similarly, I can close the box on my tur­ing ma­chine and it can, over mega-ex­po­nen­tial timescales and us­ing mega-ex­po­nen­tial amounts of tape, get just as much smarter as it could by look­ing around and talk­ing to peo­ple, but there are still sev­eral rea­sons why the lat­ter strat­egy is su­pe­rior.

Of course, I may just not un­der­stand what is sup­posed to be be­ing demon­strated (or re­futed) here. But if you’re ar­gu­ing that Godel doesn’t out­law sin­gu­lar­i­ties, then this is a mas­sive non-se­quitur. What you’re say­ing is that Godel doesn’t out­law sin­gu­lar­i­ties which are r-e-a-l-l-y s-l-o-w, which seems to me to vi­o­late any use­ful defi­ni­tion of sin­gu­lar­i­ties. (OK, sure, it shows the wall is breached, but don’t ex­pect it to con­vince any skep­tics.)

• The problem is that there is no mechanism in the process of natural selection for stuffing that foresight generated by brains back into the genome. Learn all you want but it isn’t passed on to your kids via your genes. That’s the rub.

There most cer­tainly are such mechanisms. Most ob­vi­ously sex­ual se­lec­tion and the Bald­win effect—but also via vanilla nat­u­ral se­lec­tion. I ex­plain this is­sue clearly enough in my es­say.

• Re: the philos­o­phy of sci­ence—you don’t seem to get it. Oh well, I tried.

Re: evolu­tion and nat­u­ral se­lec­tion—yes there is “the class of all things which in­volve in­her­i­tance, vari­a­tion and se­lec­tion”. How­ever, there is also the in­stance of evolu­tion via nat­u­ral se­lec­tion run­ning on our planet. You have con­sis­tently talked about the former. I have con­sis­tently talked about the lat­ter. I figure that once you stop at­tempt­ing to ap­ply my com­ments to the former no­tion—which they sim­ply do not ap­ply to—our dis­agree­ment here will evap­o­rate.

The point of emphasizing that the process of evolution that we witness involves foresight is that there is a widespread misunderstanding of evolution which holds that the process that produced human beings is “blind to the future”—and cannot make predictions. That idea is inaccurate—and that fact often needs pointing out. Intelligence and foresight are part of the evolutionary process on this planet—and they have been for many millions of years.

• Since the dolphin has evolved flippers would you therefore say that “natural selection operates on flippers?”

Yes—pro­vided we were talk­ing about a pro­cess that in­cluded dolphin evolu­tion. The pro­cess of nat­u­ral se­lec­tion in gen­eral does not nec­es­sar­ily in­volve pre­dic­tion—but from the be­gin­ning I have been dis­cussing nat­u­ral se­lec­tion in evolu­tion—which does in­volve pre­dic­tion.

Re­gard­ing Pop­per, I won’t say much—since I would be preach­ing to the choir on this blog. I recom­mend you start by go­ing and read­ing the bit be­gin­ning: “Pre­vi­ously, the most pop­u­lar philos­o­phy of sci­ence was prob­a­bly Karl Pop­per’s falsifi­ca­tion­ism—this is the old philos­o­phy that the Bayesian rev­olu­tion is cur­rently de­thron­ing.” on this page.

• Tim,

I think you are confusing an emergent property created by a system with how the system operates on its own. That architects draft blueprints that result in public housing projects that lead to forced living relationships that make it hard to evict drug dealers doesn’t mean that the discipline of architecture runs on drug dealing. Even if that drug dealing impacts the architects.

You seem to have writ­ten some simu­la­tions of nat­u­ral se­lec­tion. In writ­ing those al­gorithms did you have to code in the abil­ity to pre­dict? Of course not. Can nat­u­ral se­lec­tion op­er­ate with­out pre­dic­tion? Yes! Can nat­u­ral se­lec­tion gen­er­ate or­ganisms that have the abil­ity to pre­dict with­out pre­dic­tion be­ing built into nat­u­ral se­lec­tion? Of course!

“Natural selection does not involve prediction if it acts on a system which does not make predictions. It does involve prediction when it acts on a system which does make predictions.”

Perhaps if I use a technique of substitution for this you will quickly grasp the confusion here between the process of natural evolution and the emergent property of “the ability to make predictions”. You are making a category error.

“Natural selection does not involve flippers if it acts on a system which does not make flippers. It does involve flippers when it acts on a system which does make flippers.”

Since the dolphin has evolved flippers would you therefore say that “natural selection operates on flippers?” That would be very misleading. One could also go to your web page and substitute pictures of flippers wherever you’ve drawn in little brains and it would still make as much sense to those who understand evolution.

Of course, the existence of brains and flippers influences the direction evolution takes, and so do asteroid collisions. That doesn’t mean that natural selection operates on the basis of asteroid hits. Asteroids aren’t the primary cause of the emergent order of life. The primary cause is also not predictive.

If it were predictable from political events that nuclear war was inevitable and imminent, would you then expect humans to experience a sudden increase in mutations of alleles increasing fitness towards nuclear survivability? Would you expect those alleles to also become more frequent in the population? All this merely because a brain somewhere predicted correctly that a nuclear event was about to happen?

Think what you are saying here. How natural selection works is well known. It works on differential survival of random mutations, not on prediction and proactive behavior. Natural selection itself has no mechanism for making predictions or acting proactively.

Dawkins isn’t saying that evolution isn’t powerful. It is extremely powerful. What he’s saying is that that power is not due to a conscious or unconscious ability to predict, or to act on prediction.

As for my re­jec­tion of Pop­per mean­ing that I sup­port Kuhn: philos­o­phy of sci­ence has moved on a bit since the 1960s.

“The most gen­eral prob­lem con­fronting the Bayesian philos­o­phy is that sci­en­tists tend not to use prob­a­bil­ities when eval­u­at­ing their the­o­ries. In­stead, they tend to eval­u­ate them in terms of their em­piri­cal ad­e­quacy and their ex­plana­tory power. The prob­lem is that ex­plana­tory worth is not illu­mi­nated in terms of prob­a­bil­ities, so the Bayesian out­look can­not ex­plain this cen­tral fea­ture of mod­ern sci­ence.”

No kid­ding, and that’s a fatal flaw if you are claiming that sci­en­tists are choos­ing their the­o­ries pri­mar­ily based on prob­a­bil­ity.

“Charles Dar­win pro­duced large vol­umes of in­tel­li­gent and care­ful ob­ser­va­tions of an­i­mal habitat, form and be­hav­ior long be­fore he de­vel­oped his the­ory of species de­vel­op­ment by nat­u­ral se­lec­tion. It was no less sci­ence for that.”

How that refutes Popper he doesn’t say. Popper addressed such issues. Is this guy really so ill-informed as to think that Popper wasn’t aware that scientists collect data? Guess what: the religious do also, for example, lists of miracles.

“At best Pop­pe­rian ideas muddy the wa­ters and at worst they cor­rupt progress.”

I say: how so, and what a load of baloney. Popper clarified a very important issue in the demarcation of science from non-science. Collecting data isn’t one of the things that demarcates science from non-science. The only reason the writer of this article would bring up data collection is if he were totally ignorant of Popper’s theories.

One important lesson that should be learned to “overcome bias” is to understand the theory you are criticizing before you open your mouth.

“I have no­ticed that re­search coun­cils in­creas­ingly re­quire that re­search they sup­port be ‘hy­poth­e­sis driven’ ”

Oh, so it’s not so “1960s” is it?

Not sure how this is any kind of obstacle that would “muddy the waters” or “corrupt progress”. Is that what it’s supposed to be an example of? Popper says that the fundamental process is a series of guesses and refutations of those guesses. Your hypothesis can be anything, including “I think that X is not random but caused by something else”. In which case you are free to go out and research anything you like.

This sounds more like a complaint that they can no longer take other people’s money via taxes to pursue whatever they feel like, without some sort of justification. Boo hoo.

“This is like commissioning a piece of fine furniture on the basis that it should be ‘chisel driven’.”

No, it’s like expecting to get science, not art, when you are paying for science, not art. If the author doesn’t want oversight then perhaps he should raise his own funds privately, or use his own money, instead of trying to divert tax money into his pet project.

I can see why some “scientists” are objecting to Popper: it’s cutting into their ability to pursue non-science on the public dole. Much of the anti-Popper backlash has been in the area of the social sciences, where they’d like to pursue things in a more post-modernist way.

Not sure how the author, using his standards, would expect to reject a request by the Catholic Church for scientific research funds from the government in order to maintain lists of saints and miracles. That is, if he rejects falsification as an important breakthrough in demarcating science from non-science.

Oh, and I just now no­ticed this guy is from the “Depart­ment of Psy­chol­ogy”. How’s that for a pre­dic­tion.

• That does NOT mean that nat­u­ral se­lec­tion op­er­ates via a pre­dic­tive pro­cess. [...]

It most cer­tainly does—if we are talk­ing about nat­u­ral se­lec­tion in evolu­tion and na­ture.

Ex­am­ples of failure to pre­dict do not prove evolu­tion does not pre­dict—just that it does not pre­dict perfectly.

The process of natural selection is “trial and error”, which is quite literally analogous to Popper’s theory of falsification.

Heh. Not even sci­ence works via Pop­per’s the­ory of falsifi­ca­tion.

it is not at all clear the in­tel­li­gent agent op­er­ates pri­mar­ily us­ing “in­duc­tion” either

A straw man AFAICT—no­body said “pri­mar­ily”—and yes, or­ganisms’ be­lief that the sun will come up to­mor­row shows that they are perform­ing in­duc­tion.

• Nice try Tyler. What in­di­vi­d­u­als “do” does not define what nat­u­ral se­lec­tion does.

One could also say: “In prac­tice, nat­u­ral se­lec­tion pro­duces in­tel­li­gent agents, who can pre­dict, and then they make se­lec­tive choices that af­fect who lives, who dies and who re­pro­duces.”

That does NOT mean that natural selection operates via a predictive process. Ask any good biologist and they will tell you that natural selection is not predictive. That’s why species go extinct all the time.

Natural selection is a non-inductive process that can produce individual organisms that can “do induction”. The process of natural selection is “trial and error”, which is quite literally analogous to Popper’s theory of falsification.

BTW, it is not at all clear the in­tel­li­gent agent op­er­ates pri­mar­ily us­ing “in­duc­tion” ei­ther. In­duc­tion is pri­mar­ily use­ful in learn­ing but that’s only part of in­tel­li­gence. Fur­ther­more, even in the sub-func­tion of learn­ing it is not at all clear that in­duc­tion is a pri­mary al­gorithm. Clearly your brain needs to first model the world in or­der to clas­sify ob­ser­va­tions in or­der to at­tribute them to any­thing. The in­duc­tion it­self is not pri­mary. The model build­ing ca­pa­bil­ities them­selves are the product of a non-in­duc­tive pro­cess.

Actually, seeing as how humans are so incredibly bad at Bayesian induction, it’s a wonder anyone believes that we use induction at all. One would think that if our primary systems worked on Bayesian induction, we’d be able to somehow tap into that.

Try explaining to yourself how you do induction and you will see that even you don’t do it in areas where you think you do. Do you really believe the sun comes up tomorrow because of induction? … or do you have a mental model of how the sun operates that you contingently believe in? When you learn some new aspect about the sun, do you try to devise a new mental model, or do you just change the odds that it operates one way or another? My brain certainly doesn’t operate on odds.

• In prac­tice, nat­u­ral se­lec­tion pro­duces in­tel­li­gent agents, who can do in­duc­tion, and then they make se­lec­tive choices that af­fect who lives, who dies and who re­pro­duces. What that means is that any idea that evolu­tion (or nat­u­ral se­lec­tion) in the real world does not in­volve in­duc­tion is not cor­rect.

• “pro­duced by in­duc­tive nat­u­ral se­lec­tion;”

Nat­u­ral se­lec­tion is not an in­duc­tive pro­cess.

• Since our whole universe may be defined by something as frugal as hundreds of bits of complexity, the fact that there’s an upper bound on speed is interesting: it makes locality a very tangible thing, and not just a question of adding some bits of information to define where in the universe that “locality” is.

“If you want to print out just Earth by it­self, then it’s not enough to spec­ify the laws of physics. You also have to point to just Earth within the uni­verse. You have to nar­row it down some­how. And, in a big uni­verse, that’s go­ing to take a lot of in­for­ma­tion.”

• Yes. To put it another way: it takes much more than 500 bits of information to predict a lifetime’s experiences. You have to know where you are in the universe/​multiverse. The counterintuitive fact is that a set can contain much less information (down to 0 bits) [*] than one of its subsets… and what we see is always a subset.

[*]”A skep­tic wor­ries about all the in­for­ma­tion nec­es­sary to spec­ify all those un­seen wor­lds. But an en­tire en­sem­ble is of­ten much sim­pler than one of its mem­bers. This prin­ci­ple can be stated more for­mally us­ing the no­tion of al­gorith­mic in­for­ma­tion con­tent. The al­gorith­mic in­for­ma­tion con­tent in a num­ber is, roughly speak­ing, the length of the short­est com­puter pro­gram that will pro­duce that num­ber as out­put. For ex­am­ple, con­sider the set of all in­te­gers. Which is sim­pler, the whole set or just one num­ber? Naively, you might think that a sin­gle num­ber is sim­pler, but the en­tire set can be gen­er­ated by quite a triv­ial com­puter pro­gram, whereas a sin­gle num­ber can be hugely long. There­fore, the whole set is ac­tu­ally sim­pler. Similarly, the set of all solu­tions to Ein­stein’s field equa­tions is sim­pler than a spe­cific solu­tion. The former is de­scribed by a few equa­tions, whereas the lat­ter re­quires the speci­fi­ca­tion of vast amounts of ini­tial data on some hy­per­sur­face. The les­son is that com­plex­ity in­creases when we re­strict our at­ten­tion to one par­tic­u­lar el­e­ment in an en­sem­ble, thereby los­ing the sym­me­try and sim­plic­ity that were in­her­ent in the to­tal­ity of all the el­e­ments taken to­gether. In this sense, the higher-level mul­ti­verses are sim­pler. Go­ing from our uni­verse to the Level I mul­ti­verse elimi­nates the need to spec­ify ini­tial con­di­tions, up­grad­ing to Level II elimi­nates the need to spec­ify phys­i­cal con­stants, and the Level IV mul­ti­verse elimi­nates the need to spec­ify any­thing at all.”—Tegmark
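Tegmark’s point about program length can be illustrated with a toy comparison (purely illustrative; the particular number below is arbitrary, and character counts are only a crude proxy for real program length):

```python
# Program A enumerates *every* non-negative integer: finite text, infinite output.
program_a = "n=0\nwhile True:\n    print(n)\n    n+=1\n"

# Program B prints one specific, random-looking number; its digits have to
# appear somewhere inside the program text itself.
digits = "738291047382910473829104738291047"  # an arbitrary 33-digit string
program_b = "print(" + digits + ")\n"

# The description of the whole set is already shorter than the description
# of this single member, and the gap grows with the member's length.
print(len(program_a), len(program_b))
```

The same asymmetry is what the quote means by the ensemble being simpler than one of its elements.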

• Rolf, I was implicitly assuming that even knowing BB(k), it still takes O(k) bits to learn BB(k+1). But if this assumption is incorrect, then I need to change the setup of my prediction game so that the input sequence consists of the unary encodings of BB(1), BB(2), BB(4), BB(8), …, instead. This shouldn’t affect my overall point, I think.

• @Wei: p(n) will ap­proach ar­bi­trar­ily close to 0 as you in­crease n.

This doesn’t seem right. A se­quence that re­quires knowl­edge of BB(k), has O(2^-k) prob­a­bil­ity ac­cord­ing to our Solomonoff In­duc­tor. If the in­duc­tor com­pares a BB(k)-based model with a BB(k+1)-based model, then BB(k+1) will on av­er­age be about half as prob­a­ble as BB(k).

In other words, P(a par­tic­u­lar model of K-com­plex­ity k is cor­rect) goes to 0 as k goes to in­finity, but the con­di­tional prob­a­bil­ity, P(a par­tic­u­lar model of K-com­plex­ity k is cor­rect | a sub-model of that par­tic­u­lar model with K-com­plex­ity k-1 is cor­rect), does not go to 0 as k goes to in­finity.
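As a toy numeric check of that claim (illustrative only; a real Solomonoff prior sums over all programs, not just these models): give the model that needs BB(k) a prior weight proportional to 2^-k and look at both the absolute probabilities and the successive ratios.

```python
# Prior weight 2^-k for the model that needs BB(k), normalized over k = 1..N.
N = 50
weights = [2.0 ** -k for k in range(1, N + 1)]
total = sum(weights)
p = [w / total for w in weights]

# The absolute probability of the k-th model vanishes as k grows...
print(p[0], p[N - 1])

# ...but each model stays half as probable as its predecessor, so the
# *conditional* step from model k to model k+1 never goes to zero.
ratios = [p[k + 1] / p[k] for k in range(N - 1)]
print(min(ratios), max(ratios))
```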

• POSTSCRIPT—Strike that stray does in the penul­ti­mate sen­tence. And re com­pat­i­bil­ism, the short ver­sion is that no, you don’t have free will, but you shouldn’t feel bad about it.

• The ini­tial wave­func­tion may be com­pletely de­ter­mined by the laws, and un­der MWI that’s all you need.

Bundling the ini­tial state into the laws of physics seems like a con­fused way of look­ing at things to me. The state and the laws are differ­ent things—if you mud­dle them to­gether con­cep­tu­ally, your ac­count­ing is likely to get equally mud­dled. Also, most of the ra­tio­nale for the size of the “laws of physics” be­ing small evap­o­rates. The ini­tial state might be enor­mous, for all any­one knows. The laws of physics could be in­finite too—but there the hope that they are small and finite seems slightly more rea­son­able.

• Beau­tiful post, Eli. Se­ri­ously, one of the bet­ter—and more use­ful, at least ped­a­gog­i­cally—ones. Should be re­quired read­ing for, well, just about ev­ery­one.

• Let me expand my point. Assume you run your hyper-civilisation for 1 googol years and then ask it: “Do the angles in a three-straight-sided shape add up to 180 degrees?”

This can be answered yes or no, depending upon what axiom scheme you adopt: Euclidean, hyperbolic, or something more obscure. Having a system that has thought of all mathematical truths doesn’t get you a Powerful system, unless it knows which ones apply to solving the problems you ask.

• Good point, but when the box says “doesn’t halt”, how do I know it’s cor­rect?

• The Kolmogorov complexity of the universe is more than just a description of its physical laws.

First of all, be­sides the phys­i­cal laws, K should in­clude the ini­tial state of the uni­verse. It may be large or even in­finite. And it should be prop­erly ex­pressed in qubits.

Second, you’d need to pick the “present state”, whatever that means, out of all possible future and past states. In a classical universe it may be only 100 bits or so; in a quantum multiverse, it’s at least 1 bit per branch.

Third, it is not at all obvious that the laws of the universe are computable. Maybe it uses an oracle (the multiverse branching process can provide such an oracle), or maybe it uses real values in a non-trivial way. If this is true then we should either say that K is infinite, or use K relative to an oracle.

Fourth, K is defined only up to an additive constant, so it’s not really correct to talk about The Kolmogorov Complexity. You can only talk about Kolmogorov complexity relative to a fixed Turing machine. For different Turing machines the difference between the Ks will be roughly equal to the number of bits in a translator between their machine languages. In the extreme case when the reference Turing machine IS the universe, then K(universe) is just 0.
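The “up to an additive constant” point is the standard invariance theorem; a sketch of the usual statement (the notation is mine, not the commenter’s):

```latex
% Invariance theorem: for any two universal Turing machines U and V
% there is a constant c_{UV}, depending on U and V but not on x, with
\left| K_U(x) - K_V(x) \right| \;\le\; c_{UV} \qquad \text{for all strings } x,
% where c_{UV} is roughly the length of an interpreter for V's
% machine language written for U (and vice versa).
```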

• @pk I don’t un­der­stand. Am I too dumb or is this gib­ber­ish?

It’s not so com­pli­cated; it’s just that we’re so for­mal...

• I thought you had some reason for giving us extra detail, why the number 500. Yeah, it could be as short as 500 bits, but it could equally be very long.

I will prefer a shorter explanation for the universe if I am presented with one, compared to a longer one, but there is no reason to expect one of a certain length. Since we don’t have any that actually manage to do so, why attach any number to it?

• @Nick Tar­leton:

Thanks, Nick. I can see I’m go­ing to have to spend an ex­cit­ing evening or 10 alone with the SL4 archives...

Per­haps one could con­sider Eliezer’s OB posts as an “ed­ited and san­i­tized sum­mary of the morass of SL4”?

• You’re right about what you said about the in­stants. How­ever, we don’t know how big the ini­tial con­di­tions were—and I’m not clear on what it would mean for them not to ex­ist. Maybe you are talk­ing about eter­nal re­turn? Even then there’s a speci­fi­ca­tion of which path you are on.

• Can you give an ex­am­ple of the kind of ar­gu­ment that you are try­ing to re­fute here?

The ar­gu­ment:

“a (real-world) agent such as a hu­man or AI can­not self im­prove be­cause the K-com­plex­ity of a closed sys­tem is con­stant”

seems ob­vi­ously silly. A hu­man or AI is not a closed sys­tem, it re­ceives sen­sory in­puts, and no-one would ar­gue that self-im­prove­ment has to hap­pen with­out any sen­sory in­puts.

Aside: I would cau­tion, echo­ing Eliezer, that the­o­ret­i­cal re­sults about what is com­putable and what isn’t, and about Tur­ing ma­chines with in­finite mem­ory and how many 1′s they can out­put be­fore they halt are hard to ap­ply to the real world with­out mak­ing sub­tle er­rors caus­ing you to falsely pump your in­tu­ition. The real world prob­lems that we face rely upon finite ma­chines op­er­at­ing in finite time.

• No, sex­ual se­lec­tion does not de­ter­mine which mu­ta­tions oc­cur. It’s merely a re­in­forc­ing feed­back loop that is ac­tu­ally con­sid­ered an ex­am­ple of how evolu­tion can run off the rails, so to speak.

Sure, females might by accident happen to pick a feature in males that might prove adaptive. Unfortunately for your argument, that is not based on prediction but on happenstance. Even were the “correct” feature chosen initially, there is then the tendency of sexual selection to over-select for the feature merely because other females find the feature attractive.

So it might be that slightly longer horns are more adaptive in fighting off predators. However, once females start mating with males based on horn length for the sake of horn length, the horns just keep getting longer, to the point of detriment to the survival of the males. This is quite obviously a very bad example of prediction. Again, it’s all in retrospect, and if no mutations happen to occur in the direction of longer horns, then no matter how much females want longer horns it isn’t happening.

Furthermore, sexual selection operates in reverse on the females, and that’s why it also gets out of hand. Mutations that happen to drive female brains to desire longer horns even more will tend to make them more likely to bear sons that are attractive to other females. No prediction here, just a runaway process that ends up being limited by the ability of males to suffer under their oversize horns.

Notice that there is no mechanism here for the female preferences to invent new mutations for either longer horns or increased preference for longer horns. If a female happens to have an extra-heavy fetish for long horns that was environmentally driven, that cannot and will not cause any mutation that she could pass on to her offspring to make them have the same level of passion for long horns.

It’s the genes that build the female brain to prefer long horns in the first place, and not some inductive process in the brain that generated the preference. By definition there must be a preexisting gene, or the trait wouldn’t be heritable and by definition sexual selection could NOT occur.

The Baldwin effect is merely the belief that socially passed behaviors can lead to fixation of genetic traits. Perfectly possible, and again it has nothing to do with prediction. Genetic fixation could only occur given the random creation of the correct mutations, plus a static environment with regard to the trait in question. This really is no different than genetically mediated behavioral changes driving changes in other traits and behavior.

Once plants took to growing on land by minor adaptations to, say, drying, the selection pressures on other traits changed. Traits like tall stems to overshadow competitors on land are selected for. That’s not predictive. The new selective pressures are merely the result of the change in habitat. The adaptation to drying didn’t “predict” that longer stems would be needed, nor did it generate the mutations for longer stems.

Likewise, a behavioral change of, say, a fish hunting on land like the mudskipper will naturally lead to new selective pressures on other traits: the ability to withstand sun, or drying, or whatever. That doesn’t mean that the fish behaviorally deciding to hunt further ashore in any way predicted the need for the other traits, nor does it mean that its brain created new mutations that were stuffed back into the genome. It’s perfectly possible that the random process of mutation just never produces the proper mutations to allow that mudskipper to fully adapt to the land.

The mu­ta­tions are the ran­dom guesses as to what might work and are en­tirely un­in­ten­tional. In fact if you’ve read Dawk­ins there is se­lec­tive pres­sure against mu­ta­tion. Those ran­dom mis­takes how­ever al­low nat­u­ral se­lec­tion to ex­plore hap­haz­ardly through differ­ent body plans and some­times things go in the right di­rec­tion, and some­times not.

Even if females liked males with bigger brains, as evidenced by, say, singing, that doesn’t necessarily mean that males spending lots of time singing, and females listening to that singing, is predictive of anything. Big brains are just one more trait, and it might be that the selective pressures in the environment are actually changing in a way that selects for smaller brains just as sexual selection is operating in the opposite direction. Rich food sources needed to support big brains might be decreasing over time as the habitat becomes more arid. In which case extinction is a likely outcome.

Which is an­other les­son I think you need to learn from biol­o­gists. You seem to be­lieve in some kind of in­her­ent “progress” to all this. That’s not the case. It’s perfectly pos­si­ble for or­ganisms to be sub­jected to se­lec­tive pres­sures that move them to what most peo­ple would see as re­gres­sion. That’s why there are so many “back­wards” par­a­sites from what are more “ad­vanced” lineages. Often in an­i­mals that have brains that pre­dict.

Many a species with a pre­dic­tive brain has walked the path down spe­cial­iza­tion to ex­tinc­tion.

• Tim,

The problem is that there is no mechanism in the process of natural selection for stuffing that foresight generated by brains back into the genome. Learn all you want, but it isn’t passed on to your kids via your genes. That’s the rub. That’s why natural selection is blind to the future. The idea that natural selection is blind is perfectly accurate.

That’s also true for the small minority of organisms on the planet that actually have brains that predict the future. So I am talking about the instance of natural selection operating on this planet. Your idea is not a valid notion even when restricted down to the minority instance of human evolution. I gave you that specific example of humans in the last comment.

We do not pre­dict the fu­ture via nat­u­ral se­lec­tion, and our pre­dic­tions about the fu­ture do not be­come trans­lated into a form which can be trans­mit­ted via the genome for fu­ture se­lec­tion. Nat­u­ral se­lec­tion can only se­lect with hind­sight. It can only se­lect af­ter a pre­dic­tion works and only on the genes that pro­duced the pre­dic­tor and not the pre­dic­tion.

BTW, the leaf fall that trees perform is a kind of prediction. It wasn’t generated by induction, or foresight. It was generated by simple pruning: falsification of bad hypotheses (mutations) on whether and when to drop leaves. Natural selection produced an organism that does prediction without doing any prediction on its own.

• “Since the dolphin has evolved flippers would you therefore say that ‘natural selection operates on flippers’?”

“Yes—pro­vided we were talk­ing about a pro­cess that in­cluded dolphin evolu­tion.”

I’m flabbergasted by this response. There is nothing inherent about natural selection that is going to give you flippers. That’s dependent on the environment, the mutations, and other factors. The process itself has no “flippers”. It’s a process that works fine without flippers, yet you insist that natural selection “operates on flippers” just because of a quite contingent possible outcome of that process.

Meanwhile, even where flippers evolve they can also go extinct. Natural selection continues; it was influenced by flipper existence at one point, but now is no longer influenced by flipper existence because all such animals are extinct. Natural selection was there all along, and seems to operate just fine with or without flippers, and yet you want to say that it “operates on flippers”. Using your rather strange logic, one would have to say that it “doesn’t operate on flippers” when there are none around.

Do you further claim that natural selection “operates on flippers” in the case of, for example, chimps, just because in parallel there are dolphins in the sea? How remote or close do these things have to be? If there is an alien on another planet with wingdings, does that mean that one can say, while standing on Earth, “Natural selection operates on wingdings”?

You see, to me, the term “operates on flippers” means that the flipper is an integral and mandatory requirement for the thing to work. Not something unnecessary and contingent. Otherwise the list is endless and meaningless, and the definition pointless for saying exactly what natural selection is or isn’t.

Boy, I can just imagine trying to teach a class of kids natural selection using your concept of “operates on” as a basis. This is so far out there I’m not even going to assume Yudkowsky meant this in his original quote, “natural selection operates on induction”. I’m sure he’d reject this interpretation also. It would make his statement as profound as saying “natural selection operates on banana peels”, and as silly.

• +1 to Ana­toly Vorobey. Us­ing K-com­plex­ity to cap­ture the hu­man no­tion of com­plex­ity seems to be even worse than us­ing game-the­o­retic ra­tio­nal­ity to cap­ture hu­man ra­tio­nal­ity—some­thing that’s been at­tacked to death already.

• I don’t un­der­stand. Am I too dumb or is this gib­ber­ish?

• It might be worthwhile to note that cogent critiques of the proposition that a machine intelligence might very suddenly “become a singleton Power” do not deny the inefficiencies of the human cognitive architecture, which offer improvement via recursive introspection and recoding, nor do they deny the improvements easily available via substitution and expansion of more capable hardware and I/​O.

They do, however, highlight the distinction between a vastly powerful machine madly exploring vast reaches of a much vaster “up-arrow” space of mathematical complexity, and a machine of the same power bounded in growth of intelligence—by definition necessarily relevant—due to starvation for relevant novelty in its environment of interaction.

If, Feyn­man-like, we imag­ine the pre­sent state of knowl­edge about our world in terms of a dis­tri­bu­tion of ver­ti­cal do­mains, like silos, some broader with rele­vance to many di­verse facets of real-world in­ter­ac­tion, some thin and tow­er­ing into the haze of lead­ing-edge math­e­mat­i­cal re­al­ity, then we can imag­ine the pow­er­ful ma­chine quickly iden­ti­fy­ing and mak­ing a mul­ti­tude of la­tent con­nec­tions and meta-con­nec­tions, filling in the space be­tween the silos and even some­what above—but to what ex­tent, given the in­evitable diminish­ing re­turns among the la­tent, and the re­sult­ing star­va­tion for the novel?

Given such bound­ed­ness, spec­u­la­tion is redi­rected to growth in ecolog­i­cal terms, and the Red Queen’s Race con­tinues ever faster.

• what ex­actly do you mean by the phrase “the ‘com­plex­ity’ of that en­tire uni­verse… would be 500 bits.”

He means 500 bits plus log2 the size of the ini­tial con­di­tions, plus log2 the num­ber of in­stants elapsed since then—which is the num­ber of bits you would need to spec­ify a cur­rent mul­ti­verse state un­der the speci­fied as­sump­tions.
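A back-of-the-envelope version of that accounting (all three input numbers below are hypothetical placeholders of my choosing; nobody knows the real sizes):

```python
import math

# Hypothetical sizes, purely for illustration:
LAW_BITS = 500            # assumed length of the physical laws, per the post
N_INITIAL = 2 ** 100      # assumed count of possible initial conditions
N_INSTANTS = 10 ** 60     # assumed number of instants elapsed since the start

# Bits needed to pin down the current multiverse state under this accounting:
state_bits = LAW_BITS + math.log2(N_INITIAL) + math.log2(N_INSTANTS)
print(round(state_bits))  # about 799 with these made-up inputs
```

The laws stay cheap; it is the pointing at a particular state that dominates the bill as the numbers grow.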

• @ Wei. Thanks for the re­sponse. I will look at the refs but haven’t yet done so. I’m skep­ti­cal about whether they’ll change my mind on the sub­ject, but I’ll take a look.

It seems to me that the be­lief in pure de­ter­minism is ax­io­matic (I’m not even sure what it means to be a Bayesian in a purely de­ter­minis­tic sys­tem!), so most of this dis­cus­sion ap­pears to be pure con­jec­ture. I’m not sure that it has any mean­ingful ap­pli­ca­tion.

• 4 Nov 2008 0:31 UTC

I don’t think there is a The­ory of Every­thing for the uni­verse. Just as that there isn’t one for Math­e­mat­ics.

What if the universe is not continuous, the simulation does lazy evaluation, and the laws of nature can only be described by an infinite number of bits? How would this change your reasoning?

• Show us sim­ple laws which pre­dict how many galax­ies worth of atoms there would be in a uni­verse (why not one?), and I will take 500 bits for gen­er­at­ing the uni­verse some­what se­ri­ously...

• Tim,

Bayes? To para­phrase you, the philos­o­phy of sci­ence has moved on a bit since the 1700s.

I’ve already read Yudkowsky’s article. I’ve been familiar with Bayes since high school, which was thirty years ago. I was one of those kids who found it very easy to grasp. It’s just a simple mathematical fact and certainly nothing to build a philosophy of science around.
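The “simple mathematical fact” in question, as a worked toy example (every number below is made up for illustration):

```python
# Bayes' theorem, P(H|E) = P(E|H) * P(H) / P(E), on a hypothetical screening test.
p_disease = 0.01            # prior: 1% base rate (made up)
p_pos_if_sick = 0.95        # test sensitivity (made up)
p_pos_if_healthy = 0.05     # false-positive rate (made up)

# Total probability of a positive result, over both hypotheses:
p_pos = p_pos_if_sick * p_disease + p_pos_if_healthy * (1 - p_disease)
posterior = p_pos_if_sick * p_disease / p_pos
print(round(posterior, 3))  # about 0.161: a positive test leaves roughly 16% odds
```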

Yud­kowsky writes:

“What is the so-called Bayesian Revolu­tion now sweep­ing through the sci­ences, which claims to sub­sume even the ex­per­i­men­tal method it­self as a spe­cial case?”

Really? So the “experimental method”, by which I assume he means the “scientific method”, boils down to a special case of what, a probability calculation?

Come on, how gullible do you think I am? So under this view scientists have been, all along, calculating probability frequencies based on observations and just picking the theory that had the highest probability. Heck, that sounds easy. Guess I’ll just write a scientist-simulating AI program over the weekend with that. We’ll just replace all the real scientists with Bayesian decision models.

So, according to this view, if I go back and do some historical investigation on the Great Devonian Controversy, I should find that the scientists involved were calculating and comparing notes on the probabilities they were getting on all the Bayesian calculations they were doing. Right?

This, be­sides be­ing lu­dicrous, is just not how peo­ple rea­son, as ad­mit­ted by Yud­kowsky in the very same ar­ti­cle:

“Bayesian rea­son­ing is very coun­ter­in­tu­itive. Peo­ple do not em­ploy Bayesian rea­son­ing in­tu­itively, find it very difficult to learn Bayesian rea­son­ing when tu­tored, and rapidly for­get Bayesian meth­ods once the tu­tor­ing is over. This holds equally true for novice stu­dents and highly trained pro­fes­sion­als in a field. Bayesian rea­son­ing is ap­par­ently one of those things which, like quan­tum me­chan­ics or the Wa­son Selec­tion Test, is in­her­ently difficult for hu­mans to grasp with our built-in men­tal fac­ul­ties.”

As my other comment pointed out in a quote from your own article, it’s pretty clear that scientists do not choose to believe or not to believe based on calculating Bayesian probabilities. They do not use Bayes, for good reasons. Often they don’t have any underlying probabilities and have unknown distributions, but furthermore most problems don’t reduce to a Bayesian probability.

It’s hard to imagine that Darwin did an explicit Bayesian calculation in order to choose his theory over Lamarckism, let alone come up with the theory in the first place. It’s even harder to imagine that he did it implicitly in his mind, when it is quite clear that a) “People do not employ Bayesian reasoning intuitively”, and b) Bayes’ theorem only applies in special cases where you have known probabilities and distributions.

In the Devonian Controversy no probabilities or distributions were known, no probability calculations were done, and finally the winning hypothesis was NOT picked on the basis of greatest probability. There was no greatest probability and there was no winner. All competing hypotheses were rejected.

If intelligence and understanding of the natural world were just a matter of applying Bayesian logic, don’t you think that not just human brains, but brains in general, would likely have been selected for it already? We should be good at it, not bad.

The human brain has evolved lots of subunits that are good at solving lots of different kinds of problems. Heck, even crows seem to be able to count. We seem to be able to classify and model things as consistent or inconsistent, periodic, dangerous, slow, fast, level, slanted, near, far, etc. These are mental models or modules that are probably prefabricated. Yet we humans don’t seem to have any prefabricated Bayesian deduction unit built in (by the way, Bayesian induction is based on deduction, not induction; funny, that). It’s actually the other way round from what he wrote: Bayesian induction is subsumed by a scientific method characterized by Popperian falsification (actually pan-critical rationalism).

Don’t be confused by the name. Popperian falsification is not merely about falsification, any more than the theory of natural selection is merely about “survival of the fittest”. Popperian falsification is about holding beliefs tentatively and testing them. Thus one might hold as a belief that Bayesian induction works, although you will find that this induction is more about deducing from assumed base probabilities, probabilities founded on tentatively believing you did your measurements correctly. So on and so forth.

In his example there is plenty of room for falsification:

“1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?”

The above can be addressed from many Popperian angles. Are our assumptions correct? How did we come up with that 1% figure? It could be falsified by pointing out that our sample was biased. Perhaps the women were lying about their ages. What measures were taken to verify their ages? Where did the 80% claim come from? Is that from a particular demographic that matches the woman we are talking about? So on and so forth.
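For what it’s worth, the arithmetic itself is the easy part. A minimal sketch of the calculation the example asks for (plain Bayes’ theorem, nothing more):

```python
# Bayes' theorem applied to the quoted mammography example:
# P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
p_cancer = 0.01              # stated prior: 1% of this group has breast cancer
p_pos_given_cancer = 0.80    # stated true-positive rate
p_pos_given_healthy = 0.096  # stated false-positive rate

# Total probability of a positive mammography (law of total probability)
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

posterior = p_pos_given_cancer * p_cancer / p_pos
print(f"P(cancer | positive) = {posterior:.3f}")  # 0.078, i.e. about 7.8%
```

Every Popperian question above attacks the inputs to this calculation, not the division at the end.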

It’s not just a matter of plugging numbers into a Bayesian equation and the answer falls out. Yeah, you can use Bayesian logic to create a decision model. That doesn’t mean you should believe or act on the result. It doesn’t mean it’s the primary method being used.

• “It most certainly does—if we are talking about natural selection in evolution and nature.”

Unfortunately, that has things backwards. Dawkins is exactly right when he says:

“Natural selection, the blind, unconscious, automatic process which Darwin discovered, and which we now know is the explanation for the existence and apparently purposeful form of all life, has no purpose in mind. It has no mind and no mind’s eye. It does not plan for the future. It has no vision, no foresight, no sight at all. If it can be said to play the role of watchmaker in nature, it is the blind watchmaker.”

He’s certainly an expert on the subject, as are Gould and the vast majority of evolutionary biologists. In fact, this is the first I’ve heard of this idea, and you are linking to yourself as an authority. I chalk this up to your misunderstandings.

“organisms’ belief that the sun will come up tomorrow”

Do organisms “believe” the sun will come up tomorrow? I’m not even sure I believe it in the inductivist sense. Nor did I even consider it as a belief as a child; it wasn’t even something I considered. I certainly operated as if the sun would come up tomorrow, but that doesn’t mean I arrived at my behavior via inductivist belief.

Let’s consider broadleaf trees dropping their leaves in the fall. Do they “believe” that winter will come? Did they arrive at that belief by “induction”? Not if you understand evolution. There is absolutely no requirement for any kind of induction in order to get an organism, a plant, that drops its leaves at the onset of winter. It’s not a prediction but more analogous to the copying of a successful accident.

Everything has a nature and will tend to behave according to that nature. That doesn’t mean that the behavior being followed is necessarily held on the basis of induction. Every morning when I get out of bed I behave as if the floor will be there and has not magically transformed into a horde of werewolves. That does not mean I decided by induction that the floor wasn’t going to turn into werewolves. In fact, the possibility need not even cross my mind, or any mind.

When trees drop their leaves in autumn they are behaving postdictively, not predictively. That postdictive behavior is not about classical induction, generating universals from observations.

Natural selection operated for a very long time before there were the kinds of brains that make predictions. Natural selection operates just fine without brains, without prediction, and without induction.

Even in the case of artificial selection, humans have in the past been highly constrained by what mutations nature doles out. I’ve bred animals, and you can’t just get to where you want to go. It would be great to breed a guppy that survives outdoors in the winter; my brain predicts that would be a best seller. However, it isn’t happening via standard methods of breeding.

Now perhaps those brains will some day use their predictive capability to create something called genetic engineering, and perhaps that will generate cold-hardy guppies. However, to label that “natural selection” is to misunderstand the definitions involved.

“Not even science works via Popper’s theory of falsification.”

Actually, it does. This is probably another area you don’t fully comprehend. Kuhn was falsified long ago.

• “A straw man AFAICT—nobody said “primarily”—and yes, organisms’ belief that the sun will come up tomorrow shows that they are performing induction.”

When you say that something “operates by induction”, the word “primarily” is implicit. I was only making it explicit, exposing the bias. Don’t we want to overcome that?

Now I must stop commenting lest I transgress the limits. Your confusion requires much more verbiage than this, but such is not allowed here. This policy tends to maintain the bias here in the direction of the blog owners.

• Eliezer and Nick: I don’t think “Kolmogorov complexity” and “Mandelbrot set” actually answer what I was trying to understand.

Both of those concepts seem completely apt for describing perfectly deterministic systems. But in describing the “complexity” of the universe, even in something as simple as the “pattern of stars that exists”, one would still have to take into account potential non-deterministic factors such as human behavior. Unless you are a strict determinist, you have to allow for the potential for a sufficiently advanced human (or non-human) intelligence to eradicate a particular star, or even a galaxy, or to produce an artificial black hole that alters the pattern of stars.

How are these factors accounted for in a “500 bit” code? Or are you saying that you are a strict determinist?

• “Then the “Kolmogorov complexity” of that entire universe—throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...would be 500 bits.”

I must be missing something… what exactly do you mean by the phrase “the ‘complexity’ of that entire universe… would be 500 bits”?

I would think the “complexity of the universe” would include the location of every bit of matter at all times, among other things. So maybe I just don’t understand what you mean by the “complexity of” something.
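For reference, a toy sketch of what the mathematical term measures, assuming the standard Turing-machine definition (this is an illustration, not the article’s own code): the Kolmogorov complexity of an output is bounded by the length of the shortest program that generates it, not by the size or intricacy of the output. A fixed, few-line rule can generate an arbitrarily large, complicated-looking pattern:

```python
def rule110(width=64, steps=64):
    """Run the Rule 110 cellular automaton: a tiny deterministic rule
    famous for producing rich, complicated-looking patterns."""
    row = [0] * width
    row[-1] = 1  # start from a single live cell
    history = [row]
    # Neighborhood triples (left, center, right) that map to a live cell
    alive = {(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 1, 0), (0, 0, 1)}
    for _ in range(steps):
        row = [1 if (row[(i - 1) % width], row[i], row[(i + 1) % width]) in alive
               else 0
               for i in range(width)]
        history.append(row)
    return history

pattern = rule110()
# The output grows without bound as `steps` grows, but the generating
# program (and hence the complexity bound) stays the same size.
print(len(pattern), "rows generated by a few hundred characters of code")
```

This doesn’t address the commenter’s question about non-deterministic factors; it only pins down that the “500 bits” figure refers to the size of the generator, not of what it generates.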

Also, I’m not sure whether you believe that all human behavior is purely deterministic in the most extreme sense. Or do you think people can actually make their own decisions, i.e., “free will”? Obviously, if someone can turn either left or right at any given time, then that decision is still not explained (in any predictive sense) by the laws of physics.

Wouldn’t the 500 bits have to include understanding all psychological/economic behavior?