An Intuitive Explanation of Solomonoff Induction

This is the com­pleted ar­ti­cle that Luke wrote the first half of. My thanks go to the fol­low­ing for read­ing, edit­ing, and com­ment­ing; Luke Muehlhauser, Louie Helm, Ben­jamin Noble, and Fran­celle Wax.

Peo­ple dis­agree about things. Some say that tele­vi­sion makes you dumber; other say it makes you smarter. Some sci­en­tists be­lieve life must ex­ist el­se­where in the uni­verse; oth­ers be­lieve it must not. Some say that com­pli­cated fi­nan­cial deriva­tives are es­sen­tial to a mod­ern com­pet­i­tive econ­omy; oth­ers think a na­tion’s econ­omy will do bet­ter with­out them. It’s hard to know what is true.

And it’s hard to know how to figure out what is true. Some ar­gue that you should as­sume the things you are most cer­tain about and then de­duce all other be­liefs from your origi­nal be­liefs. Others think you should ac­cept at face value the most in­tu­itive ex­pla­na­tions of per­sonal ex­pe­rience. Still oth­ers think you should gen­er­ally agree with the sci­en­tific con­sen­sus un­til it is dis­proved.

Wouldn’t it be nice if de­ter­min­ing what is true was like bak­ing a cake? What if there was a recipe for find­ing out what is true? All you’d have to do is fol­low the writ­ten di­rec­tions ex­actly, and af­ter the last in­struc­tion you’d in­evitably find your­self with some sweet, tasty truth!

In this tu­to­rial, we’ll ex­plain the clos­est thing we’ve found so far to a recipe for find­ing truth: Solomonoff in­duc­tion.

There are some qual­ifi­ca­tions to make. To de­scribe just one: roughly speak­ing, you don’t have time to fol­low the recipe. To find the truth to even a sim­ple ques­tion us­ing this recipe would re­quire you to fol­low one step af­ter an­other un­til long af­ter the heat death of the uni­verse, and you can’t do that.

But we can find short­cuts. Sup­pose you know that the ex­act recipe for bak­ing a cake asks you to count out one molecule of H2O at a time un­til you have ex­actly 0.5 cups of wa­ter. If you did that, you might not finish the cake be­fore the heat death of the uni­verse. But you could ap­prox­i­mate that part of the recipe by mea­sur­ing out some­thing very close to 0.5 cups of wa­ter, and you’d prob­a­bly still end up with a pretty good cake.

Similarly, once we know the ex­act recipe for find­ing truth, we can try to ap­prox­i­mate it in a way that al­lows us to finish all the steps some­time be­fore the sun burns out.

This tu­to­rial ex­plains that best-we’ve-got-so-far recipe for find­ing truth, Solomonoff in­duc­tion. Don’t worry, we won’t be us­ing any equa­tions, just qual­i­ta­tive de­scrip­tions.

Like Eliezer Yud­kowsky’s In­tu­itive Ex­pla­na­tion of Bayes’ The­o­rem and Luke Muehlhauser’s Crash Course in the Neu­ro­science of Hu­man Mo­ti­va­tion, this tu­to­rial is long. You may not have time to read it; that’s fine. But if you do read it, we recom­mend that you read it in sec­tions.

Con­tents:

Background

1. Al­gorithms — We’re look­ing for an al­gorithm to de­ter­mine truth.

2. In­duc­tion — By “de­ter­mine truth”, we mean in­duc­tion.

3. Oc­cam’s Ra­zor — How we judge be­tween many in­duc­tive hy­pothe­ses.

4. Prob­a­bil­ity — Prob­a­bil­ity is what we usu­ally use in in­duc­tion.

5. The Prob­lem of Pri­ors — Prob­a­bil­ities change with ev­i­dence, but where do they start?

The Solution

6. Bi­nary Se­quences — Every­thing can be en­coded as bi­nary.

7. All Al­gorithms — Hy­pothe­ses are al­gorithms. Tur­ing ma­chines de­scribe these.

8. Solomonoff’s Lightsaber — Put­ting it all to­gether.

9. For­mal­ized Science — From in­tu­ition to pre­ci­sion.

10. Ap­prox­i­ma­tions — On­go­ing work to­wards prac­ti­cal­ity.

11. Un­re­solved De­tails — Prob­lems, philo­soph­i­cal and math­e­mat­i­cal.

Algorithms

At an early age you learned a set of pre­cisely-defined steps — a ‘recipe’ or, more for­mally, an al­gorithm — that you could use to find the largest num­ber in a list of num­bers like this:

21, 18, 4, 19, 55, 12, 30

The al­gorithm you learned prob­a­bly looked some­thing like this:

  1. Look at the first item. Note that it is the largest you’ve seen on this list so far. If this is the only item on the list, out­put it as the largest num­ber on the list. Other­wise, pro­ceed to step 2.

  2. Look at the next item. If it is larger than the largest item noted so far, note it as the largest you’ve seen in this list so far. Pro­ceed to step 3.

  3. If you have not reached the end of the list, re­turn to step 2. Other­wise, out­put the last noted item as the largest num­ber in the list.

Other al­gorithms could be used to solve the same prob­lem. For ex­am­ple, you could work your way from right to left in­stead of from left to right. But the point is that if you fol­low this al­gorithm ex­actly, and you have enough time to com­plete the task, you can’t fail to solve the prob­lem. You can’t get con­fused about what one of the steps means or what the next step is. Every in­struc­tion tells you ex­actly what to do next, all the way through to the an­swer.

You prob­a­bly learned other al­gorithms, too, like how to find the great­est com­mon di­vi­sor of any two in­te­gers (see image on right).

But not just any set of in­struc­tions is a pre­cisely-defined al­gorithm. Some­times, in­struc­tions are un­clear or in­com­plete. Con­sider the fol­low­ing in­struc­tions based on an ar­ti­cle about the sci­en­tific method:

  1. Make an ob­ser­va­tion.

  2. Form a hy­poth­e­sis that ex­plains the ob­ser­va­tion.

  3. Con­duct an ex­per­i­ment that will test the hy­poth­e­sis.

  4. If the ex­per­i­men­tal re­sults dis­con­firm the hy­poth­e­sis, re­turn to step #2 and form a hy­poth­e­sis not yet used. If the ex­per­i­men­tal re­sults con­firm the hy­poth­e­sis, pro­vi­sion­ally ac­cept the hy­poth­e­sis.

This is not an al­gorithm.

First, many of the terms are not clearly defined. What counts as an ob­ser­va­tion? What counts as a hy­poth­e­sis? What would a hy­poth­e­sis need to be like in or­der to ‘ex­plain’ the ob­ser­va­tion? What counts as an ex­per­i­ment that will ‘test’ the hy­poth­e­sis? What does it mean for ex­per­i­men­tal re­sults to ‘con­firm’ or ‘dis­con­firm’ a hy­poth­e­sis?

Se­cond, the in­struc­tions may be in­com­plete. What do we do if we reach step 4 and the ex­per­i­men­tal re­sults nei­ther ‘con­firm’ nor ‘dis­con­firm’ the hy­poth­e­sis un­der con­sid­er­a­tion, but in­stead are in some sense ‘neu­tral’ to­ward the hy­poth­e­sis? Th­ese in­struc­tions don’t tell us what to do in that case.

An al­gorithm is a well-defined pro­ce­dure that takes some value or val­ues as in­put and, af­ter a finite se­ries of steps, gen­er­ates some value or val­ues as out­put.

For ex­am­ple, the ‘find the largest num­ber’ al­gorithm above could take the in­put {21, 18, 4, 19, 55, 12, 30} and would, af­ter 13 steps, pro­duce the fol­low­ing out­put: {55}. Or it could take the in­put {34} and, af­ter 1 step, pro­duce the out­put: {34}.

An al­gorithm is so well writ­ten, that we can con­struct ma­chines that fol­low them. To­day, the ma­chines that fol­low al­gorithms are mostly com­put­ers. This is why all com­puter sci­ence stu­dents take a class in al­gorithms. If we con­struct our al­gorithm for truth, then we can make a com­puter pro­gram that finds truth—an Ar­tifi­cial In­tel­li­gence.

Induction

Let’s clar­ify what we mean. In movies, sci­en­tists will re­veal “truth ma­chines”. In­put a state­ment, and the truth ma­chine will tell you whether it is true or false. This is not what Solomonoff in­duc­tion does. In­stead, Solomonoff in­duc­tion is our ul­ti­mate “in­duc­tion ma­chine”.

Whether we are a de­tec­tive try­ing to catch a thief, a sci­en­tist try­ing to dis­cover a new phys­i­cal law, or a busi­ness­man at­tempt­ing to un­der­stand a re­cent change in de­mand, we are all in the pro­cess of col­lect­ing in­for­ma­tion and try­ing to in­fer the un­der­ly­ing causes.

-Shane Legg

The prob­lem of in­duc­tion is this: We have a set of ob­ser­va­tions (or data), and we want to find the un­der­ly­ing causes of those ob­ser­va­tions. That is, we want to find hy­pothe­ses that ex­plain our data. We’d like to know which hy­poth­e­sis is cor­rect, so we can use that knowl­edge to pre­dict fu­ture events. Our al­gorithm for truth will not listen to ques­tions and an­swer yes or no. Our al­gorithm will take in data (ob­ser­va­tions) and out­put the rule by which the data was cre­ated. That is, it will give us the ex­pla­na­tion of the ob­ser­va­tions; the causes.

Sup­pose your data con­cern a large set of stock mar­ket changes and other events in the world. You’d like to know the pro­cesses re­spon­si­ble for the stock mar­ket price changes, be­cause then you can pre­dict what the stock mar­ket will do in the fu­ture, and make some money.

Or, sup­pose you are a par­ent. You come home from work to find a chair propped against the re­friger­a­tor, with the cookie jar atop the fridge a bit emp­tier than be­fore. You like cook­ies, and you don’t want them to dis­ap­pear, so you start think­ing. One hy­poth­e­sis that leaps to mind is that your young daugh­ter used the chair to reach the cook­ies. How­ever, many other hy­pothe­ses ex­plain the data. Per­haps a very short thief broke into your home and stole some cook­ies. Per­haps your daugh­ter put the chair in front of the fridge be­cause the fridge door is bro­ken and no longer stays shut, and you for­got that your friend ate a few cook­ies when he vis­ited last night. Per­haps you moved the chair and ate the cook­ies your­self while sleep­walk­ing the night be­fore.

All these hy­pothe­ses are pos­si­ble, but in­tu­itively it seems like some hy­pothe­ses are more likely than oth­ers. If you’ve seen your daugh­ter ac­cess the cook­ies this way be­fore but have never been bur­gled, then the ‘daugh­ter hy­poth­e­sis’ seems more plau­si­ble. If some ex­pen­sive things from your bed­room and liv­ing room are miss­ing and there is hate­ful graf­fiti on your door at the eye level of a very short per­son, then the ‘short thief’ hy­poth­e­sis be­comes more plau­si­ble than be­fore. If you sud­denly re­mem­ber that your friend ate a few cook­ies and broke the fridge door last night, the ‘bro­ken fridge door’ hy­poth­e­sis gains cred­i­bil­ity. If you’ve never been bur­gled and your daugh­ter is out of town and you have a habit of mov­ing and eat­ing things while sleep­walk­ing, the ‘sleep­walk­ing’ hy­poth­e­sis be­comes less bizarre.

So the weight you give to each hy­poth­e­sis de­pends greatly on your prior knowl­edge. But what if you had just been hit on the head and lost all past mem­o­ries, and for some rea­son the most ur­gent thing you wanted to do was to solve the mys­tery of the chair and cook­ies? Then how would you weigh the like­li­hood of the available hy­pothe­ses?

When you have very lit­tle data but want to com­pare hy­pothe­ses any­way, Oc­cam’s Ra­zor comes to the res­cue.

Oc­cam’s Razor

Con­sider a differ­ent in­duc­tive prob­lem. A com­puter pro­gram out­puts the fol­low­ing se­quence of num­bers:

1, 3, 5, 7

Which num­ber comes next? If you guess cor­rectly, you’ll win $500.

In or­der to pre­dict the next num­ber in the se­quence, you make a hy­poth­e­sis about the pro­cess the com­puter is us­ing to gen­er­ate these num­bers. One ob­vi­ous hy­poth­e­sis is that it is sim­ply list­ing all the odd num­bers in as­cend­ing or­der from 1. If that’s true, you should guess that “9” will be the next num­ber.

But per­haps the com­puter is us­ing a differ­ent al­gorithm to gen­er­ate the num­bers. Sup­pose that n is the step in the se­quence, so that n=1 when it gen­er­ated ‘1’, n=2 when it gen­er­ated ‘3’, and so on. Maybe the com­puter used this equa­tion to calcu­late each num­ber in the se­quence:

2n − 1 + (n − 1)(n − 2)(n − 3)(n − 4)

If so, the next num­ber in the se­quence will be 33. (Go ahead, check the calcu­la­tions.)

But doesn’t the first hy­poth­e­sis seem more likely?

The prin­ci­ple be­hind this in­tu­ition, which goes back to William of Oc­cam, could be stated:

Among all hy­pothe­ses con­sis­tent with the ob­ser­va­tions, the sim­plest is the most likely.

The prin­ci­ple is called Oc­cam’s ra­zor be­cause it ‘shaves away’ un­nec­es­sary as­sump­tions.

For ex­am­ple, think about the case of the miss­ing cook­ies again. In most cases, the ‘daugh­ter’ hy­poth­e­sis seems to make fewer un­nec­es­sary as­sump­tions than the ‘short thief’ hy­poth­e­sis does. You already know you have a daugh­ter that likes cook­ies and knows how to move chairs to reach cook­ies. But in or­der for the short thief hy­poth­e­sis to be plau­si­ble, you have to as­sume that (1) a thief found a way to break in, that (2) the thief wanted in­ex­pen­sive cook­ies from your home, that (3) the thief was, un­usu­ally, too short to reach the top of the fridge with­out the help of a chair, and (4) many other un­nec­es­sary as­sump­tions.

Oc­cam’s ra­zor sounds right, but can it be made more pre­cise, and can it be jus­tified? How do we find all con­sis­tent hy­pothe­ses, and how do we judge their sim­plic­ity? We will re­turn to those ques­tions later. Be­fore then, we’ll de­scribe the area of math­e­mat­ics that usu­ally deals with rea­son­ing: prob­a­bil­ity.

Probability

You’re a sol­dier in com­bat, crouch­ing in a trench. You know for sure there is just one en­emy sol­dier left on the bat­tlefield, about 400 yards away. You also know that if the re­main­ing en­emy is a reg­u­lar army troop, there’s only a small chance he could hit you with one shot from that dis­tance. But if the re­main­ing en­emy is a sniper, then there’s a very good chance he can hit you with one shot from that dis­tance. But snipers are rare, so it’s prob­a­bly just a reg­u­lar army troop.

You peek your head out of the trench, try­ing to get a bet­ter look.

Bam! A bul­let glances off your helmet and you duck down again.

“Okay,” you think. “I know snipers are rare, but that guy just hit me with a bul­let from 400 yards away. I sup­pose it might still be a reg­u­lar army troop, but there’s a se­ri­ously good chance it’s a sniper, since he hit me from that far away.”

After an­other minute, you dare to take an­other look, and peek your head out of the trench again.

Bam! Another bul­let glances off your helmet! You duck down again.

“Whoa,” you think. “It’s definitely a sniper. No mat­ter how rare snipers are, there’s no way that guy just hit me twice in a row from that dis­tance if he’s a reg­u­lar army troop. He’s gotta be a sniper. I’d bet­ter call for sup­port.”

This is an ex­am­ple of rea­son­ing un­der un­cer­tainty, of up­dat­ing un­cer­tain be­liefs in re­sponse to ev­i­dence. We do it all the time.

You start with some prior be­liefs, and all of them are un­cer­tain. You are 99.99% cer­tain the Earth re­volves around the sun, 90% con­fi­dent your best friend will at­tend your birth­day party, and 40% sure that the song you’re listen­ing to on the ra­dio was played by The Tur­tles.

Then, you en­counter new ev­i­dence—new ob­ser­va­tions—and you up­date your be­liefs in re­sponse.

Sup­pose you start out 85% con­fi­dent that the one re­main­ing en­emy sol­dier is not a sniper. That leaves only 15% cre­dence to the hy­poth­e­sis that he is a sniper. But then, a bul­let glances off your helmet — an event far more likely if the en­emy sol­dier is a sniper than if he is not. So now you’re only 40% con­fi­dent he’s a non-sniper, and 60% con­fi­dent he is a sniper. Another bul­let glances off your helmet, and you up­date again. Now you’re only 2% con­fi­dent he’s a non-sniper, and 98% con­fi­dent he is a sniper.

Prob­a­bil­ity the­ory is the math­e­mat­ics of rea­son­ing with un­cer­tainty. The key­stone of this sub­ject is called Bayes’ The­o­rem. It tells you how likely some­thing is given some other knowl­edge. Un­der­stand­ing this sim­ple the­o­rem is more use­ful and im­por­tant for most peo­ple than Solomonoff in­duc­tion. If you haven’t learned it already, you may want to read ei­ther tu­to­rial #1, tu­to­rial #2, tu­to­rial #3, or tu­to­rial #4 on Bayes’ The­o­rem. The ex­act math of Bayes’ The­o­rem is not re­quired for this tu­to­rial. We’ll just de­scribe its re­sults qual­i­ta­tively.

Bayes’ The­o­rem can tell us how likely a hy­poth­e­sis is, given ev­i­dence (or data, or ob­ser­va­tions). This is helpful be­cause we want to know which model of the world is cor­rect so that we can suc­cess­fully pre­dict the fu­ture. It calcu­lates this prob­a­bil­ity based on the prior prob­a­bil­ity of the hy­poth­e­sis alone, the prob­a­bil­ity of the ev­i­dence alone, and the prob­a­bil­ity of the ev­i­dence given the hy­poth­e­sis. Now we just plug the num­bers in.

Of course, it’s not easy to “just plug the num­bers in.” You aren’t an all-know­ing god. You don’t know ex­actly how likely it is that the en­emy sol­dier would hit your helmet if he’s a sniper, com­pared to how likely that is if he’s not a sniper. But you can do your best. With enough ev­i­dence, it will be­come over­whelm­ingly clear which hy­poth­e­sis is cor­rect.

But guesses are not well-suited to an ex­act al­gorithm, and so our quest to find an al­gorithm for truth-find­ing must con­tinue. For now, we turn to the prob­lem of choos­ing pri­ors.

The Prob­lem of Priors

In the ex­am­ple above where you’re a sol­dier in com­bat, I gave you your start­ing prob­a­bil­ities: 85% con­fi­dence that the en­emy sol­dier was a sniper, and 15% con­fi­dence he was not. But what if you don’t know your “pri­ors”? What then?

Most situ­a­tions in real life are com­plex, so that your “pri­ors” (as used in Bayes’ The­o­rem) are ac­tu­ally prob­a­bil­ities that have been up­dated sev­eral times with past ev­i­dence. You had an idea that snipers were rare be­cause you saw many sol­diers, but only a few of them were snipers. Or you read a re­li­able re­port say­ing that snipers were rare. But what would our ideal rea­son­ing com­puter do be­fore it knew any­thing? What would the prob­a­bil­ities be set to be­fore we turned it on? How can we de­ter­mine the prob­a­bil­ity of a hy­poth­e­sis be­fore see­ing any data?

The gen­eral an­swer is Oc­cam’s ra­zor; sim­pler hy­pothe­ses are more likely. But this isn’t rigor­ous. It’s usu­ally difficult to find a mea­sure of com­plex­ity, even for math­e­mat­i­cal hy­pothe­ses. Is a nor­mal curve sim­pler than an ex­po­nen­tial curve? Bayesian prob­a­bil­ity the­ory doesn’t have any­thing to say about choos­ing pri­ors. Thus, many stan­dard “prior dis­tri­bu­tions” have been de­vel­oped. Gen­er­ally, they dis­tribute prob­a­bil­ity equally across hy­pothe­ses. Of course this is a good ap­proach if all the hy­pothe­ses are equally likely. But as we saw above, it seems that some hy­pothe­ses are more com­plex than oth­ers, and this makes them less likely than the other hy­pothe­ses. So when dis­tribut­ing your prob­a­bil­ity across sev­eral hy­pothe­ses, you shouldn’t nec­es­sar­ily dis­tribute it evenly. There’s also a grow­ing body of work around an idea called the Max­i­mum En­tropy Prin­ci­ple. This prin­ci­ple helps you choose a prior that makes the least as­sump­tions given the con­straints of the prob­lem. But this prin­ci­ple can’t be used to han­dle all pos­si­ble types of hy­pothe­ses, only ones for which “en­tropy” can be math­e­mat­i­cally eval­u­ated.

We need a method that ev­ery­one can agree pro­vides the cor­rect pri­ors in all situ­a­tions. This helps us perform in­duc­tion cor­rectly. It also helps ev­ery­one be more hon­est. Since pri­ors partly de­ter­mine what peo­ple be­lieve, they can some­times choose pri­ors that help “prove” what they want to prove. This can hap­pen in­ten­tion­ally or un­in­ten­tion­ally. It can also hap­pen in for­mal situ­a­tions, such as an aca­demic pa­per defend­ing a pro­posed pro­gram.

To solve the prob­lem of pri­ors once and for all, we’d like to have an ac­cept­able, uni­ver­sal prior dis­tri­bu­tion, so that there’s no vague­ness in the pro­cess of in­duc­tion. We need a recipe, an al­gorithm, for se­lect­ing our pri­ors. For that, we turn to the sub­ject of bi­nary se­quences.

Bi­nary Sequences

At this point, we have col­lected a lot of back­ground ma­te­rial. We know about al­gorithms, and we know we need an al­gorithm that does in­duc­tion. We know that in­duc­tion also uses Oc­cam’s ra­zor and prob­a­bil­ity. We know that one of the prob­lems in prob­a­bil­ity is se­lect­ing pri­ors. Now we’re ready to for­mal­ize it.

To start, we need a lan­guage in which we can ex­press all prob­lems, all data, all hy­pothe­ses. Bi­nary is the name for rep­re­sent­ing in­for­ma­tion us­ing only the char­ac­ters ‘0’ and ‘1’. In a sense, bi­nary is the sim­plest pos­si­ble alpha­bet. A two-char­ac­ter alpha­bet is the small­est alpha­bet that can com­mu­ni­cate a differ­ence. If we had an alpha­bet of just one char­ac­ter, our “sen­tences” would be uniform. With two, we can be­gin to en­code in­for­ma­tion. Each 0 or 1 in a bi­nary se­quence (e. g. 01001011) can be con­sid­ered the an­swer to a yes-or-no ques­tion.

In the above ex­am­ple about sort­ing num­bers, it’s easy to con­vert it to bi­nary just by writ­ing a com­puter pro­gram to fol­low the al­gorithm. All pro­gram­ming lan­guages are based on bi­nary. This also ap­plies to any­thing you’ve ever ex­pe­rienced us­ing a com­puter. From the great­est movie you’ve ever seen to emo­tional in­stant mes­sag­ing con­ver­sa­tions, all of it was en­coded in bi­nary.

In prin­ci­ple, all in­for­ma­tion can be rep­re­sented in bi­nary se­quences. If this seems like an ex­treme claim, con­sider the fol­low­ing...

All your ex­pe­riences, whether from your eyes, ears, other senses or even mem­o­ries and mus­cle move­ments, oc­cur be­tween neu­rons (your nerve cells and brain cells). And it was dis­cov­ered that neu­rons com­mu­ni­cate us­ing a digi­tal sig­nal called the ac­tion po­ten­tial. Be­cause it is digi­tal, a neu­ron ei­ther sends the sig­nal, or it doesn’t. There is no half-send­ing the ac­tion po­ten­tial. This can be trans­lated di­rectly into bi­nary. An ac­tion po­ten­tial is a 1, no ac­tion po­ten­tial is a 0. All your sen­sa­tions, thoughts, and ac­tions can be en­coded as a bi­nary se­quence over time. A re­ally long se­quence.

Or, if neu­ron com­mu­ni­ca­tion turns out to be more com­pli­cated than that, we can look to a deeper level. All events in the uni­verse fol­low the laws of physics. We’re not quite done dis­cov­er­ing the true laws of physics; there are some in­con­sis­ten­cies and un­ex­plained phe­nom­ena. But the cur­rently pro­posed laws are in­cred­ibly ac­cu­rate. And they can be rep­re­sented as a sin­gle bi­nary se­quence.

You might be think­ing, “But I see and do mul­ti­ple things si­mul­ta­neously, and in the uni­verse there are trillions of stars all burn­ing at the same time. How can par­allel events turn into a sin­gle se­quence of bi­nary?”

This is a perfectly rea­son­able ques­tion. It turns out that, at least for­mally, this poses no prob­lem at all. The ma­chin­ery we will use to deal with bi­nary se­quences can turn mul­ti­ple se­quences into one just by dove­tailing them to­gether and ad­just­ing how it pro­cesses them so the re­sults are the same. Be­cause it is eas­iest to deal with a sin­gle se­quence, we do this in the for­mal recipe. Any good im­ple­men­ta­tion of Solomonoff in­duc­tion will use mul­ti­ple se­quences just to be faster.

A pic­ture of your daugh­ter can be rep­re­sented as a se­quence of ones and ze­ros. But a pic­ture is not your daugh­ter. A video of all your daugh­ter’s ac­tions can also be rep­re­sented as a se­quence of ones and ze­ros. But a video isn’t your daugh­ter, ei­ther; we can’t nec­es­sar­ily tell if she’s think­ing about cook­ies, or po­etry. The po­si­tion of all the sub­atomic par­ti­cles that make up your daugh­ter as she lives her en­tire life can be rep­re­sented as a se­quence of bi­nary. And that re­ally is your daugh­ter.

Hav­ing a com­mon and sim­ple lan­guage can some­times be the key to progress. The an­cient Greek math­e­mat­i­cian Archimedes dis­cov­ered many spe­cific re­sults of calcu­lus, but could not gen­er­al­ize the meth­ods be­cause he did not have the lan­guage of calcu­lus. After this lan­guage was de­vel­oped in the late 1600s, hun­dreds of math­e­mat­i­ci­ans were able to pro­duce new re­sults in the field. Now, calcu­lus forms an im­por­tant base of our mod­ern civ­i­liza­tion.

Be­ing able to do ev­ery­thing in the lan­guage of bi­nary se­quences sim­plifies things greatly, and gives us great power. Now we don’t have to deal with com­plex con­cepts like “daugh­ter” and “sol­dier.” It’s all still there in the data, only as a large se­quence of 0s and 1s. We can treat it all the same.

All Algorithms

Now that we have a sim­ple way to deal with all types of data, let’s look at hy­pothe­ses. Re­call that we’re look­ing for a way to as­sign prior prob­a­bil­ities to hy­pothe­ses. (And then, when we en­counter new data, we’ll use Bayes’ The­o­rem to up­date the prob­a­bil­ities we as­sign to those hy­pothe­ses). To be com­plete, and guaran­tee we find the real ex­pla­na­tion for our data, we have to con­sider all pos­si­ble hy­pothe­ses. But how could we ever find all pos­si­ble ex­pla­na­tions for our data? We could sit in a room for days, mak­ing a list of all the ways the cook­ies could be miss­ing, and still not think of the pos­si­bil­ity that our wife took some to work.

It turns out that math­e­mat­i­cal ab­strac­tion can save the day. By us­ing a well-tested model, and the lan­guage of bi­nary, we can find all hy­pothe­ses.

This piece of the puz­zle was dis­cov­ered in 1936 by a man named Alan Tur­ing. He cre­ated a sim­ple, for­mal model of com­put­ers called “Tur­ing ma­chines” be­fore any­one had ever built a com­puter.

In Tur­ing’s model, each ma­chine’s lan­guage is—you guessed it—bi­nary. There is one bi­nary se­quence for the in­put, a sec­ond bi­nary se­quence that con­stantly gets worked on and re-writ­ten, and a third bi­nary se­quence for out­put. (This de­scrip­tion is called a three-tape Tur­ing ma­chine, and is eas­iest to think about for Solomonoff in­duc­tion. The nor­mal de­scrip­tion of Tur­ing ma­chines in­cludes only one tape, but it turns out that they are equiv­a­lent.)

The rules that de­ter­mine how the ma­chine re­acts to and changes the bits on these tapes are very sim­ple. An ex­am­ple is shown on the di­a­gram above. Ba­si­cally, ev­ery Tur­ing ma­chine has a finite num­ber of “states”, each of which is a lit­tle rule. Th­ese rules seem bland and bor­ing at first, but in a few para­graphs, you’ll find out why they’re so ex­cit­ing. First, the ma­chine will start out in a cer­tain state, with some bi­nary on the in­put tape, all ze­ros on the work tape, and all ze­ros on the out­put tape. The rules for that first state will tell it to look at the in­put tape and the work tape. Depend­ing on what bi­nary num­ber is on those two tapes, the rules will say to perform cer­tain ac­tions. It will say to;

  1. feed the in­put tape (or not)

  2. write 0 or 1 to the work tape

  3. move the work tape left or right

  4. write 0 or 1 to the out­put tape

  5. feed the out­put tape (or not).

After that, the rules will say which state to move to next, and the pro­cess will re­peat. Re­mem­ber that the rules for these states are fixed; they could be writ­ten out on pieces of pa­per. All that changes are the tapes, and what rule the ma­chine is cur­rently fol­low­ing. The ba­sic math­e­mat­ics be­hind this is fairly sim­ple to un­der­stand, and can be found in books on com­pu­ta­tional the­ory.

This model is in­cred­ibly pow­er­ful. Given the right rules, Tur­ing ma­chines can

  • calcu­late the square of a num­ber,

  • run a spread­sheet pro­gram,

  • com­press large files,

  • es­ti­mate the prob­a­bil­ity of rain to­mor­row,

  • con­trol the flight of an air­plane,

  • play chess bet­ter than a hu­man,

  • and much, much more.

You may have no­ticed that this sounds like a list of what reg­u­lar com­put­ers do. And you would be cor­rect; the model of Tur­ing ma­chines came be­fore, and served as an in­valuable guide to, the in­ven­tion of elec­tronic com­put­ers. Every­thing they do is within the model of Tur­ing ma­chines.

Even more ex­cit­ing is the fact that all at­tempts to for­mal­ize the in­tu­itive idea of “al­gorithm” or “pro­cess” have been proven to be at most equally as pow­er­ful as Tur­ing ma­chines. If a sys­tem has this prop­erty, it is called Tur­ing com­plete. For ex­am­ple, math equa­tions us­ing alge­bra have a huge range of al­gorithms they can ex­press. Mul­ti­ply­ing is an al­gorithm, find­ing the hy­potenuse of a right tri­an­gle is an al­gorithm, and the quadratic for­mula is an al­gorithm. Tur­ing ma­chines can run all these al­gorithms, and more. That is, Tur­ing ma­chines can be used to calcu­late out all alge­braic al­gorithms, but there are some al­gorithms Tur­ing ma­chines can run that can’t be rep­re­sented by alge­bra. This means alge­bra is not Tur­ing com­plete. For an­other ex­am­ple, math­e­mat­i­ci­ans of­ten in­vent “games” where se­quences of sym­bols can be rewrit­ten us­ing cer­tain rules. (They will then try to prove things about what se­quences these rules can cre­ate.) But no mat­ter how cre­ative their rules, ev­ery one of them can be simu­lated on a Tur­ing ma­chine. That is, for ev­ery set of re-writ­ing rules, there is a bi­nary se­quence you can give a Tur­ing ma­chine so that the ma­chine will rewrite the se­quences in the same way.

Re­mem­ber how limited the states of a Tur­ing ma­chine are; ev­ery ma­chine has only a finite num­ber of states with “if” rules like in the figure above. But some­how, us­ing these and the tape as mem­ory, they can simu­late ev­ery set of rules, ev­ery al­gorithm ever thought up. Even the dis­tinc­tively differ­ent the­ory of quan­tum com­put­ers is at most Tur­ing com­plete. In the 80 years since Tur­ing’s pa­per, no su­pe­rior sys­tems have been found. The idea that Tur­ing ma­chines truly cap­ture the idea of “al­gorithm” is called the Church-Tur­ing the­sis.

So the model of Tur­ing ma­chines cov­ers reg­u­lar com­put­ers, but that is not all. As men­tioned above, the cur­rent laws of physics can be rep­re­sented as a bi­nary se­quence. That is, the laws of physics are an al­gorithm that can be fed into a Tur­ing ma­chine to com­pute the past, pre­sent and fu­ture of the uni­verse. This in­cludes stars burn­ing, the cli­mate of the earth, the ac­tion of cells, and even the ac­tions and thoughts of hu­mans. Most of the power here is in the laws of physics them­selves. What Tur­ing dis­cov­ered is that these can be com­puted by a math­e­mat­i­cally sim­ple ma­chine.

As if Tur­ing’s model wasn’t amaz­ing enough, he went on to prove that one spe­cific set of these rules could simu­late all other sets of rules.

The com­puter with this spe­cial rule set is called a uni­ver­sal Tur­ing ma­chine. We simu­late an­other cho­sen ma­chine by giv­ing the uni­ver­sal ma­chine a com­piler bi­nary se­quence. A com­piler is a short pro­gram that trans­lates be­tween com­puter lan­guages or, in this case, be­tween ma­chines. Some­times a com­piler doesn’t ex­ist. For ex­am­ple, you couldn’t trans­late Su­per Mario Brothers onto a com­puter that only plays tic-tac-toe. But there will always be a com­piler to trans­late onto a uni­ver­sal Tur­ing ma­chine. We place this com­piler in front of the in­put which would have been given to the cho­sen ma­chine. From one per­spec­tive, we are just giv­ing the uni­ver­sal ma­chine a sin­gle, longer se­quence. But from an­other per­spec­tive, the uni­ver­sal ma­chine is us­ing the com­piler to set it­self up to simu­late the cho­sen ma­chine. While the uni­ver­sal ma­chine (us­ing its own, fixed rules) is pro­cess­ing the com­piler, it will write var­i­ous things on the work tape. By the time it has passed the com­piler and gets to the origi­nal in­put se­quence, the work tape will have some­thing writ­ten on it to help simu­late the cho­sen ma­chine. While pro­cess­ing the in­put, it will still fol­low its own, fixed rules, only the bi­nary on the work tape will guide it down a differ­ent “path” through those rules than if we had only given it the origi­nal in­put.

For ex­am­ple, say that we want to calcu­late the square of 42 (or in bi­nary, 101010). As­sume we know the rule set for the Tur­ing ma­chine which squares num­bers when given the num­ber in bi­nary. Given all the speci­fics, there is an al­gorith­mic way to find the “com­piler” se­quence based on these rules. Let’s say that the com­piler is 1011000. Then, in or­der to com­pute the square of 42 on the uni­ver­sal Tur­ing ma­chine, we sim­ply give it the in­put 1011000101010, which is just the com­piler 1011000 next to the num­ber 42 as 101010. If we want to calcu­late the square of 43, we just change the sec­ond part to 101011 (which is 101010 + 1). The com­piler se­quence doesn’t change, be­cause it is a prop­erty of the ma­chine we want to simu­late, e. g. the squar­ing ma­chine, but not of the in­put to that simu­lated ma­chine, e. g. 42.

In sum­mary: al­gorithms are rep­re­sented by Tur­ing ma­chines, and Tur­ing ma­chines are rep­re­sented by in­puts to the uni­ver­sal Tur­ing ma­chine. There­fore, al­gorithms are rep­re­sented by in­puts to the uni­ver­sal Tur­ing ma­chine.

In Solomonoff in­duc­tion, the as­sump­tion we make about our data is that it was gen­er­ated by some al­gorithm. That is, the hy­poth­e­sis that ex­plains the data is an al­gorithm. There­fore, a uni­ver­sal Tur­ing ma­chine can out­put the data, as long as you give the ma­chine the cor­rect hy­poth­e­sis as in­put. There­fore, the set of all pos­si­ble in­puts to our uni­ver­sal Tur­ing ma­chine is the set of all pos­si­ble hy­pothe­ses. This in­cludes the hy­poth­e­sis that the data is a list of the odd num­bers, the hy­poth­e­sis that the en­emy sol­dier is a sniper, and the hy­poth­e­sis that your daugh­ter ate the cook­ies. This is the power of for­mal­iza­tion and math­e­mat­ics.

Solomonoff’s Lightsaber

Now we can find all the hy­pothe­ses that would pre­dict the data we have ob­served. This is much more pow­er­ful than the in­for­mal state­ment of Oc­cam’s ra­zor. Be­cause of its pre­ci­sion and com­plete­ness, this pro­cess has been jok­ingly dubbed “Solomonoff’s Lightsaber”. Given our data, we find po­ten­tial hy­pothe­ses to ex­plain it by run­ning ev­ery hy­poth­e­sis, one at a time, through the uni­ver­sal Tur­ing ma­chine. If the out­put matches our data, we keep it. Other­wise, we throw it away.

By the way, this is where Solomonoff in­duc­tion be­comes in­com­putable. It would take an in­finite amount of time to check ev­ery al­gorithm. And even more prob­le­matic, some of the al­gorithms will run for­ever with­out pro­duc­ing out­put—and we can’t prove they will never stop run­ning. This is known as the halt­ing prob­lem, and it is a deep fact of the the­ory of com­pu­ta­tion. It’s the sheer num­ber of al­gorithms, and these pesky non-halt­ing al­gorithms that stop us from ac­tu­ally run­ning Solomonoff in­duc­tion.

The ac­tual pro­cess above might seem a lit­tle un­der­whelming. We just check ev­ery sin­gle hy­poth­e­sis? Really? Isn’t that a lit­tle mind­less and in­effi­cient? This will cer­tainly not be how the first true AI op­er­ates. But don’t for­get that be­fore this, no­body had any idea how to do ideal in­duc­tion, even in prin­ci­ple. Devel­op­ing fun­da­men­tal the­o­ries, like quan­tum me­chan­ics, might seem ab­stract and waste­ful. But his­tory has proven that it doesn’t take long be­fore such the­o­ries and mod­els change the world, as quan­tum me­chan­ics did with mod­ern elec­tron­ics. In the fu­ture, men and women will de­velop ways to ap­prox­i­mate Solomonoff in­duc­tion in a sec­ond. Per­haps they will de­velop meth­ods to elimi­nate large num­bers of hy­pothe­ses all at once. Maybe hy­pothe­ses will be bro­ken into dis­tinct classes. Or maybe they’ll use meth­ods to statis­ti­cally con­verge to­ward the right hy­pothe­ses.

So now, at least in the­ory, we have the whole list of hy­pothe­ses that might be the true cause be­hind our ob­ser­va­tions. Th­ese hy­pothe­ses, since they are al­gorithms, look like bi­nary se­quences. For ex­am­ple, the first few might be 01001101, 0011010110000110100100110, and 1000111110111111000111010010100001. That is, for each of these three, when you give them to the uni­ver­sal Tur­ing ma­chine as in­put, the out­put is our data. Which of these three do you think is more likely to be the true hy­poth­e­sis that gen­er­ated our data in the first place?

We have a list, but we’re try­ing to come up with a prob­a­bil­ity, not just a list of pos­si­ble ex­pla­na­tions. So how do we de­cide what the prob­a­bil­ity is of each of these hy­pothe­ses? Imag­ine that the true al­gorithm is pro­duced in a most un­bi­ased way: by flip­ping a coin. For each bit of the hy­poth­e­sis, we flip a coin. Heads will be 0, and tails will be 1. In the ex­am­ple above, 01001101, the coin landed heads, tails, heads, heads, tails, and so on. Be­cause each flip of the coin has a 50% prob­a­bil­ity, each bit con­tributes ½ to the fi­nal prob­a­bil­ity.

There­fore an al­gorithm that is one bit longer is half as likely to be the true al­gorithm. No­tice that this in­tu­itively fits Oc­cam’s ra­zor; a hy­poth­e­sis that is 8 bits long is much more likely than a hy­poth­e­sis that is 34 bits long. Why bother with ex­tra bits? We’d need ev­i­dence to show that they were nec­es­sary.

So, why not just take the short­est hy­poth­e­sis, and call that the truth? Be­cause all of the hy­pothe­ses pre­dict the data we have so far, and in the fu­ture we might get data to rule out the short­est one. We keep all con­sis­tent hy­pothe­ses, but weigh the shorter ones with higher prob­a­bil­ity. So in our eight-bit ex­am­ple, the prob­a­bil­ity of 01001101 be­ing the true al­gorithm is ½^8, or 1256. It’s im­por­tant to say that this isn’t an ab­solute prob­a­bil­ity in the nor­mal sense. It hasn’t been nor­mal­ized—that is, the prob­a­bil­ities haven’t been ad­justed so that they add up to 1. This is com­pu­ta­tion­ally much more difficult, and might not be nec­es­sary in the fi­nal im­ple­men­ta­tion of Solomonoff in­duc­tion. Th­ese prob­a­bil­ities can still be used to com­pare how likely differ­ent hy­pothe­ses are.

To find the prob­a­bil­ity of the ev­i­dence alone, all we have to do is add up the prob­a­bil­ity of all these hy­pothe­ses con­sis­tent with the ev­i­dence. Since any of these hy­pothe­ses could be the true one that gen­er­ates the data, and they’re mu­tu­ally ex­clu­sive, adding them to­gether doesn’t dou­ble count any prob­a­bil­ity.

For­mal­ized Science

Let’s go back to the pro­cess above that de­scribes the sci­en­tific method. We’ll see that Solomonoff in­duc­tion is this pro­cess made into an al­gorithm.

1. Make an ob­ser­va­tion.

Our ob­ser­va­tion is our bi­nary se­quence of data. Only bi­nary se­quences are data, and all bi­nary se­quences can qual­ify as data.

2. Form a hy­poth­e­sis that ex­plains the ob­ser­va­tion.

We use a uni­ver­sal Tur­ing ma­chine to find all pos­si­ble hy­pothe­ses, no fuzzi­ness in­cluded. The hy­poth­e­sis “ex­plains” the ob­ser­va­tion if the out­put of the ma­chine matches the data ex­actly.

3. Con­duct an ex­per­i­ment that will test the hy­poth­e­sis.

The only “ex­per­i­ment” is to ob­serve the data se­quence for longer, and run the uni­ver­sal ma­chine for longer. The hy­pothe­ses whose out­put con­tinues to match the data are the ones that pass the test.

4. If the ex­per­i­men­tal re­sults dis­con­firm the hy­poth­e­sis, re­turn to step #2 and form a hy­poth­e­sis not yet used. If the ex­per­i­men­tal re­sults con­firm the hy­poth­e­sis, pro­vi­sion­ally ac­cept the hy­poth­e­sis.

This step tells us to re­peat the “ex­per­i­ment” with each bi­nary se­quence in our match­ing col­lec­tion. How­ever, in­stead of “pro­vi­sion­ally” ac­cept­ing the hy­poth­e­sis, we ac­cept all match­ing hy­pothe­ses with a prob­a­bil­ity weight ac­cord­ing to its length.

Now we’ve found the truth, as best as it can be found. We’ve ex­cluded no pos­si­bil­ities. There’s no place for a sci­en­tist to be bi­ased, and we don’t need to de­pend on our cre­ativity to come up with the right hy­poth­e­sis or ex­per­i­ment. We know how to mea­sure how com­plex a hy­poth­e­sis is. Our only prob­lem left is to effi­ciently run it.

Approximations

As men­tioned be­fore, we ac­tu­ally can’t run all the hy­pothe­ses to see which ones match. There are in­finitely many, and some of them will never halt. So, just like our cake recipe, we need some very helpful ap­prox­i­ma­tions that still de­liver very close to true out­puts. Tech­ni­cally, all pre­dic­tion meth­ods are ap­prox­i­ma­tions of Solomonoff in­duc­tion, be­cause Solomonoff in­duc­tion tries all pos­si­bil­ities. But most meth­ods use a very small set of hy­pothe­ses, and many don’t use good meth­ods for es­ti­mat­ing their prob­a­bil­ities. How can we more di­rectly ap­prox­i­mate our recipe? At pre­sent, there aren’t any out­stand­ingly fast and ac­cu­rate ap­prox­i­ma­tions to Solomonoff in­duc­tion. If there were, we would be well on our way to true AI. Below are some ideas that have been pub­lished.

In Univer­sal Ar­tifi­cial In­tel­li­gence, Mar­cus Hut­ter pro­vides a full for­mal ap­prox­i­ma­tion of Solomonoff in­duc­tion which he calls AIXI-tl. This model is op­ti­mal in a tech­ni­cal and sub­tle way. Also it can always re­turn an an­swer in a finite amount of time, but this time is usu­ally ex­tremely long and dou­bles with ev­ery bit of data. It would still take longer than the life of the uni­verse be­fore we got an an­swer for most ques­tions. How can we do even bet­ter?

One gen­eral method would be to use a Tur­ing ma­chine that you know will always halt. Be­cause of the halt­ing prob­lem, this means that you won’t be test­ing some halt­ing al­gorithms that might be the cor­rect hy­pothe­ses. But you could still con­ceiv­ably find a large set of hy­pothe­ses that all halted.

Another pop­u­lar method of ap­prox­i­mat­ing any in­tractable al­gorithm is to use ran­dom­ness. This is called a Monte Carlo method. We can’t test all hy­pothe­ses, so we have to se­lect a sub­set. But we don’t want our se­lec­tion pro­cess to bias the re­sult. There­fore we could ran­domly gen­er­ate a bunch of hy­pothe­ses to test. We could use an evolu­tion­ary al­gorithm, where we test this seed set of hy­pothe­ses, and keep the ones that gen­er­ate data clos­est to ours. Then we would vary these hy­pothe­ses, run them again through the Tur­ing ma­chine, keep the clos­est fits, and con­tinue this pro­cess un­til the hy­pothe­ses ac­tu­ally pre­dicted our data ex­actly.

The math­e­mat­i­cian Jür­gen Sch­mid­hu­ber pro­poses a differ­ent prob­a­bil­ity weigh­ing which also gives high prob­a­bil­ity to hy­pothe­ses that can be quickly com­puted. He demon­strates how this is “near-op­ti­mal”. This is vastly faster, but risks mak­ing an­other as­sump­tion, that faster al­gorithms are in­her­ently more likely.

Un­re­solved Details

Solomonoff in­duc­tion is an ac­tive area of re­search in mod­ern math­e­mat­ics. While it is uni­ver­sal in a broad and im­pres­sive sense, some choices can still be made in the math­e­mat­i­cal defi­ni­tion. Many peo­ple also have philo­soph­i­cal con­cerns or ob­jec­tions to the claim that Solomonoff in­duc­tion is ideal and uni­ver­sal.

The first ques­tion many math­e­mat­i­ci­ans ask is, “Which uni­ver­sal Tur­ing ma­chine?” I have writ­ten this tu­to­rial as if there is only one, but there are in fact in­finitely many sets of rules that can simu­late all other sets of rules. Just as the length of a pro­gram will de­pend on which pro­gram­ming lan­guage you write it in, the length of the hy­poth­e­sis as a bi­nary se­quence will de­pend on which uni­ver­sal Tur­ing ma­chine you use. This means the prob­a­bil­ities will be differ­ent. The change, how­ever, is very limited. There are the­o­rems that well-define this effect and it is gen­er­ally agreed not to be a con­cern. Speci­fi­cally, go­ing be­tween two uni­ver­sal ma­chines can­not in­crease the hy­poth­e­sis length any more than the length of the com­piler from one ma­chine to the other. This length is fixed, in­de­pen­dent of the hy­poth­e­sis, so the more data you use, the less this differ­ence mat­ters.

Another con­cern is that the true hy­poth­e­sis may be in­com­putable. There are known defi­ni­tions of bi­nary se­quences which make sense, but which no Tur­ing ma­chine can out­put. Solomonoff in­duc­tion would con­verge to this se­quence, but would never pre­dict it ex­actly. It is also gen­er­ally agreed that noth­ing could ever pre­dict this se­quence, be­cause all known pre­dic­tors are equiv­a­lent to Tur­ing ma­chines. If this is a prob­lem, it is similar to the prob­lem of a finite uni­verse; there is noth­ing that can be done about it.

Lastly, many peo­ple, math­e­mat­i­ci­ans in­cluded, re­ject the ideas be­hind the model, such as that the uni­verse can be rep­re­sented as a bi­nary se­quence. This of­ten delves into com­plex philo­soph­i­cal ar­gu­ments, and of­ten re­volves around con­scious­ness.

Many more de­tails can be found the more one stud­ies. Many math­e­mat­i­ci­ans work on mod­ified ver­sions and ex­ten­sions where the com­puter learns how to act as well as pre­dict. How­ever these open ar­eas are re­solved, Solomonoff in­duc­tion has pro­vided an in­valuable model and per­spec­tive for re­search into solv­ing the prob­lem of how to find truth. (Also see Open Prob­lems Re­lated to Solomonoff In­duc­tion.)

Fun­da­men­tal the­o­ries have an effect of unify­ing pre­vi­ously sep­a­rate ideas. After Tur­ing dis­cov­ered the ba­sics of com­putabil­ity the­ory, it was only a mat­ter of time be­fore these ideas made their way across philos­o­phy and math­e­mat­ics. Statis­ti­ci­ans could only find ex­act prob­a­bil­ities in sim­plified situ­a­tions; now they know how to find them in all situ­a­tions. Scien­tists won­dered how to re­ally know which hy­poth­e­sis was sim­pler; now they can find a num­ber. Philoso­phers won­dered how in­duc­tion could be jus­tified. Solomonoff in­duc­tion gives them an an­swer.

So, what’s next? To build our AI truth-ma­chine, or even to find the pre­cise truth to a sin­gle real-world prob­lem, we need con­sid­er­able work on ap­prox­i­mat­ing our recipe. Ob­vi­ously a sci­en­tist can­not presently down­load a pro­gram to tell him the com­plex­ity of his hy­poth­e­sis re­gard­ing deep-sea life be­hav­ior. But now that we know how to find truth in prin­ci­ple, math­e­mat­i­ci­ans and com­puter sci­en­tists will work on find­ing it in prac­tice.

(How does this fit with Bayes’ the­o­rem? A fol­lowup.)