Decoherence is Simple

An epistle to the physicists:

When I was but a little lad, my father, a PhD physicist, warned me sternly against meddling in the affairs of physicists; he said that it was hopeless to try to comprehend physics without the formal math. Period. No escape clauses. But I had read in Feynman’s popular books that if you really understood physics, you ought to be able to explain it to a nonphysicist. I believed Feynman instead of my father, because Feynman had won the Nobel Prize and my father had not.

It was not until later—when I was reading the Feynman Lectures, in fact—that I realized that my father had given me the simple and honest truth. No math = no physics.

By vocation I am a Bayesian, not a physicist. Yet although I was raised not to meddle in the affairs of physicists, my hand has been forced by the occasional gross misuse of three terms: simple, falsifiable, and testable.

The foregoing introduction is so that you don’t laugh, and say, “Of course I know what those words mean!” There is math here. What follows will be a restatement of the points in Belief in the Implied Invisible, as they apply to quantum physics.

Let’s begin with the remark that started me down this whole avenue, of which I have seen several versions; paraphrased, it runs:

The many-worlds interpretation of quantum mechanics postulates that there are vast numbers of other worlds, existing alongside our own. Occam’s Razor says we should not multiply entities unnecessarily.

Now it must be said, in all fairness, that those who say this will usually also confess:

But this is not a universally accepted application of Occam’s Razor; some say that Occam’s Razor should apply to the laws governing the model, not the number of objects inside the model.

So it is good that we are all acknowledging the contrary arguments, and telling both sides of the story—

But suppose you had to calculate the simplicity of a theory.

The original formulation of William of Ockham stated:

Lex parsimoniae: Entia non sunt multiplicanda praeter necessitatem.

“The law of parsimony: Entities should not be multiplied beyond necessity.”

But this is qualitative advice. It is not enough to say whether one theory seems more simple, or seems more complex, than another—you have to assign a number; and the number has to be meaningful, you can’t just make it up. Crossing this gap is like the difference between being able to eyeball which things are moving “fast” or “slow,” and starting to measure and calculate velocities.

Suppose you tried saying: “Count the words—that’s how complicated a theory is.”

Robert Heinlein once claimed (tongue-in-cheek, I hope) that the “simplest explanation” is always: “The woman down the street is a witch; she did it.” Eleven words—not many physics papers can beat that.

Faced with this challenge, there are two different roads you can take.

First, you can ask: “The woman down the street is a what?” Just because English has one word to indicate a concept doesn’t mean that the concept itself is simple. Suppose you were talking to aliens who didn’t know about witches, women, or streets—how long would it take you to explain your theory to them? Better yet, suppose you had to write a computer program that embodied your hypothesis, and output what you say are your hypothesis’s predictions—how big would that computer program have to be? Let’s say that your task is to predict a time series of measured positions for a rock rolling down a hill. If you write a subroutine that simulates witches, this doesn’t seem to help narrow down where the rock rolls—the extra subroutine just inflates your code. You might find, however, that your code necessarily includes a subroutine that squares numbers.
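
To make the first road concrete, here is a minimal sketch in Python. The gravitational constant, the function names, and the use of source-character counts as a crude stand-in for program length are all illustrative assumptions:

```python
# A toy illustration of the "program length" road (Kolmogorov-style simplicity).
# All names and numbers here are hypothetical, chosen only for illustration.

import inspect

G = 9.8  # assumed acceleration down an idealized, frictionless hill, in m/s^2

def square(x):
    return x * x

def predict_positions(seconds):
    # The physics hypothesis: distance grows as (1/2) * g * t^2.
    # The squaring subroutine is doing real predictive work here.
    return [0.5 * G * square(t) for t in seconds]

def simulate_witch():
    # A subroutine for the "witch" hypothesis. It constrains nothing about
    # where the rock rolls; it only makes the program longer.
    return "she did it"

physics_length = len(inspect.getsource(predict_positions)) + len(inspect.getsource(square))
witch_length = physics_length + len(inspect.getsource(simulate_witch))

print(predict_positions([1, 2, 3]))   # [4.9, 19.6, 44.1]
print(physics_length < witch_length)  # True: the witch subroutine only inflates the code
```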

Second, you can ask: “The woman down the street is a witch; she did what?” Suppose you want to describe some event, as precisely as you possibly can given the evidence available to you—again, say, the distance/time series of a rock rolling down a hill. You can preface your explanation by saying, “The woman down the street is a witch,” but your friend then says, “What did she do?,” and you reply, “She made the rock roll one meter after the first second, nine meters after the third second…” Prefacing your message with “The woman down the street is a witch” doesn’t help to compress the rest of your description. On the whole, you just end up sending a longer message than necessary—it makes more sense to just leave off the “witch” prefix. On the other hand, if you take a moment to talk about Galileo, you may be able to greatly compress the next five thousand detailed time series for rocks rolling down hills.
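
And a minimal sketch of the second road, again in Python, using character counts as a crude stand-in for message length; the encoding, the stated law, and the two-rock data set are made up for illustration:

```python
# A toy Minimum Message Length comparison. Everything here (the encoding,
# the data, the stated law) is an illustrative assumption, not a real MML codec.

def literal_message(series):
    # Spell out every measurement, one by one.
    return "; ".join(f"{d} meters after second {t}" for t, d in series)

def law_plus_data_message(all_series):
    # State the regularity once ("distance goes as time squared"), then each
    # rock only needs its own proportionality constant.
    header = "distance = k * t^2"
    constants = ", ".join(f"k={s[0][1] / s[0][0] ** 2:g}" for s in all_series)
    return header + "; " + constants

rocks = [
    [(1, 1.0), (2, 4.0), (3, 9.0)],     # one rock's distance/time series
    [(1, 2.0), (2, 8.0), (3, 18.0)],    # another rock, on a steeper hill
]

long_form = "; ".join(literal_message(s) for s in rocks)
short_form = law_plus_data_message(rocks)

print(len(long_form), len(short_form))
# The law-based message is far shorter, and the saving grows with every new
# rock, whereas prefacing either message with "the witch did it" only adds length.
```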

If you follow the first road, you end up with what’s known as Kolmogorov complexity and Solomonoff induction. If you follow the second road, you end up with what’s known as Minimum Message Length.

Ah, so I can pick and choose among definitions of simplicity?

No, actually the two formalisms in their most highly developed forms were proven equivalent.

And I suppose now you’re going to tell me that both formalisms come down on the side of “Occam means counting laws, not counting objects.”

More or less. In Minimum Message Length, so long as you can tell your friend an exact recipe they can mentally follow to get the rolling rock’s time series, we don’t care how much mental work it takes to follow the recipe. In Solomonoff induction, we count bits in the program code, not bits of RAM used by the program as it runs. “Entities” are lines of code, not simulated objects. And as said, these two formalisms are ultimately equivalent.
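
A minimal sketch of that distinction; the count below is an arbitrary illustrative number:

```python
# A short program can generate an enormous number of simulated objects.
# Solomonoff induction prices the hypothesis by the length of these few lines
# of code, not by the memory or time the loop would consume when run.

def generated_universe(n=10**9):
    # n simulated "entities", produced lazily by a one-line rule
    return (i * i for i in range(n))

universe = generated_universe()   # cheap to *describe*, however large n gets
print(next(universe), next(universe), next(universe))   # 0 1 4
```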

Now before I go into any further detail on formal simplicity, let me digress to consider the objection:

So what? Why can’t I just invent my own formalism that does things differently? Why should I pay any attention to the way you happened to decide to do things, over in your field? Got any experimental evidence that shows I should do things this way?

Yes, actually, believe it or not. But let me start at the beginning.

The conjunction rule of probability theory states:

For any propositions X and Y, the probability that “X is true, and Y is true,” is less than or equal to the probability that “X is true (whether or not Y is true).” (If this statement sounds not terribly profound, then let me assure you that it is easy to find cases where human probability assessors violate this rule.)

You usually can’t apply the conjunction rule directly to a conflict between mutually exclusive hypotheses. The conjunction rule only applies directly to cases where the left-hand side strictly implies the right-hand side. Furthermore, the conjunction rule is just an inequality; it doesn’t give us the kind of quantitative calculation we want.

But the conjunction rule does give us a rule of monotonic decrease in probability: as you tack more details onto a story, and each additional detail can potentially be true or false, the story’s probability goes down monotonically. Think of probability as a conserved quantity: there’s only so much to go around. As the number of details in a story goes up, the number of possible stories increases exponentially, but the sum over their probabilities can never be greater than 1. For every story “X and Y,” there is a story “X and ¬Y.” When you just tell the story “X,” you get to sum over the possibilities Y and ¬Y.

If you add ten details to X, each of which could potentially be true or false, then that story must compete with 2^10 − 1 other equally detailed stories for precious probability. If on the other hand it suffices to just say X, you can sum your probability over 2^10 stories

((X and Y and Z and …) or (X and ¬Y and Z and …) or …).
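
A minimal numerical check of that argument; the 0.6 for the bare story and the ten 50/50 details are made-up numbers:

```python
# A toy check of the conjunction rule and of "summing over" unspecified details.
# The probabilities below are made up purely for illustration.

from itertools import product

p_x = 0.6                   # probability of the bare story "X"
detail_probs = [0.5] * 10   # ten extra details, each independently 50/50

# Probability of one fully detailed story "X and Y1 and Y2 and ... and Y10":
p_detailed = p_x
for p in detail_probs:
    p_detailed *= p
print(p_detailed)           # 0.6 * (1/2)**10 ≈ 0.000586, far below p_x

# Summing over every way the ten details could come out recovers p_x exactly:
total = 0.0
for outcome in product([True, False], repeat=10):
    prob = p_x
    for detail_true, p in zip(outcome, detail_probs):
        prob *= p if detail_true else (1 - p)
    total += prob
print(total)                # ≈ 0.6; the bare story keeps all its probability
```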

The “entities” counted by Occam’s Razor should be individually costly in probability; this is why we prefer theories with fewer of them.

Imagine a lottery which sells up to a million tickets, where each possible ticket is sold only once, and the lottery has sold every ticket at the time of the drawing. A friend of yours has bought one ticket for $1—which seems to you like a poor investment, because the payoff is only $500,000. Yet your friend says, “Ah, but consider the alternative hypotheses, ‘Tomorrow, someone will win the lottery’ and ‘Tomorrow, I will win the lottery.’ Clearly, the latter hypothesis is simpler by Occam’s Razor; it only makes mention of one person and one ticket, while the former hypothesis is more complicated: it mentions a million people and a million tickets!”

To say that Occam’s Razor only counts laws, and not objects, is not quite correct: what counts against a theory are the entities it must mention explicitly, because these are the entities that cannot be summed over. Suppose that you and a friend are puzzling over an amazing billiards shot, in which you are told the starting state of a billiards table, and which balls were sunk, but not how the shot was made. You propose a theory which involves ten specific collisions between ten specific balls; your friend counters with a theory that involves five specific collisions between five specific balls. What counts against your theories is not just the laws that you claim to govern billiard balls, but any specific billiard balls that had to be in some particular state for your model’s prediction to be successful.

If you measure the temperature of your living room as 22 °C, it does not make sense to say: “Your thermometer is probably in error; the room is much more likely to be 20 °C. Because, when you consider all the particles in the room, there are exponentially vastly more states they can occupy if the temperature is really 22 °C—which makes any particular state all the more improbable.” But no matter which exact 22 °C state your room occupies, you can make the same prediction (for the supervast majority of these states) that your thermometer will end up showing 22 °C, and so you are not sensitive to the exact initial conditions. You do not need to specify an exact position of all the air molecules in the room, so that is not counted against the probability of your explanation.

On the other hand—returning to the case of the lottery—suppose your friend won ten lotteries in a row. At this point you should suspect the fix is in. The hypothesis “My friend wins the lottery every time” is more complicated than the hypothesis “Someone wins the lottery every time.” But the former hypothesis is predicting the data much more precisely.

In the Minimum Message Length formalism, saying “There is a single person who wins the lottery every time” at the beginning of your message compresses your description of who won the next ten lotteries; you can just say “And that person is Fred Smith” to finish your message. Compare to, “The first lottery was won by Fred Smith, the second lottery was won by Fred Smith, the third lottery was…”

In the Solomonoff induction formalism, the prior probability of “My friend wins the lottery every time” is low, because the program that describes the lottery now needs explicit code that singles out your friend; but because that program can produce a tighter probability distribution over potential lottery winners than “Someone wins the lottery every time,” it can, by Bayes’s Rule, overcome its prior improbability and win out as a hypothesis.
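
To see why the more complicated hypothesis can nonetheless win, here is a toy version of the arithmetic: explicitly naming one person out of a million costs roughly 20 bits of prior probability, while each one-in-a-million win supplies roughly 20 bits of evidence. The bookkeeping below is a simplified illustration, not a full model of the lottery:

```python
# A toy Bayesian comparison of "someone wins each lottery" versus "my friend
# wins every time", using the essay's million-ticket lottery.

import math

n_tickets = 10**6
n_lotteries = 10

# Prior penalty for explicitly singling out one person in a million as special:
log2_prior_odds = -math.log2(n_tickets)                       # about -20 bits

# Each observed win has probability ~1/1,000,000 under "someone wins",
# but probability ~1 under "my friend wins every time":
log2_likelihood_ratio = n_lotteries * math.log2(n_tickets)    # about +200 bits

log2_posterior_odds = log2_prior_odds + log2_likelihood_ratio
print(log2_posterior_odds)   # ≈ 179 bits in favor of "the fix is in"
```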

Any formal theory of Occam’s Razor should quantitatively define, not only “entities” and “simplicity,” but also the “necessity” part.

Minimum Message Length defines necessity as “that which compresses the message.”

Solomonoff induction assigns a prior probability to each possible computer program, with the entire distribution, over every possible computer program, summing to no more than 1. This can be accomplished using a binary code where no valid computer program is a prefix of any other valid computer program (“prefix-free code”), e.g. because it contains a stop code. Then the prior probability of any program P is simply 2^-L(P), where L(P) is the length of P in bits.
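
A minimal sketch of such a prior, using four hypothetical bit strings as stand-in program encodings; the strings and their names are invented for illustration:

```python
# A toy prefix-free code over four hypothetical "programs", showing that the
# 2**-length priors sum to no more than 1. Real Solomonoff induction ranges
# over all programs of a universal machine.

programs = {
    "P1": "0",      # 1 bit
    "P2": "10",     # 2 bits
    "P3": "110",    # 3 bits
    "P4": "111",    # 3 bits
}

# Check the prefix-free property: no code is a prefix of another.
codes = list(programs.values())
assert not any(a != b and b.startswith(a) for a in codes for b in codes)

priors = {name: 2 ** -len(code) for name, code in programs.items()}
print(priors)                # {'P1': 0.5, 'P2': 0.25, 'P3': 0.125, 'P4': 0.125}
print(sum(priors.values()))  # 1.0, never more than 1 (the Kraft inequality)
```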

The program P itself can be a program that takes in a (possibly zero-length) string of bits and outputs the conditional probability that the next bit will be 1; this makes P a probability distribution over all binary sequences. This version of Solomonoff induction, for any string, gives us a mixture of posterior probabilities dominated by the shortest programs that most precisely predict the string. Summing over this mixture gives us a prediction for the next bit.
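
Here is a toy version of that mixture, with just two hand-written “programs” whose code lengths are invented for illustration; real Solomonoff induction ranges over every program of a universal machine:

```python
# A toy two-program Solomonoff-style mixture. The predictors and their assumed
# code lengths are invented for illustration only.

def always_one(bits):
    return 0.999          # claims the next bit is 1 with near certainty

def fair_coin(bits):
    return 0.5            # claims the next bit is 1 with probability 1/2

hypotheses = [
    (always_one, 20),     # (predictor, assumed code length in bits)
    (fair_coin, 5),
]

def predict_next(observed):
    """Posterior-weighted probability that the next bit is 1."""
    weighted = []
    for program, length in hypotheses:
        weight = 2.0 ** -length           # prior: shorter programs weigh more
        for i, bit in enumerate(observed):
            p_one = program(observed[:i])
            weight *= p_one if bit == 1 else (1 - p_one)
        weighted.append((program, weight))
    total = sum(w for _, w in weighted)
    return sum(w * program(observed) for program, w in weighted) / total

print(predict_next([1, 1, 1, 1]))   # ≈ 0.50: four 1s barely budge the mixture
print(predict_next([1] * 20))       # ≈ 0.98: twenty 1s let "always one" dominate
```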

The upshot is that it takes more Bayesian evidence—more successful predictions, or more precise predictions—to justify more complex hypotheses. But it can be done; the burden of prior improbability is not infinite. If you flip a coin four times, and it comes up heads every time, you don’t conclude right away that the coin produces only heads; but if the coin comes up heads twenty times in a row, you should be considering it very seriously. What about the hypothesis that a coin is fixed to produce HTTHTT… in a repeating cycle? That’s more bizarre—but after a hundred coinflips you’d be a fool to deny it.
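
And a last bit of toy arithmetic for the repeating-cycle coin; the 30-bit prior penalty is an arbitrary stand-in for “bizarre but describable,” and the point is only that the penalty is finite while the evidence keeps growing:

```python
# Toy log-odds for "the coin is fixed to repeat HTTHTT..." versus "fair coin".
# The 30-bit prior penalty is an arbitrary illustrative number.

flips = 100
prior_penalty_bits = 30        # assumed complexity cost of the cycle hypothesis
evidence_bits = flips * 1.0    # the fair coin gives each matching flip p = 1/2,
                               # the cycle hypothesis gives it p ≈ 1: one bit per flip

posterior_log2_odds = evidence_bits - prior_penalty_bits
print(posterior_log2_odds)     # 70.0 bits, i.e. about 2**70 to 1 for the cycle
```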

Standard chemistry says that in a gram of hydrogen gas there are six hundred billion trillion hydrogen atoms. This is a startling statement, but there was some amount of evidence that sufficed to convince physicists in general, and you particularly, that this statement was true.

Now ask yourself how much evidence it would take to convince you of a theory with six hundred billion trillion separately specified physical laws.

Why doesn’t the prior probability of a program, in the Solomonoff formalism, include a measure of how much RAM the program uses, or the total running time?

The simple answer is, “Because space and time resources used by a program aren’t mutually exclusive possibilities.” It’s not like the program specification, which can only have a 1 or a 0 in any particular place.

But the even simpler answer is, “Because, historically speaking, that heuristic doesn’t work.”

Occam’s Razor was raised as an objection to the suggestion that nebulae were actually distant galaxies—it seemed to vastly multiply the number of entities in the universe. All those stars!

Over and over, in human history, the universe has gotten bigger. A variant of Occam’s Razor which, on each such occasion, would label the vaster universe as more unlikely, would fare less well under humanity’s historical experience.

This is part of the “experimental evidence” I was alluding to earlier. While you can justify theories of simplicity on mathy sorts of grounds, it is also desirable that they actually work in practice. (The other part of the “experimental evidence” comes from statisticians / computer scientists / Artificial Intelligence researchers, testing which definitions of “simplicity” let them construct computer programs that do empirically well at predicting future data from past data. Probably the Minimum Message Length paradigm has proven most productive here, because it is a very adaptable way to think about real-world problems.)

Imagine a spaceship whose launch you witness with great fanfare; it accelerates away from you, and is soon traveling at 0.9c. If the expansion of the universe continues, as current cosmology holds it should, there will come some future point where—according to your model of reality—you don’t expect to be able to interact with the spaceship even in principle; it has gone over the cosmological horizon relative to you, and photons leaving it will not be able to outrace the expansion of the universe.

Should you believe that the spaceship literally, physically disappears from the universe at the point where it goes over the cosmological horizon relative to you?

If you believe that Occam’s Razor counts the objects in a model, then yes, you should. Once the spaceship goes over your cosmological horizon, the model in which the spaceship instantly disappears, and the model in which the spaceship continues onward, give indistinguishable predictions; they have no Bayesian evidential advantage over one another. But one model contains many fewer “entities”; it need not speak of all the quarks and electrons and fields composing the spaceship. So it is simpler to suppose that the spaceship vanishes.

Alternatively, you could say: “Over numerous experiments, I have generalized certain laws that govern observed particles. The spaceship is made up of such particles. Applying these laws, I deduce that the spaceship should continue on after it crosses the cosmological horizon, with the same momentum and the same energy as before, on pain of violating the conservation laws that I have seen holding in every examinable instance. To suppose that the spaceship vanishes, I would have to add a new law, ‘Things vanish as soon as they cross my cosmological horizon.’”

The decoherence (a.k.a. many-worlds) version of quantum mechanics states that measurements obey the same quantum-mechanical rules as all other physical processes. Applying these rules to macroscopic objects in exactly the same way as microscopic ones, we end up with observers in states of superposition. Now there are many questions that can be asked here, such as

“But then why don’t all binary quantum measurements appear to have 50/50 probability, since different versions of us see both outcomes?”

However, the objection that decoherence violates Occam’s Razor on account of multiplying objects in the model is simply wrong.

Decoherence does not require the wavefunction to take on some complicated exact initial state. Many-worlds is not specifying all its worlds by hand, but generating them via the compact laws of quantum mechanics. A computer program that directly simulates quantum mechanics to make experimental predictions would require a great deal of RAM to run—but simulating the wavefunction is exponentially expensive in any flavor of quantum mechanics! Decoherence is simply more so. Many physical discoveries in human history, from stars to galaxies, from atoms to quantum mechanics, have vastly increased the apparent CPU load of what we believe to be the universe.

Many-worlds is not a zillion worlds’ worth of complicated, any more than the atomic hypothesis is a zillion atoms’ worth of complicated. For anyone with a quantitative grasp of Occam’s Razor, that is simply not what the term “complicated” means.

As with the historical case of galaxies, it may be that people have mistaken their shock at the notion of a universe that large, for a probability penalty, and invoked Occam’s Razor in justification. But if there are probability penalties for decoherence, the largeness of the implied universe, per se, is definitely not their source!

The notion that decoherent worlds are additional entities penalized by Occam’s Razor is just plain mistaken. It is not sort-of-right. It is not an argument that is weak but still valid. It is not a defensible position that could be shored up with further arguments. It is entirely defective as probability theory. It is not fixable. It is bad math.