# An Undergraduate Reading Of: Macroscopic Prediction by E.T. Jaynes

I stumbled across a paper from 1996 on Macroscopic Prediction by E.T. Jaynes which interested me; I thought I would document my reading in the style I recommended in this post. I don’t have any expertise in Jaynes’ fields, so it will serve as a good check for intuitions. It also may be of historical interest to the community. Lastly, I don’t call it an undergraduate reading for nuthin’, so it may be directly informational for people with less mathematical or scientific background.

This paper is organized a little differently, with the following sections:

1. INTRODUCTION

2. HISTORICAL BACKGROUND

3. THE BASIC IDEA

4. MATHEMATICAL FORMALISM

5. THE MAXIMUM CALIBER PRINCIPLE

6. BUBBLE DYNAMICS

7. CONCLUSION

8. REFERENCES

I will match these sections up to the divisions I used originally, and go from there.

Abstract

There isn’t one in the version I have, which makes sense on account of this paper not proving a particular result.

Introduction

This includes sections 1 and 2. The question is why macrophenomena are difficult to predict. The goal is to find a principle sufficiently general that it can be used for questions of physics as well as questions of biology and economics. The latter is an example where predictions are very poor, but even physics has difficulties with the relationship between macrophenomena and microphenomena, e.g. lasers. Some key points:

• Microphenomena and macrophenomena are defined in relation to each other: in general, understanding the elements that make up something else is insufficient for understanding the something else. An additional principle is needed.

• Jaynes argues that the Gibbs entropy from thermodynamics is that principle.

• Statistical mechanics does not work for this because its theorems treat the microstate as getting close to all possible states allowed by the total energy. This does not match observations, e.g. solids and organisms.

• Over human-relevant timescales, we see far fewer configurations of macrophenomena than allowed by their energy.

• Given information about macroscopic quantities A, and other relevant information I, what can we say about other macroscopic quantities B?

• Not enough information for deduction, therefore inference.

• Carnot → Kelvin → Clausius

• Clausius’ statement of the Second Law of Thermodynamics gives little information about future macrostates; it only says entropy trends toward increasing. Intermediate states are undefined.

• Enter Gibbs, with a variational principle for determining the final equilibrium state.

• Nobody seems to have noticed until G. N. Lewis in 1923, 50 years later.

• 50 years after G. N. Lewis, Jaynes thought we had only about half of the available insight from Gibbs.

• This is probably because Gibbs died young, without time for expository work or students to carry on. Therefore re-discovery was necessary.

• A quote:

We enunciate a rather basic principle, which might be dismissed as an obvious triviality were it not for the fact that it is not recognized in any of the literature known to this writer:

If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter’s control are very likely not to be reproduced, and therefore are very likely not to be relevant.

• Control of a few macroscopic quantities is often enough for a reproducible macroscopic result, e.g. heat conduction, viscous laminar flow, shockwaves, lasers.

• DNA determines most things about the organism; this is highly reproducible; it should be predictable.

• We should expect that progress since Clausius deals with how to recognize and deal with I. Gibbs does this. Physics remains stuck with Clausius’ formulation, despite better alternatives being available. [See a bit more on this in the comments]

• Physical chemists have used Gibbs through G. N. Lewis for a long time, but rule-of-thumb extensions to cover non-equilibrium cases are numerous and unsatisfactory.

Body

This includes sections 3-6, which I skip for now and return to below.

Conclusion

The conclusion clarifies the relationship between this idea and what is currently (as of 1996) being done on similar problems.

• Possible misconception: recent work on macrostates is about dynamics, like microscopic equations of motion or higher-level dynamical models; it ignores entropy.

• If the macrostates differ little in entropy, then entropy-less solutions are expected to be successful. Areas where they do not work are good candidates for this entropy method.

• It is expected that dynamics will reappear automatically when using the entropy method on realistic problems, through the Heisenberg operator.

Return to Body

Picking back up with section 3, and carrying through.

• First thought: the macrostate is only a projection of the microstate with less detail, ergo microbehavior determines macrobehavior. There are no other considerations.

• This is wrong. We have to consider that we never know the microstate, only about the macrostate.

• Reproducibility means that should be enough, if we can use the information right.

• Gibbs and Heterogeneous Equilibrium: given a few macrovariables in non-equilibrium, predict the final equilibrium macrostate.

• To solve this, Gibbs made the Second Law a stronger statement: entropy will increase, to the maximum allowed by experimental conditions and conservation laws.

• This makes the Second Law weaker than conservation laws: there are microstates allowed by the data for which the system will not go to the macrostate of maximum entropy.

• If reproducible, then Gibbs’ rule predicts quantitatively.

• Entropy is a property only of the macrostate. Unfortunately, Gibbs did not elucidate entropy itself.

• From Boltzmann, Einstein, and Planck: the thermodynamic entropy is basically the logarithm of the phase volume, i.e. the number of ways the macrostate can be realized.

• Quote:

Gibbs’ variational principle is, therefore, so simple in rationale that one almost hesitates to utter such a triviality; it says “predict that final state that can be realized by Nature in the greatest number of ways, while agreeing with your macroscopic information.”

• Generalizes: predict the behavior that can happen in the greatest number of ways, while agreeing with whatever information you have.

• From simplicity, generality. Then Jaynes chides scientists for demanding complexity before they will accept things.

• Reproducibility means that we have all the required information.

• Macrostate information A picks out some class of microstates C, the majority of which have to agree for reproducibility to happen.

• A subset of microstates in C would not lead to the predicted result, therefore this is inference rather than deduction.

• In thermodynamics a small increase in the entropy of a macrostate corresponds to an enormous increase in the number of ways to realize it; this is why Gibbs’ rule works.
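The size of that effect is easy to see numerically. As a minimal sketch (the coin-flip setup is my own illustration, not Jaynes’): treat n coin flips as microstates and the heads count as the macrostate; the entropy of a macrostate is the log of its multiplicity, and a modest-looking entropy difference corresponds to an astronomical ratio in the number of ways the states can be realized.

```python
import math

# Toy illustration: n coin flips. A macrostate is the heads count k;
# its multiplicity (number of microstates realizing it) is C(n, k),
# and its entropy is the log of that multiplicity.
n = 1000

def multiplicity(k):
    return math.comb(n, k)

def entropy(k):
    return math.log(multiplicity(k))

# A modest-looking entropy difference...
dS = entropy(500) - entropy(400)               # about 20 nats
# ...is an enormous ratio in the number of realizations:
ratio = multiplicity(500) / multiplicity(400)  # exp(dS), roughly 5e8
print(dS, ratio)
```

By construction ratio = exp(dS), so the point is just that entropy differences of a few tens of nats already make the higher-entropy macrostate overwhelmingly favored when microstates are weighted equally.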

• We cannot expect as large a ratio in other fields, but that is not necessary for the method to be useful, and it can be compensated for with more information.

• The information is useful insofar as it shrinks class C; how useful is measured by how much entropy reduction it achieves.

• We need to locate C and determine which macrostate is consistent with most of its members. Enter probability theory.

[I haven’t figured out how to work LaTeX in this interface, so I am skipping the bulk of the Mathematical Formalism section. It is also freely available in the link above]

• We use probability distributions over microstates. This being the early stages of the Bayesian Wars, the obligatory “frequentism sux” aside is included.

• Asymptotic equipartition theorem of information theory, using the von Neumann-Shannon information entropy from quantum theory.

• From experimentation, we see that W = exp(H) is valid.
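A quick numerical check of W = exp(H) in its asymptotic-equipartition form (this coin-flip check is my own sketch, not from the paper): for n independent coin flips with heads fraction p, the log multiplicity of the macrostate approaches n times the per-flip Shannon entropy H(p).

```python
import math

# Sketch of the asymptotic equipartition idea: the number of microstates W
# of the macrostate "k heads in n flips" satisfies log W ≈ n * H(k/n),
# where H is the per-flip Shannon entropy in nats.
def H(p):
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

n, p = 100_000, 0.3
k = int(p * n)
log_W = math.log(math.comb(n, k))  # exact log multiplicity
print(log_W / (n * H(p)))          # → close to 1 for large n
```

The ratio approaches 1 as n grows, which is the sense in which counting microstates and computing information entropy give the same answer.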

• Equilibrium statistical mechanics is contained in the rule as a special case.

• There is a problem of “induction time”; if all we have is t = 0, then our predictions are already as good as possible.

• Real phenomena have already persisted from some time in the past, so the induction-time problem is resolved. Values of A at t != 0 improve predictions.

• This motivates the interpretation of probabilities.

• The density matrix with maximum entropy, for one moment of time, assigns equal probability to every compatible state regardless of history.

• Fading memory effects are characteristic of irreversible processes; behavior depends on history.

• There is an extension to allow time-dependent information; this includes the dreaded “extended in the obvious way.”

• At this point, if you use regular thermodynamic parameters, the Clausius experimental entropy falls out.

• Maximum information entropy as a function of the entire space-time history of the macroscopic process: the caliber.

• There’s a Maximum Caliber Principle which flat defeats me, because I don’t know anything about the Fokker-Planck and Onsager work it makes reference to.

• In the Bubble Dynamics section he offers a sketch of using short-term memory effects.

So completes reading a Jaynes paper from about an undergraduate level.

• Naive Comments

I find this paper pretty inspirational. I’ve been playing with the intuitions he lays out in the first two sections for days.

It was written in 1996, and I am not altogether sure where the Principle of Maximum Entropy fits in; he uses the phrase ‘maximum entropy’ a lot. It occurs to me the Principle of Maximum Caliber may have a relationship with MaxEnt similar to that between Gibbs’ and Clausius’ statements of the Second Law of Thermodynamics, but this isn’t clear to me, mainly because I know almost nothing about MaxEnt.

I was also reading the reply to Francois Chollet where the improvement of AlphaGo Zero over AlphaGo was given as an example. In thinking about that in relation to this paper, I have two feelings:

1) I notice not a lot of coverage of what AlphaGo Zero was actually doing during the three-day training period, and I really should look that up specifically.

2) What I suspect happened is that AlphaGo Zero brute-force mapped the “phase-space” of Go for three days. The possible combinations of piece positions (microstates) are computationally intractable, so I read; AlphaGo Zero therefore went to work on a different level of macrophenomena. So given the rules of Go and current position A, and virtually all I, it confidently predicts the winning end-game positions B.

This makes me think that the real trick to good predictions is making the optimal choice of macrophenomena. In the paper Jaynes consistently highlights nuance related to distinguishing his method from statistical mechanics, which makes sense, as that is otherwise how people know him. It seems pretty clear to me that his generalizations liberate us from the traditional associations, which opens up a lot of room for new categories of macrophenomena. For example, consider this from the paper:

On a different plane, we feel that we understand the general thinking and economic motivations of the individual people who are the micro-elements of a society; yet millions of those people combine to make a macroeconomic system whose oscillations and unstable behavior, in defiance of equilibrium theory, leave us bewildered.

So we have:

humans (microphenomena) → economy (macrophenomena)

But suppose we find humans computationally intractable and the macrophenomena of the economy too imprecise. We could add a middle layer of institutions, like firms and governments, which are also made up of humans. So now we have:

humans (microphenomena) → institutions (macrophenomena)

AND

institutions (microphenomena) → economy (macrophenomena)

So if it happens that institutions are something you can get a good grip on, no one else will be able to significantly out-predict you about the economy unless they can get a better grip than you on institutions, or they find a new macrophenomenon above humans that they can master comparably well and that contains more information about the economy than institutions do.

Since we are starting from the perspective of macrophenomena, I keep wanting to say resolution. So if we have our microphenomena on the bottom and the macrophenomena at the top, one strategy might be to look ‘down’ from the macrophenomena and try to identify the lowest-level intermediate phenomena that can be reasonably computed, and then get a decisive description of those phenomena before returning to predicting the macrophenomena.

Sort of the same way a Fast Fourier Transform works: by cleverly choosing intermediate steps, we can get to the answer we want faster (or in this case, more accurately).

• To use LaTeX in our editor: Press CTRL+4 (or cmd-4 on a Mac) and you will enter LaTeX mode.

• Superb! I will fiddle with this and start adding the key equations over the coming days.

• If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter’s control are very likely not to be reproduced, and therefore are very likely not to be relevant.

I’m having trouble expressing in words just how useful that is. It clarifies a whole range of questions and topics I think about regularly. Thank you for sharing!

• Extremely useful and conveniently timely for my own work. Thanks for writing this up.

• Note that this paper was first published in 1985, not 1996. The full source is in a footnote at the bottom of the first page.

• Well spotted! This helps with putting the maximum entropy comments in context.


• A word on the better methods Jaynes refers to: these are [1] and [2] in the References. I actually encountered [2], Truesdell’s Rational Thermodynamics, previously as a consequence of this community.

The pitch here is basically tackling thermodynamics from axioms with field equations. In particular, Truesdell advocated a method of moments, which is to say you keep adding fields of the behavior you are concerned with to refine the answer.

Truesdell was important to the field of Continuum Mechanics, which is still used in engineering. The idea here is that rather than calculating what happens to each particle of a material, the material is treated as a continuum, and then you calculate the change in the particular property you are concerned with. This is an efficient way to get numerical answers about stress, shear, heat conduction, memory effects, tearing, etc. Rational Thermodynamics generalizes the continuum method. They have proved Navier-Stokes as a special case of the method of moments, although they also demonstrated that adding many additional moments does not significantly outperform Navier-Stokes. The suggestion is that it may take hundreds or thousands of moments to get an improvement here, which was impractical at the time Truesdell was writing. However, that was before we had really powerful computing tools for addressing problems like this.

The way I got to Truesdell was through an anonymous blog from a poster on either the old LessWrong or possibly even the joint Overcoming Bias posts, and in that blog they referenced a review Jaynes did of one of Truesdell’s papers. Jaynes was impressed that Truesdell had arrived at virtually the same formalism that Jaynes himself had, through purely mathematical means.

This is not important, except for being cool and motivating me to look into Rational Thermodynamics.

• I liked this paper and summary, and was able to follow most of it except for the actual physics :)

I feel like I missed something important though:

If we are trying to judge B, what’s the use of knowing the entropy of state B? The thrust I got was “Give weight to possible B in accordance with their entropy, and somehow constrain that with info from A”, but I didn’t get a sense of what using A as constraints looked like (I expect that it would make more sense if I could do the physics examples).

• I think this is captured in Section 5, the Maximum Caliber Principle:

We are given macroscopic information A which might consist of values of several physical quantities . . . such as distribution of stress, magnetization, concentration of various chemical components, etc. in various space time regions. This defines a caliber . . . which measures the number of time dependent microstates consistent with the information A.

So the idea is that you take the macro information A and use it to identify the space of possible microstates. For maximum rigor you do this independently for A and B, and if they do not share any microstates then B is impossible. When we make a prediction about B, we choose the value of B that has the biggest overlap with the possible microstates of A.
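That procedure can be sketched with a toy model (the 4-bit setup here is my own hypothetical, not from the paper): enumerate the microstates consistent with the observed A, then predict the value of B realized by the greatest number of them.

```python
from collections import Counter
from itertools import product

# Hypothetical toy inference: microstates are 4-bit strings.
# Observed macro-quantity A: parity of the first two bits.
# Quantity to predict B: total number of ones.
microstates = list(product([0, 1], repeat=4))

def A(s):
    return (s[0] + s[1]) % 2

def B(s):
    return sum(s)

# Class C: microstates consistent with the observation A = 1.
C = [s for s in microstates if A(s) == 1]

# Predict the value of B realized by the most microstates in C.
counts = Counter(B(s) for s in C)
prediction = counts.most_common(1)[0][0]
print(counts)      # Counter({2: 4, 1: 2, 3: 2})
print(prediction)  # → 2
```

In a realistic problem `counts` is exponentially peaked because of the entropy blow-up discussed above; here the peak is mild only because the model is tiny.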

He talks a little bit more about the motivation for doing this in the Conclusion, here:

We should correct a possible misconception that the reader may have gained. Most recent discussions of macrophenomena outside of physical chemistry concentrate entirely on the dynamics (microscopic equations of motion or an assumed dynamical model at a higher level, deterministic or stochastic) and ignore the entropy factors of macrostates altogether. Indeed, we expect that such efforts will succeed fairly well *if the macrostates of interest do not differ greatly in entropy*.

Emphasis mine. So the idea here is that if you don’t need to account for the entropy of A, you will be able to tackle the problem using normal methods. If the normal methods fail, it’s a sign that we need to account for the entropy of A, and therefore to use this method.

I can’t do the physics examples either, except in very simple cases. I am comforted by this line:

Although the mathematical details needed to carry it out can become almost infinitely complicated...

• Thanks! In my head, I was using the model of “flip 100 coins; the exact value of all coins is the microstate, the heads-tails count is the macrostate”. In that model, the macrostates form disjoint sets, so it’s probably not a good example.

I think I get your point in the abstract, but I’m struggling to form an example model that fits it. Any suggestions?

• Apologies for this being late; I also struggled to come up with an example model. Checking the references, he talks about A more thoroughly in the paper where the idea was originally presented.

I strongly recommend taking a look at page 5 of the PDF, which is where he starts a two-page section clarifying the meaning of entropy in this context. I think this will help a lot...once I figure it out.