# [LINK] The Bayesian Second Law of Thermodynamics

Sean Carroll et al. posted a preprint with the above title. Sean also has a discussion of it on his blog.

While I am a physicist by training, statistical mechanics and thermodynamics are not my strong suit, and I hope someone with expertise in the area can give their perspective on the paper. For now, here is my summary; apologies for any errors:

There is a tension between different definitions of entropy. Boltzmann entropy, which counts macroscopically indistinguishable microstates, always increases, except for extremely rare decreases. Gibbs/Shannon entropy, which quantifies our knowledge of a system, can decrease if an observer examines the system and learns something new about it. Jaynes wrote a paper on that topic, Eliezer discussed it in the Sequences, and spxtr recently wrote a post about it. Now Carroll and collaborators propose the “Bayesian Second Law”, which quantifies this decrease in Gibbs/Shannon entropy due to a measurement:

[...] we derive the Bayesian Second Law of Thermodynamics, which relates the original (un-updated) distribution at initial and final times to the updated distribution at initial and final times. That relationship makes use of the cross entropy between two distributions [...]

[...] the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)

This last point seems to resolve the tension between the two definitions of entropy, and has applications to non-equilibrium processes, where an observer is replaced with an outcome of some natural process, such as RNA self-assembly.

• There is so much confusion surrounding the topic of entropy. Which is somewhat sad, since it’s fundamentally a very well-defined and useful concept. Entropy is my strong suit, and I’ll try to see if I can help.

There are no ‘different definitions’ of entropy. Boltzmann and Shannon entropy are the same concept. The problem is that information theory by itself doesn’t give the complete physical picture of entropy. Shannon entropy only tells you what the entropy of a given distribution is, but it doesn’t tell you what the distribution of states for a physical system is. This is the root of the ‘tension’ you’re describing. Much of the difficulty in reconciling information theory with statistical mechanics has been that we often don’t have a clear idea what the distribution of states of a given system is.

which counts macroscopically indistinguishable microstates always increases, except for extremely rare decreases.

The 2nd law is never violated, not even a little. Unfortunately the idea that entropy itself can decrease in a closed system is a misconception which has become very widespread. Disorder can sometimes decrease in a closed system, but disorder has nothing to do with entropy!

Gibbs/Shannon entropy, which counts our knowledge of a system, can decrease if an observer examines the system and learns something new about it.

This is exactly the same as Boltzmann entropy. This is the origin of Maxwell’s demon, and it doesn’t violate the 2nd law.

the Bayesian Second Law (BSL) tells us that this lack of knowledge — the amount we would learn on average by being told the exact state of the system, given that we were using the un-updated distribution — is always larger at the end of the experiment than at the beginning (up to corrections because the system may be emitting heat)

This is precisely correct and is the proper way to view entropy. Ideas similar to this have been floating around for quite some time, and this work doesn’t seem to be anything fundamentally new. It just seems to be a rephrasing of existing ideas. However, if it can help people understand entropy, then I think it’s a quite valuable rephrasing.

I was thinking about writing a series of blog posts explaining entropy in a rigorous yet simple way, and got to the draft level before real-world commitments caught up with me. But if anyone is interested, knows about the subject, and is willing to offer their time to proofread, I’m willing to have a go at it again.

• I surely am interested!

• Drop me a PM.

• this work doesn’t seem to be anything fundamentally new. It just seems to be a rephrasing of existing ideas. However if it can help people understand entropy then I think it’s a quite valuable rephrasing.

Sean Carroll seems to think otherwise, judging by the abstract:

We derive a generalization of the Second Law of Thermodynamics that uses Bayesian updates to explicitly incorporate the effects of a measurement of a system at some point in its evolution.

[...]

We also derive refined versions of the Second Law that bound the entropy increase from below by a non-negative number, as well as Bayesian versions of the Jarzynski equality.

This seems to imply that this is a genuine research result, not just a didactic exposition. Do you disagree?

The 2nd law is never violated, not even a little. Unfortunately the idea that entropy itself can decrease in a closed system is a misconception which has become very widespread. Disorder can sometimes decrease in a closed system, but disorder has nothing to do with entropy!

Could you elaborate on this further? Order implies regularity, which implies information that I can burn to extract useful work. I think I agree with you, but I’m not sure I understand all the implications of what I’m agreeing with.

• A simple example is that in a closed container filled with gas it’s possible for all the gas molecules to spontaneously move to one side of the container. This temporarily increases the order but has nothing to do with entropy.
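To get a feel for why such a fluctuation is never observed in practice, here is a quick back-of-the-envelope sketch. It uses a toy model (each molecule independently and equally likely to be in either half), which is an idealization I am assuming for illustration:

```python
# Probability that all N gas molecules are simultaneously found in
# the left half of a container, in a toy model where each molecule
# is independently and equally likely to be in either half.
def prob_all_left(n: int) -> float:
    return 0.5 ** n

for n in [10, 100, 1000]:
    print(n, prob_all_left(n))
```

Even for a mere 100 molecules the probability is below 10^-30; for a macroscopic number of molecules it is unimaginably small, though never exactly zero.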

• I think you’re ignoring the difference between the Boltzmann and Gibbs entropy, both here and in your original comment. This is going to be long, so I apologize in advance.

Gibbs entropy is a property of ensembles, so it doesn’t change when there is a spontaneous fluctuation towards order of the type you describe. As long as the gross constraints on the system remain the same, the ensemble remains the same, so the Gibbs entropy doesn’t change. And it is the Gibbs entropy that is most straightforwardly associated with the Shannon entropy. If you interpret the ensemble as a probability distribution over phase space, then the Gibbs entropy of the ensemble is just the Shannon entropy of the distribution (ignoring some irrelevant and anachronistic constant factors). Everything you’ve said in your comments is perfectly correct, if we’re talking about Gibbs entropy.

Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it’s true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere. But the converse isn’t true. If you’re given a generic ensemble or distribution over phase space and also some partition of phase space into regions, it need not be the case that the Shannon entropy of the distribution is identical to the Boltzmann entropy of any of the regions.

So I don’t think it’s accurate to say that Boltzmann and Shannon entropy are the same concept. Gibbs and Shannon entropy are the same, yes, but Boltzmann entropy is a less general concept. Even if you interpret Boltzmann entropy as a property of distributions, it is only identical to the Shannon entropy for a subset of possible distributions, those that are uniform in some region and zero elsewhere.
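The uniform-region correspondence described above can be made concrete with a toy discrete sketch (eight microstates and made-up probabilities, nothing from the paper):

```python
import math

def shannon_entropy(p):
    # Shannon entropy in nats; terms with p == 0 contribute nothing.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Uniform distribution over W = 8 microstates: Shannon entropy
# recovers the Boltzmann form log(W).
W = 8
uniform = [1 / W] * W
print(shannon_entropy(uniform), math.log(W))  # both ~2.0794

# A non-uniform distribution over the same 8 states: its Shannon
# entropy is strictly below log(8), so it equals the Boltzmann
# entropy of no region with that support.
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]
print(shannon_entropy(skewed))
```

The first case is the distribution "uniform in a region, zero elsewhere" for which the two notions agree; the second illustrates the generic case where they come apart.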

As for the question of whether Boltzmann entropy can decrease spontaneously in a closed system, it really depends on how you partition phase space into Boltzmann macrostates (which are just regions of phase space, as opposed to Gibbs macrostates, which are ensembles). If you define the regions in terms of the gross experimental constraints on the system (e.g. the volume of the container, the external pressure, the external energy function, etc.), then it will indeed be true that the Boltzmann entropy can’t change without some change in the experimental constraints. Trivially true, in fact. As long as the constraints remain constant, the system remains within the same Boltzmann macrostate, and so the Boltzmann entropy must remain the same.

However, this wasn’t how Boltzmann himself envisioned the partitioning of phase space. In his original “counting argument” he partitioned phase space into regions based on the collective properties of the particles themselves, not the external constraints. So from his point of view, the particles all being scrunched up in one corner of the container is not the same macrostate as the particles being uniformly spread throughout the container. It is a macrostate (region) of smaller volume, and therefore of lower Boltzmann entropy. So if you partition phase space in this manner, the entropy of a closed system can decrease spontaneously. It’s just enormously unlikely. It’s worth noting that subsequent work in the Boltzmannian tradition, ranging from the Ehrenfests to Penrose, has more or less adopted Boltzmann’s method of delineating macrostates in terms of the collective properties of the particles, rather than the external constraints on the system.

Boltzmann’s manner of talking about entropy and macrostates seems necessary if you want to talk about the entropy of the universe as a whole increasing, which is something Carroll definitely wants to talk about. The increase in the entropy of the universe is a consequence of spontaneous changes in the configuration of its constituent particles, not a consequence of changing external constraints (unless you count the expansion of the universe, but that is not enough to fully account for the change in entropy on Carroll’s view).

• This is going to be a somewhat technical reply, but here goes anyway.

Boltzmann entropy, on the other hand, is a property of regions of phase space, not of ensembles or distributions. The famous Boltzmann formula equates entropy with the logarithm of the volume of a region in phase space. Now, it’s true that corresponding to every phase space region there is an ensemble/distribution whose Shannon entropy is identical to the Boltzmann entropy, namely the distribution that is uniform in that region and zero elsewhere.

You cannot calculate the Shannon entropy of a continuous distribution, so this doesn’t make sense. However, I see what you’re getting at here: if we assume that all parts of the phase space have equal probability of being visited, then the ‘size’ of the phase space can be taken as proportional to the ‘number’ of microstates (this is studied under ergodic theory). But to make this argument work for actual physical systems, where we want to calculate real quantities from theoretical considerations, the phase space must be ‘discretized’ in some way. A very simple way of doing this is the Sackur-Tetrode formulation, which discretizes a continuous space based on the Heisenberg uncertainty principle (‘discretize’ is the best word I can come up with here; what I mean is not listing the microstates but instead giving the volume of the phase space in terms of some definite elementary volume). But there’s a catch here. To be able to use the HUP, you have to formulate the phase space in terms of complementary parameters. For instance, momentum+position, or time+energy.
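As a concrete illustration of the Sackur-Tetrode route, here is the standard textbook formula evaluated for argon near room temperature and atmospheric pressure (the formula and constants are standard; the specific gas and conditions are my choice for illustration):

```python
import math

# Sackur-Tetrode entropy per particle, in units of k_B, for a
# monatomic ideal gas: S/(N k) = ln(v / lambda^3) + 5/2, where
# v = V/N is the volume per particle and lambda is the thermal
# de Broglie wavelength. Planck's constant h sets the elementary
# phase-space volume that makes the count finite.
h = 6.626e-34   # Planck constant, J s
k = 1.381e-23   # Boltzmann constant, J/K

def sackur_tetrode(m, T, p):
    lam = h / math.sqrt(2 * math.pi * m * k * T)  # thermal wavelength, m
    v = k * T / p                                  # volume per particle, m^3
    return math.log(v / lam ** 3) + 2.5

# Argon: m ~ 39.95 u, at T = 300 K and p = 101325 Pa.
m_argon = 39.95 * 1.661e-27
print(sackur_tetrode(m_argon, 300.0, 101325.0))  # ~18.6
```

The result, about 18.6 k per particle (roughly 155 J/(mol K)), is close to the measured entropy of argon, which is the classic check of the formula.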

However, this wasn’t how Boltzmann himself envisioned the partitioning of phase space. In his original “counting argument” he partitioned phase space into regions based on the collective properties of the particles themselves, not the external constraints.

My previous point illustrates why this naive view is not physical: you can’t discretize any kind of system. With some systems, like a box full of particles that can have arbitrary position and momentum, you get infinite (non-physical) values for entropy. It’s easy to see why you can now get a fluctuation in entropy: infinity ‘minus’ some number is still infinity!

I tried re-wording this argument several times but I’m still not satisfied with my attempt at explaining it. Nevertheless, this is how it is. Looking at entropy based on models of collective properties of particles may be interesting theoretically, but it may not always be a physically realistic way of calculating the entropy of the system. If you go through something like the Sackur-Tetrode way, though, you see that Boltzmann entropy is the same thing as Shannon entropy.

• Boltzmann’s original combinatorial argument already presumed a discretization of phase space, derived from a discretization of single-molecule phase space, so we don’t need to incorporate quantum considerations to “fix” it. The combinatorics relies on dividing single-particle state space into tiny discrete boxes, then looking at the number of different ways in which particles could be distributed among those boxes, and observing that there are more ways for the particles to be spread out evenly among the boxes than for them to be clustered. Without discretization the entire argument collapses, since no more than one particle would be able to occupy any particular “box”, so clustering would be impossible.
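The counting at the heart of this argument can be reproduced in a few lines. Here is a toy version with 12 labeled particles and 4 boxes (the sizes are my choice; only the comparison matters):

```python
from math import factorial

def ways(occupancies):
    # Multinomial coefficient: number of ways to distribute N labeled
    # particles among boxes with the given occupancy numbers.
    w = factorial(sum(occupancies))
    for occ in occupancies:
        w //= factorial(occ)
    return w

print(ways([3, 3, 3, 3]))   # even spread:        369600 ways
print(ways([6, 6, 0, 0]))   # partly clustered:      924 ways
print(ways([12, 0, 0, 0]))  # fully clustered:         1 way
```

The even distribution corresponds to vastly more microstates than any clustered one, and the disparity grows astronomically with particle number; that is the combinatorial content of Boltzmann’s formula.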

So Boltzmann did successfully discretize a box full of particles with arbitrary position and momentum, and using his discretization he derived (discrete approximations of) the Maxwell-Boltzmann distribution and the Boltzmann formula for entropy. And he did all this without invoking (or, indeed, being aware of) quantum considerations. So the Sackur-Tetrode route is not a requirement for a discretized Boltzmann-esque argument. I guess you could argue that in the absence of quantum considerations there is no way to justify the discretization, but I don’t see why not. The discretization need not be interpreted as ontological, emerging from the Uncertainty Principle. It could be interpreted as merely epistemological, a reflection of limits to our abilities of observation and intervention.

Incidentally, none of these derivations require the assumption of ergodicity in the system. The result that the size of a macrostate in phase space is proportional to the number of microstates emerges purely from the combinatorics, with no assumptions about the system’s dynamics (other than that they are Hamiltonian). Ergodicity, or something like it, is only required to establish that the time spent by a system in a particular macrostate is proportional to the size of the macrostate, and that is used to justify probabilistic claims about the system, such as the claim that a closed system observed at an arbitrary time is overwhelmingly likely to be in the macrostate of maximum Boltzmann entropy.

So ultimately, I do think the point Carroll was making is valid. The Boltzmann entropy (as in, the actual original quantity defined by Boltzmann and refined by the Ehrenfests, not the modified interpretation proposed by people like Jaynes) is distinct from the Gibbs entropy. The former can increase (or decrease) in a closed system; the latter cannot.

To put it slightly more technically: the Gibbs entropy, being a property of a distribution that evolves according to Hamiltonian laws, is bound to stay constant by Liouville’s theorem, unless there is a geometrical change in the accessible phase space or we apply some coarse-graining procedure. Boltzmann entropy, being a property of macrostates, not of distributions, is not bound by Liouville’s theorem. Even if you interpret the Boltzmann entropy as a property of a distribution, it is not a distribution that evolves in a Hamiltonian manner. It evolves discontinuously when the system moves from one Boltzmann macrostate to the next.

• But if I know that all the gas molecules are in one half of the container, then I can move a piston for free, and then as the gas expands to fill the container again I can extract useful work. It seems like if I know about this increase in order it definitely constitutes a decrease in entropy.
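For the record, the standard bookkeeping for this thought experiment is the reversible isothermal expansion of an ideal gas from half the container to the whole of it (a textbook identity, not anything specific to the paper; the mole-of-gas numbers below are just for scale):

```python
import math

k = 1.381e-23  # Boltzmann constant, J/K

def extractable_work(n_molecules, temperature):
    # Maximum work from reversible isothermal expansion from V/2 to V:
    # W = N k T ln 2, i.e. k T ln 2 per molecule, matching the
    # Szilard-engine rate of one bit of position information each.
    return n_molecules * k * temperature * math.log(2)

# One mole of gas at 300 K:
print(extractable_work(6.022e23, 300.0))  # ~1.7 kJ
```

The k T ln 2 per bit conversion is exactly the currency in which the measurement-updated entropy accounting of Maxwell’s demon gets balanced.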

• If you know precisely when this increase in order will occur, then your knowledge about the system is necessarily very high and your entropy is necessarily very low (probably close to zero) to begin with.

• I feel like this may be a semantics issue. I think that order implies information. To me, saying that a system becomes more ordered implies that I know about the increased order somehow. Under that construction, disorder (i.e. the absence of detectable patterns) is a measure of ignorance, and disorder then is closely related to entropy. You may be preserving a distinction between the map and territory (i.e. between the system and our knowledge of the system) that I’m neglecting. I’m not sure which framework is more useful/productive.

I think it’s definitely an important distinction to be aware of either way.

• ‘Order’ is not a well-defined concept. One person’s order is another’s chaos. Entropy, on the other hand, is a well-defined concept.

Even though entropy depends on the information you have about the system, the way that it depends on that is not subjective, and any two observers with the same amount of information about the system must come up with the exact same quantity for entropy.

All of this might seem counter-intuitive at first, but it makes sense when you realize that Entropy(system) isn’t well-defined, but Entropy(system, model) is precisely defined. The ‘model’ is what Bayesians would call the prior. It is always there, either implicitly or explicitly.
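A minimal sketch of the Entropy(system, model) point, with two hypothetical priors standing in for two observers’ states of knowledge about the same four-microstate system (toy numbers, my own illustration):

```python
import math

def entropy(p):
    # Shannon entropy (nats) of a probability distribution (the model).
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Same physical system, two different models (priors):
broad_prior  = [0.25, 0.25, 0.25, 0.25]  # observer knows nothing
narrow_prior = [0.5, 0.5, 0.0, 0.0]      # observer has measured one bit

print(entropy(broad_prior))   # log(4) ~ 1.386
print(entropy(narrow_prior))  # log(2) ~ 0.693
```

Two observers holding the same prior always compute the same number; the quantity changes only when the model does, which is exactly the sense in which it is objective yet model-relative.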

• I’d be interested in this.

Also, I have a background in thermodynamics (took three courses and TAed one course), but from an engineering perspective. I’d be happy to proofread if you want someone to, though my statistical mechanics background is probably weaker than yours.

• I’m very surprised E.T. Jaynes isn’t cited anywhere in the bibliography, given he wrote tons of articles about that kind of thing.

• I’m not a physicist, but aren’t this and the linked Quanta article on Prof. England’s work bad news, Great Filter-wise?

If this implies self-assembly is much more common in the universe, then that makes it worse for the later proposed filters (i.e. makes them higher probability).