Demystifying the Second Law of Thermodynamics

Eigil Rischel 22 Nov 2020 13:52 UTC

10 points

Thermodynamics is really weird. Most people have probably encountered a bad explanation of the basics at some point in school, but don’t remember much more than:

- Energy is conserved
- Entropy increases
- There’s something called the ideal gas law/ideal gas equation.

Energy conservation is not very mysterious. Apart from some weirdness around defining energy in general, it’s just a thing you can prove from whatever laws of motion you’re using.

But entropy is very weird. You’ve heard that it measures “disorder” in some vague sense. Maybe you’ve heard that it’s connected to the Shannon entropy of a probability distribution, $H(p) = -\sum_x p(x) \ln p(x)$. Probably the weirdest thing about it is the law it obeys: It’s not conserved, but rather it increases with time. This is more or less the only law like that in physics.

It gets even weirder when you consider that at least classical Newtonian physics is time-symmetric. Roughly speaking, this means if you have a movie of things interacting under the laws of Newton, and you play it backwards, they’re still obeying the laws of Newton. An orbiting moon just looks like it’s orbiting in the other direction, which is perfectly consistent. A stone which is falling towards earth and accelerating looks like it’s flying away from earth and decelerating—exactly as gravity is supposed to do.

But if there’s some “entropy” quantity out there that only increases, then that’s obviously impossible! When you played the movie backwards, you’d be able to tell that entropy was decreasing, and if entropy always increases, some law is being violated. So what, is entropy some artefact of quantum mechanics? No, as it turns out. Entropy is an artefact of the fact that you can’t measure all the particles in the universe at once. And the fact that it seems to always increase is a consequence of the fact that matter is stable at large scales.

The points in this post are largely from E.T. Jaynes’ Macroscopic Prediction.

A proof that entropy doesn’t always increase

Let $X$ be the set of states of some physical system. Here I will assume that there is a finite number of states and that time advances in discrete steps: there is some function $T : X \to X$ which steps time forward one step. We assume that these dynamics are time-reversible in the weak sense that $T$ is a bijection, so every state is the future of exactly one “past” state. Let $S : X \to \mathbb{R}$ be some function. Assume $S(x) \leq S(Tx)$ for every $x$; in other words, $S$ can never decrease. Then $S$ is constant along the dynamics, i.e. $S(x) = S(Tx)$ for every $x$.

Proof: Assume for contradiction that $S(x) < S(Tx)$ for some $x$. Since $X$ is finite, the sum $\sum_x S(x)$ of $S$ over all states is well-defined. Then clearly $\sum_x S(x) = \sum_x S(Tx)$, since $T$ is a bijection and so $Tx$ just ranges over all the $x$s. But on the other hand, we have $S(x) \leq S(Tx)$ for all $x$, and $S(x) < S(Tx)$ in at least one case. So we must have $\sum_x S(x) < \sum_x S(Tx)$, a contradiction.
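
The claim can also be checked by brute force on a tiny state space. The following is a minimal sketch, not part of the original argument: the function name, the number of states, and the set of candidate values for $S$ are all arbitrary choices made for illustration.

```python
from itertools import permutations, product

def non_decreasing_implies_constant(n_states=4, values=(0, 1, 2)):
    """Brute-force check of the claim on a small state space.

    For every bijection T on {0, ..., n_states-1} and every assignment S of
    the given values to states, verify: if S(x) <= S(T(x)) for all x, then
    S(x) == S(T(x)) for all x. Sizes and values are illustrative assumptions.
    """
    states = range(n_states)
    for T in permutations(states):              # T[x] is the successor of x
        for S in product(values, repeat=n_states):
            # Summing S(T x) over all x is just a reordering of summing S(x).
            assert sum(S[T[x]] for x in states) == sum(S)
            never_decreases = all(S[x] <= S[T[x]] for x in states)
            constant_along_T = all(S[x] == S[T[x]] for x in states)
            if never_decreases and not constant_along_T:
                return False                    # this would be a counterexample
    return True

print(non_decreasing_implies_constant())
```

An exhaustive search like this only covers the small cases it enumerates; the proof above is what covers every finite $X$.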

This proof can be generalized to the continuous time and space case without too much trouble, for the types of dynamics that actually show up in physics (using Liouville’s Theorem). The proof above still requires a bounded phase volume (corresponding to the finiteness of $X$ ). To generalize to other situations we need some more assumptions—the easiest thing is to assume that the dynamics are time-reversible in a stronger sense, and that this is compatible with the entropy in some way.

(You can find easy counterexamples in general, e.g. if $X = \mathbb{Z}$ and the dynamics are $T(x) = x + 1$, then obviously we really do have that $S(x) = x$ is increasing. Nothing to do about that.)

Anyways the bounded/finite versions of the theorems do hold for a toy thermodynamic system like particles in a (finite) box—here the phase volume really is bounded.

The true meaning of entropy

Okay, so what the hell is going on? Did your high school physics textbook lie to you about this? Well, yes. But you’re probably never going to observe entropy going down in your life, so you can maybe rest easy.

Let $X$ be the physical system under consideration again. But suppose now that we can’t observe $x \in X$, but only some “high-level description” $p(x) \in Y$. Maybe $x$ is the total microscopic state of every particle in a cloud of gas (their positions and momenta), while $p(x)$ is just the average energy of the particles (roughly corresponding to the temperature). $x$ is called a microstate and $y = p(x)$ is called a macrostate. Then the entropy of $y \in Y$ is $S(y) = \ln |p^{-1}(\{y\})|$, the logarithm of the number of microstates $x$ where $p(x) = y$. We say these are the microstates that realize the macrostate $y$.

The connection with Shannon entropy is now that this is exactly the Shannon entropy of the uniform distribution over $p^{-1}(y)$. This is the distribution you should have over microstates if you know nothing except the macrostate. In other words, the entropy measures your uncertainty about the microstate given that you know nothing except the macrostate.
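
To make the definition concrete, here is a small sketch on a toy system that is not from the post: $N$ coins, where the microstate is the full heads/tails pattern and the macrostate is the number of heads. The names `macrostate`, `entropy` and `shannon_uniform` are hypothetical helpers chosen for this illustration.

```python
from collections import Counter
from itertools import product
from math import comb, log

N = 10  # number of coins; a hypothetical toy system, not from the post

def macrostate(x):
    """The coarse-graining p: microstate (tuple of 0/1) -> number of heads."""
    return sum(x)

# Count the microstates realizing each macrostate, i.e. |p^{-1}(y)|.
counts = Counter(macrostate(x) for x in product((0, 1), repeat=N))
assert counts[N // 2] == comb(N, N // 2)  # sanity check against the binomial

def entropy(y):
    """S(y) = ln |p^{-1}(y)|."""
    return log(counts[y])

def shannon_uniform(y):
    """Shannon entropy of the uniform distribution over p^{-1}(y)."""
    n = counts[y]
    return -sum((1 / n) * log(1 / n) for _ in range(n))

for y in range(N + 1):
    print(y, counts[y], round(entropy(y), 3))
```

In this toy system the half-heads macrostate has the most representatives and therefore the highest entropy, while the all-tails macrostate has exactly one representative and entropy zero.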

There are more sophisticated versions of this definition in general, to account for the fact that

- In general, your microstates are probably sets of real numbers, and there are probably infinitely many compatible with the macrostate, so we need a notion of “continuous entropy” (usually called differential entropy, I think),
- Your measurement of the macrostate is probably not that certain (but this turns out to matter surprisingly little for thermodynamic systems),

but this is the basic gist.

Why entropy usually goes up

Okay, so why does entropy go up? Because there are more high-entropy states than low-entropy states. That’s what entropy means. If you don’t know anything about what’s gonna happen to $x$ (in reality, you usually understand the dynamics $T$ themselves, but have absolutely no information about $x$ except the macrostate), it’s more likely that it will transfer to a macrostate with a higher number of representatives than to one with a low number of representatives.

This also lets us defuse our paradox from above. In reality, entropy doesn’t go up for literally every microstate $x$. It’s not true that $S(p(Tx)) > S(p(x))$ for all $x$; I proved that impossible above. What can be true is this: given a certain macrostate, it’s more probable that entropy increases than that it decreases.

We can consider an extreme example where we have two macrostates $L$ and $H$, corresponding to low and high entropy. Clearly the number of low-entropy states that go to a high-entropy state is exactly the same as the number of high-entropy states that go to a low-entropy state. That’s combinatorics: $T$ is a bijection, so as many states must enter $L$ as leave it. But since there are fewer low-entropy states than high-entropy states, the fraction of low-entropy states that go to high-entropy is then necessarily larger than the fraction of high-entropy states that go to low-entropy states.

In other words, $P(H(x_{t+1}) \mid L(x_t)) > P(L(x_{t+1}) \mid H(x_t))$.
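
Spelled out as a short calculation (the symbols $k$, $N_L$ and $N_H$ are notation introduced here for illustration): write $N_L < N_H$ for the number of microstates in $L$ and $H$, and let $k$ be the number of microstates in $L$ that $T$ sends into $H$. Exactly $N_L - k$ microstates of $L$ stay in $L$, and since $T$ is a bijection, every one of the $N_L$ microstates in $L$ has exactly one preimage, so the remaining $k$ preimages must lie in $H$. With a uniform distribution over the microstates of the current macrostate, and assuming $k > 0$,

$$
P(H(x_{t+1}) \mid L(x_t)) = \frac{k}{N_L} > \frac{k}{N_H} = P(L(x_{t+1}) \mid H(x_t)).
$$

If $k = 0$ the two macrostates never exchange microstates and both probabilities are zero, so the strict inequality needs at least one transition.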

Why entropy (almost) always goes up

Okay, but that’s a lot weaker than “entropy always increases”! How do we get from here to there? I could say some handwavy stuff here about how the properties of thermodynamic systems mean that the differences in the number of representatives between high-entropy and low-entropy states are massive, and that means the right-hand probability above can’t possibly be non-negligible. And that in general this works out so that entropy is almost guaranteed to increase.

But that’s very unsatisfying. It just happened to work out that way? I have a much more satisfying answer: entropy almost always increases because matter is stable at large scales.

Wait, what? What does that mean?

By “matter is stable at large scales”, I mean that the macroscopic behaviour of matter is predictable from macroscopic observations alone. When a bricklayer builds a house, they don’t first go over the bricks with a microscope to make sure the microstate of each brick isn’t going to surprise them later. And as long as we know the temperature and pressure of a gas, we can pretty much predict what will happen if we compress it with a piston.

What this means is that, if $p (x) = p (x^{'})$ , then with extremely high probability, $p (T x) = p (T x^{'})$ . It might not be literally certain, but it’s sure enough.

Now, let’s say we’re in the macrostate $y$ . Then there is some macrostate $y^{'}$ which is extremely likely to be the next one. For very nearly all $x$ so that $p (x) = y$ , we have $p (T x) = y^{'}$ . But this means that $y^{'}$ must have at least that many microstates representing it, since $T$ is a bijection. So the entropy of $y^{'}$ can at most be a tiny bit smaller than the entropy of $y$ - this difference would be as tiny as the fraction of $x$ with $p (T x) \neq y^{'}$ , so we can ignore it.
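
The bound in the previous paragraph can be written out explicitly. As notation introduced here for illustration, let $\varepsilon$ be the fraction of microstates $x$ with $p(x) = y$ for which $p(Tx) \neq y'$. The remaining $(1 - \varepsilon)\,|p^{-1}(y)|$ microstates are mapped by $T$ into $p^{-1}(y')$, and $T$ is injective, so

$$
|p^{-1}(y')| \;\geq\; (1 - \varepsilon)\,|p^{-1}(y)|
\quad\Longrightarrow\quad
S(y') \;\geq\; S(y) + \ln(1 - \varepsilon) \;\approx\; S(y) - \varepsilon .
$$

So the entropy can drop by at most about $\varepsilon$, which is negligible when $\varepsilon$ is tiny.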

So unless something super unlikely happens and $p (T x) \neq y^{'}$ , entropy goes up.

By the way, this also explains what goes wrong with time-reversibility, and why in reality, you can easily tell that a video is going backwards. The “highly probable dynamics” $Y \to Y$, which take each macrostate to the most probable next state, don’t have to be time-reversible. For instance, let’s return to the two-macrostate system above. Suppose that with 100% certainty, low-entropy states become high-entropy. Let there be $N_L$ low-entropy states and $N_H$ high-entropy states. Then, just because $T$ is a bijection, there must be $N_L$ high-entropy states that become low-entropy. Now if $N_H \gg N_L$, then practically all high-entropy states go to other high-entropy states. So $L \mapsto H$ but $H \mapsto H$.
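
The two-macrostate example can be simulated directly. This is a minimal sketch under assumed parameters: the function name `two_macrostate_toy`, the sizes `n_low` and `n_high`, and the random construction of $T$ are illustrative choices, not something the argument depends on.

```python
import random

def two_macrostate_toy(n_low=10, n_high=10_000, seed=0):
    """Build a bijection T in which every low-entropy microstate goes to a
    high-entropy microstate, then return the two transition fractions
    (fraction of L going to H, fraction of H going to L).
    Requires n_high >= n_low. All sizes are illustrative assumptions."""
    rng = random.Random(seed)
    low = list(range(n_low))                     # microstates in L
    high = list(range(n_low, n_low + n_high))    # microstates in H

    targets = rng.sample(high, n_low)   # H-states that L-states land on
    sources = rng.sample(high, n_low)   # H-states that must land in L
    T = {}
    for x, t in zip(low, targets):
        T[x] = t
    for s, x in zip(sources, low):
        T[s] = x
    # The remaining H-states are matched bijectively with the remaining images.
    source_set, target_set = set(sources), set(targets)
    rest_domain = [h for h in high if h not in source_set]
    rest_image = [h for h in high if h not in target_set]
    rng.shuffle(rest_image)
    for d, i in zip(rest_domain, rest_image):
        T[d] = i
    assert sorted(T.values()) == sorted(T.keys())  # T really is a bijection

    frac_low_to_high = sum(T[x] >= n_low for x in low) / n_low
    frac_high_to_low = sum(T[x] < n_low for x in high) / n_high
    return frac_low_to_high, frac_high_to_low

print(two_macrostate_toy())
```

With these sizes every microstate of $L$ moves to $H$, while only $N_L / N_H$ of the microstates of $H$ move to $L$, so the most probable macro-level dynamics send both $L$ and $H$ to $H$.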

Of course in reality, if you start with a low-entropy state and watch this unfold for a really long time, you’ll eventually see it become a low-entropy state again. It’s just extremely unlikely to happen in a short amount of time.
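
In the finite setting this recurrence is easy to see: a bijection on a finite set splits it into cycles, so every microstate eventually returns to itself. A small sketch, where the helper name `recurrence_time` and the random permutation are assumptions made for illustration:

```python
import random

def recurrence_time(T, start):
    """Number of steps until the orbit of `start` under the bijection T
    (given as a list, T[x] = successor of x) first returns to `start`."""
    x, steps = T[start], 1
    while x != start:
        x, steps = T[x], steps + 1
    return steps

rng = random.Random(1)
n = 1000
T = list(range(n))
rng.shuffle(T)  # a uniformly random permutation of {0, ..., n-1}
print(recurrence_time(T, 0))
```

This only shows that the return happens, not how long it takes; in a realistic thermodynamic system the state space is so large that the waiting time is far beyond any timescale of interest.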

Entropy is not exactly your uncertainty about the microstate

The entropy of a given macrostate is the uncertainty about the microstate of an observer who knows only the macrostate. In general, you have more information than this. For example, if the system starts in a low-entropy state, and you let it evolve into a high-entropy state, you know that the system is in one of the very small number of high-entropy states which come from low-entropy states! But since you can only interact with the system on macroscales, this information won’t be useful.


Steven Byrnes 22 Nov 2020 18:55 UTC
4 points
Have you seen Eliezer’s 2008 post on the 2nd law? His perspective matches my own, and I was delighted to see it written up so nicely. (Eliezer might have gotten it from Jaynes? Not sure. I reinvented that wheel, for my part, it’s quite possible that Eliezer did too.) That style of argument convinces me that the 2nd law cannot depend on any empirical claims like “matter is stable at large scales”. It should just depend on conservation of phase space (Liouville’s theorem in classical mechanics, unitarity in quantum mechanics). And it depends on human brains being part of the same universe as everything else, and subject to the same laws.
The entropy of a given macrostate is the uncertainty about the microstate of an observer who knows only the macrostate. In general, you have more information than this.
I basically agree. There’s a nice description here of getting a subregion of phase space that looks like ever-finer filamentary threads spread all around, or as I put it “it often turns out that you wind up with useless information about the microstate, i.e. information that cannot be translated into a “magic recipe” for undoing the apparent disorder … In that case, you might as well just forget that information and accept a higher entropy.”
There are cases where it’s not obvious what the macrostate is, and you need to think more concretely about what exactly you can and can’t do with the microstate information. My example here was a light beam whose polarization is set by a pseudorandom code that changes every nanosecond. Another example would be shining light through a ground-glass diffuser: It looks like a random, high-entropy, diffuse beam … But if you have an exact copy of the original diffuser, and a phase-conjugate mirror, you can unwind the randomness and magically get the original low-entropy beam back.
- Eigil Rischel 23 Nov 2020 9:45 UTC
  3 points
  Parent
  I hadn’t, thanks!
  
  I took the argument about the large-scale “stability” of matter from Jaynes (although I had to think a bit before I felt I understood it, so it’s also possible that I misunderstood it).
  
  I think I basically agree with Eliezer here?
  
  The Second Law of Thermodynamics is actually probabilistic in nature—if you ask about the probability of hot water spontaneously entering the “cold water and electricity” state, the probability does exist, it’s just very small. This doesn’t mean Liouville’s Theorem is violated with small probability; a theorem’s a theorem, after all. It means that if you’re in a great big phase space volume at the start, but you don’t know where, you may assess a tiny little probability of ending up in some particular phase space volume. So far as you know, with infinitesimal probability, this particular glass of hot water may be the kind that spontaneously transforms itself to electrical current and ice cubes. (Neglecting, as usual, quantum effects.)
  
  So the Second Law really is inherently Bayesian. When it comes to any real thermodynamic system, it’s a strictly lawful statement of your beliefs about the system, but only a probabilistic statement about the system itself.
  
  The reason we can be sure that this probability is “infinitesimal” is that macrobehavior is deterministic. We can easily imagine toy systems where entropy shrinks with non-negligible probability (but, of course, still grows /in expectation/). Indeed, if the phase volume of the system is bounded, it will return arbitrarily close to its initial position given enough time, undoing the growth in entropy. The fact that these timescales are much longer than any we care about is an empirical property of the system, not a general consequence of the laws of physics.
  
  To put it another way: if you put an ice cube in a glass of hot water, thermally insulated, it will melt, but after a very long time, the ice cube will coalesce out of the water again. It’s a general theorem that this must be less likely than the opposite: ice cubes melt more frequently than water “demelts” into hot water and ice, because ice cubes in hot water occupy less phase volume. But the ratio between these two can’t be established by this sort of general argument. To establish that water “demelting” is so rare that it may as well be impossible, you have to either look at the specific properties of the water system (high number of particles $\to$ the difference in phase volume is huge), or make the sort of general argument I tried to sketch in the post.
  - Steven Byrnes 23 Nov 2020 11:55 UTC
    5 points
    Parent
    Sure. I think even more interesting than the ratio / frequency argument is the argument that if you check whether the ice cube has coalesced, then that brings you into the system too, and now you can prove that the entropy increase from checking is, in expectation, larger than the entropy decrease from the unlikely chance that you find an ice cube. Repeat many times and the law of large numbers guarantees that this procedure increases entropy. Hence no perpetual motion. Well anyway, that’s the part I like, but I’m not disagreeing with you. :-)
gjm 22 Nov 2020 20:59 UTC
2 points
I find myself not quite satisfied with your argument about “why entropy (almost) always goes up”—it feels as if there’s some sleight of hand going on—but I’m not sure exactly where to locate my dissatisfaction. Let me try to express it...
First of all, although you use the word “stable” to describe the property of matter you’re appealing to, I don’t think that’s right. I think what you’re actually using is much more like “consistent” than “stable”. Consider a standard thought experiment. We have a sealed container with an impermeable partition half way across. We remove all the air from one half. Then we suddenly break or remove the partition, and of course the air pressure rapidly equalizes. The macrostate transition y → y’ is from “container with air at 1atm on the left and vacuum on the right” to “container with air at 0.5atm throughout”; y → y’ is a plausible thing and y’ → y is not. But I don’t think this asymmetry has anything to do with stability; it’s not as if y → y’ is a smaller change than y’ → y. Rather, the thing you need is that both y and y’ consistently almost always yield y’.
OK, so now we find that most macrostates have almost-perfectly-consistent successors, but less-consistent predecessors. (This is the time-asymmetry we need to get anything like the second law.) E.g., y’ is preceded by y much more often than it is followed by y, even though if we consider all microstates corresponding to y’ exactly the same fraction are preceded and followed by microstates corresponding to y.
Well, why is that? Isn’t it unexpected? It should be (at least if we’re considering what the laws of physics entail, rather than what we find in our everyday experience). So let’s ask: why do things behave more consistently “forwards” than “backwards”? I think the answer, unfortunately, is: “Because of the second law of thermodynamics” or, kinda-equivalently, “Because the universe has a low-entropy past”.
Consider that partitioned container again. If we picked a present microstate truly at random from all those whose corresponding macrostate is “0.5atm air in the whole container”, the probability is vanishingly small that we’d find a microstate like the real one whose recent past has all the air collected into half of the container. So however did it come about that we do now have such a microstate? As a result of the lower-entropy past. How will it come about in the future that we find ourselves in such microstates? As a result of the low-entropy present, which will then be a lower-entropy past.
So it seems as if the “stability” (by which, again, I think you mean “consistency of behaviour”) of matter only explains the thermodynamic arrow of time if we’re allowed to assume a lower-entropy past. And if we can make that assumption, we can get the Second Law without needing to appeal explicitly to the consistent behaviour of matter.
(On the other hand, considered as a way of understanding how the low-entropy past of the universe leads to the Second Law, I think I like it.)
Cleo Nardo 11 Feb 2023 2:11 UTC
1 point
This argument does not work!
If you don’t know anything about what’s gonna happen to $x$ it’s more likely that it will transfer to a macrostate with a higher number of representatives than to one with a low number of representatives.
Does this explain time-asymmetry? Well, let’s check the time-reverse of your sentence.
If you don’t know anything about what happened to $x$ it’s more likely that it transferred from a macrostate with a higher number of representatives than from one with a low number of representatives.
Hmm… This sentence seems just as plausible, but it implies that entropy increases into the past.
This is called Loschmidt’s Paradox. You can read about it here.
But clearly entropy does increase, so how do we explain it?
Well, if the dynamics are time-symmetric, and the boundary conditions are time-symmetric, then the behaviour of the system is also time-symmetric. (Symmetry in, symmetry out!) Therefore, if the behaviour of the system is time-asymmetric, and the dynamics are time-symmetric, then the boundary conditions must be time-asymmetric.
In other words, at one boundary of space-time, the universe is low-entropy, and at the other boundary of space-time, the universe is high-entropy. For convenience, we call the low-entropy boundary the “start” and the high-entropy boundary the “end”. And we say one event is “earlier” than another if and only if it occurs closer to the low-entropy boundary.
adamShimi 23 Nov 2020 22:00 UTC
1 point
I feel that you don’t mention a logical step between your proof and the point of the “The true meaning of entropy” section, because you do not make explicit how you use the result that you prove: since S does increase sometimes (and thus is not constant), then by the contraposition of your result, it cannot increase for every state. This becomes clear after rereading the section a couple of times, but I think you can probably make this smoother by explicitly saying it.
Other than that, your explanation makes sense to me, but I’m far from very good at thermodynamics.
Slider 22 Nov 2020 14:45 UTC
1 point
The proof for constancy doesn’t compute for me. Sum of S(x) and S(Tx) have different amount of things to sum so it is not obvious that they are equal. Say we have 2, 2, 2, 2, 2, 10. Sum of S(x) is 20. The four first terms of sum of S(Tx) are 2,2,2,2. The fifth is 10. Then the sixth is ill defined as T(state 6) isn’t really well defined. Even if T is wider-reaching, summing over x and Tx doesn’t count the same terms (say the bigger sequence is 0,0,0,0,2,2,2,2,2,10,40,50,60,70; then x for every x is 2,2,2,2,2,10 and Tx for every x is 2,2,2,2,10,40). Those sums can clearly be different.
- Eigil Rischel 22 Nov 2020 16:22 UTC
  2 points
  Parent
  This may be poorly explained. The point here is that
  - $T x$ is supposed to be always well-defined. So each state has a definite next state (since X is finite, this means it will eventually cycle around).
  - Since $T$ is well-defined and bijective, each $x$ is $T(x')$ for exactly one $x'$.
  - We’re summing over every $x$ , so each $x$ also appears on the list of $T x$ s (by the previous point), and each $T x$ also appears on the list of $x$ s (since it’s in $X$ )
  E.g. suppose $X = \{x_{1}, x_{2}, x_{3}, x_{4}, \dots, x_{n}\}$ and $T(x_{i}) = x_{i+1}$ when $i < n$, and $T(x_{n}) = x_{1}$. Then $\sum_{x} S(x)$ is $S(x_{1}) + S(x_{2}) + \dots + S(x_{n-1}) + S(x_{n})$. But $\sum_{x} S(T(x)) = S(T(x_{1})) + \dots + S(T(x_{n})) = S(x_{2}) + S(x_{3}) + \dots + S(x_{n}) + S(x_{1})$, and these are the same number.
  - Slider 22 Nov 2020 18:53 UTC
    1 point
    Parent
    Having implicit closed timelike curves seems highly irregular. In such a setup it is doubtful whether stepping “advances” time.
    That explains that the math works out. T gives each state a future but unintuitive part is that future is guaranteed to be among the events. Most regular scenarios are open towards the future ie have future edges where causation can run away from the region. One would expect for each event to have a cause and an effect but the cause of the pastest event to be outside of the region and the effect of the most future event to be outside of the region.
    Having CTCs probably will not extend to any “types of dynamics that actually show up in physics”