I’m not quite sure what the cruxes of our disagreement are yet. So I’m going to write up some more of how I’m thinking about things, which I think might be relevant.
When we decide to model a system and assign its states entropy, there’s a question of what set of states we’re including. Often, we’re modelling part of the real universe. The real universe is in only one state at any given time. But we’re ignorant of a bunch of parts of it (and we’re also ignorant about exactly what states it will evolve into over time). So to do some analysis, we decide on some stuff we do know about its state, and then we decide to include all states compatible with that information. But this is all just epistemic. There’s no one true set that encompasses all possible states; there are just the states that we’re considering possible.
And then there’s the concept of a macrostate. Maybe we use the word macrostate to refer to the set of all states that we’ve decided are possible. But then maybe we decide to make an observation about the system, one that will reduce the number of possible states consistent with all our observations. Before we make the observation, I think it’s reasonable to say that for every possible outcome of the observation, there’s a macrostate consistent with that outcome. The probability that we will find the system to be in that macrostate is the sum of the probabilities of its microstates. Thus the macrostate has p<1 before the observation, and p=1 after the observation. This feels pretty normal to me.
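To make that bookkeeping concrete, here’s a minimal Python sketch using a made-up six-microstate system (a fair die) and the macrostate “the outcome is even”; the system, the macrostate, and the probabilities are all illustrative assumptions, not anything canonical:

```python
# Hypothetical system: 6 equiprobable microstates (a fair die).
microstates = {1, 2, 3, 4, 5, 6}
p = {s: 1 / 6 for s in microstates}  # prior over microstates

A = {2, 4, 6}  # a macrostate: the set of microstates consistent with "even"

# Before the observation: P(A) is the sum of the probabilities of A's microstates.
p_A_before = sum(p[s] for s in A)
print(p_A_before)  # 0.5 -- the macrostate has p < 1

# After observing "even", we condition: states outside A drop to 0.
p_after = {s: p[s] / p_A_before if s in A else 0.0 for s in microstates}
print(sum(p_after[s] for s in A))  # 1.0 -- the macrostate now has p = 1
```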
We can do this for any property that we can observe, and that’s why I defined a macrostate as, “collections of microstates … connotatively characterized by a generalized property of the state”.
I also don’t see why it couldn’t be a set containing a single state; a set of one thing is still a set. Whether that one thing has probability 1 or not depends on what you’re deciding to do with your uncertainty model.
I think the crux of our disagreement [edit: one of our disagreements] is whether the macrostate we’re discussing can be chosen independently of the “uncertainty model” at all.
When physicists talk about “the entropy of a macrostate”, they always mean something of the form:
There are a bunch of p’s that add up to 1. We want the sum of p × (-log p) over all p’s. [EXPECTATION of -log p aka ENTROPY of the distribution]
They never mean something of the form:
There are a bunch of p’s that add up to 1. We want the sum of p × (-log p) over just some of the p’s. [???]
Or:
There are a bunch of p’s that add up to 1. We want the sum of p × (-log p) over just some of the p’s, divided by the sum of p over the same p’s. [CONDITIONAL EXPECTATION of -log p given some event]
Or:
There are a bunch of p’s that add up to 1. We want the sum of (-log p) over just some of the p’s, divided by the number of p’s we included. [ARITHMETIC MEAN of -log p over some event]
This also applies to information theorists talking about Shannon entropy.
I think that’s the basic crux here.
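To pin down the contrast, here’s a minimal Python sketch of all four quantities from the list above; the distribution and the event are made up purely for illustration:

```python
import math

# An arbitrary distribution over four outcomes (made up for illustration).
p = [0.5, 0.25, 0.125, 0.125]
event = [0, 1]  # "just some of the p's": indices of an arbitrary subset

# ENTROPY of the distribution: sum of p * (-log p) over ALL p's.
entropy = sum(pi * -math.log2(pi) for pi in p)

# [???]: the same sum restricted to the event. Not a standard quantity.
partial_sum = sum(p[i] * -math.log2(p[i]) for i in event)

# CONDITIONAL EXPECTATION of -log p given the event.
p_event = sum(p[i] for i in event)
cond_expectation = partial_sum / p_event

# ARITHMETIC MEAN of -log p over the event.
arith_mean = sum(-math.log2(p[i]) for i in event) / len(event)

print(entropy)           # 1.75 bits -- the only one of these called "entropy"
print(partial_sum)       # 1.0
print(cond_expectation)  # 1.333...
print(arith_mean)        # 1.5
```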
This is perhaps confusing because “macrostate” is often claimed to have something to do with a subset of the microstates. So you might be forgiven for thinking “entropy of a macrostate” in statmech means:
For some arbitrary distribution p, consider a separately-chosen “macrostate” A (a set of outcomes). Compute the sum of p × (-log p) over every p whose corresponding outcome is in A, maybe divided by the total probability of A or something.
But in fact this is not what is meant!
Instead, “entropy of a macrostate” means the following:
For some “macrostate”, whatever the hell that means, we construct a probability distribution p. Maybe that’s the macrostate itself, maybe it’s a distribution corresponding to the macrostate, usage varies. But the macrostate determines the distribution, either way. Compute the sum of p × (-log p) over every p.
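In code, and under the common convention that a macrostate induces the uniform distribution over its microstates (my assumption here; other ensembles induce other distributions), that procedure looks like the sketch below, and it reduces to Boltzmann’s S = log W:

```python
import math

def entropy_of_macrostate(macrostate):
    """The macrostate determines a distribution (here: uniform over its
    microstates); then we sum p * (-log p) over EVERY p, not a subset."""
    n = len(macrostate)
    p = [1 / n] * n  # the distribution the macrostate determines
    return sum(pi * -math.log(pi) for pi in p)

A = {"s1", "s2", "s3", "s4"}     # a macrostate: a set of 4 microstates
print(entropy_of_macrostate(A))  # log 4 ~= 1.386
print(math.log(len(A)))          # Boltzmann: S = log W (taking k_B = 1)
```

Note that a singleton macrostate is handled fine under this convention: it determines the distribution putting probability 1 on its one microstate, and its entropy is log 1 = 0.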
EDIT: all of this applies even more to negentropy. The “S_max” in that formula is always the entropy of the highest-entropy possible distribution, not anything to do with a single microstate.
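A sketch of that reading of negentropy, assuming the usual J = S_max − S with S_max the entropy of the uniform (maximum-entropy) distribution over the same outcomes:

```python
import math

def negentropy(p):
    """J = S_max - S, where S_max is the entropy of the uniform
    (highest-entropy) distribution over the same outcomes -- a property
    of distributions, not of any single microstate."""
    s = sum(pi * -math.log2(pi) for pi in p if pi > 0)
    s_max = math.log2(len(p))  # entropy of the uniform distribution
    return s_max - s

print(negentropy([0.25, 0.25, 0.25, 0.25]))   # 0.0: already maximal
print(negentropy([0.5, 0.25, 0.125, 0.125]))  # 2.0 - 1.75 = 0.25 bits
print(negentropy([1.0, 0.0, 0.0, 0.0]))       # 2.0: microstate known exactly
```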