Late comment here, but I really liked this post and want to make sure I’ve fully understood it. In particular there’s a claim near the end which says: if H(X) is not fixed, then we can build equivalent models M′1, M′2 for which it is fixed. I’d like to formalize this claim to make sure I’m 100% clear on what it means. Here’s my attempt at doing that:
For any pair of models M1(θ), M2 where H(X0|M1(θ))≠H(X0|M1(θ′)) for some θ, θ′, there exists a variable X (of which X0 is a component) and a pair of models M′1(θ), M′2 such that 1) H(X|M′1(θ))=H(X|M′1(θ′)) for any θ, θ′; and 2) the behavior of the system is the same under M′1(θ), M′2 as it was under M1(θ), M2.
To satisfy this claim, we construct our X as the conjunction of X0 and some “extra” component X′0. E.g., X0∈{heads,tails} for a coin flip, X′0∈{1,2,3,4,5,6} for a die roll, and so X=X0X′0∈{(heads,1),(tails,1),(heads,2),...} is the conjunction of the coin flip and the die roll, and the domain of X is the Cartesian product of the coin flip domain and the die roll domain.
Then we construct our M′1(θ) by imposing 1) P(X0X′0|M′1(θ))=P(X0|M′1(θ))P(X′0|M′1(θ)) (i.e., X0, X′0 are logically independent given M′1(θ) for every θ); and 2) P(X0|M′1(θ))=P(X0|M1(θ)) (i.e., the marginal prob given M′1(θ) equals the original prob under M1(θ)).
Finally we construct M′2 by imposing the analogous 2 conditions that we did for M′1: 1) P(X0X′0|M′2)=P(X0|M′2)P(X′0|M′2) and 2) P(X0|M′2)=P(X0|M2). But we also impose the extra condition 3) P(X′0|M′2)=1/|X′0| (assuming finite sets, etc.).
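We can always find X, M′1(θ) and M′2 that satisfy the above conditions, provided we also choose P(X′0|M′1(θ)) so that H(X′0|M′1(θ)) offsets the variation in H(X0|M1(θ)) (by independence, H(X|M′1(θ))=H(X0|M1(θ))+H(X′0|M′1(θ))). With these choices we end up with H(X|M′1(θ))=H(X|M′1(θ′)) for all θ, θ′ (i.e., H is fixed) and E[−log(P(X|M′2))|M′1(θ)]=E[−log(P(X0|M′2))|M′1(θ)]+constant, where the constant does not depend on θ (i.e., the system retains the same dynamics).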
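Here is a minimal numerical sketch of the construction for the coin/die example, mostly as a sanity check. Everything specific in it is an assumption for illustration rather than something from the post: M1(θ) is taken to be a Bernoulli coin, M2 is an arbitrary fixed coin model, the target entropy C is set to log 2, and the hypothetical helper `padding_dist` picks P(X′0|M′1(θ)) by mixing a point mass with the uniform distribution so that H(X′0|M′1(θ)) = C − H(X0|M1(θ)).

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def cross_entropy(p, q):
    """E_p[-log q]: expected surprisal of model q when outcomes follow p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def product(p, q):
    """Joint distribution of two independent components, flattened."""
    return [pi * qj for pi in p for qj in q]

def padding_dist(target_h, n, iters=200):
    """Hypothetical helper: a distribution on n outcomes with entropy target_h.

    Requires 0 <= target_h <= log(n). It mixes a point mass with the uniform
    distribution; entropy is nondecreasing in the mixing weight t, so
    bisection on t finds the right mixture.
    """
    def mix(t):
        return [(1 - t) + t / n] + [t / n] * (n - 1)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(mix(mid)) < target_h:
            lo = mid
        else:
            hi = mid
    return mix((lo + hi) / 2)

N_DIE = 6          # |X'0|, the size of the extra component's domain
C = math.log(2)    # assumed target for H(X | M'1(theta)); must be >= max H(X0)

def m1(theta):
    """Assumed original model M1(theta): a coin with P(heads) = theta."""
    return [theta, 1 - theta]

m2 = [0.7, 0.3]    # assumed original model M2 (arbitrary fixed coin)

def m1_prime(theta):
    """M'1(theta): X0 and X'0 independent, marginal on X0 equal to M1(theta)."""
    p0 = m1(theta)
    return product(p0, padding_dist(C - entropy(p0), N_DIE))

# M'2: same marginal on X0 as M2, uniform (maxentropic) on X'0.
m2_prime = product(m2, [1 / N_DIE] * N_DIE)

for theta in (0.1, 0.3, 0.5, 0.9):
    h_total = entropy(m1_prime(theta))
    gap = cross_entropy(m1_prime(theta), m2_prime) - cross_entropy(m1(theta), m2)
    print(f"theta={theta}: H(X)={h_total:.6f}, cross-entropy gap={gap:.6f}")
```

In this sketch the printed H(X) should match C for every θ, and the cross-entropy gap should match log(N_DIE) for every θ, which is the "same dynamics up to a constant" condition.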
Is this basically right? Or is there something I’ve misunderstood?
The construction is correct.
Note that for M2, conceptually we don’t need to modify it; we just need to use the original M2 but apply it only to the subcomponents of the new X-variable which correspond to the original X-variable. Alternatively, we can take the approach you do: construct M′2 which has a distribution over the new X, but “doesn’t say anything” about the new components, i.e. it’s just maxentropic over the new components. This is equivalent to ignoring the new components altogether.
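To spell out why the two approaches coincide, here is a short derivation sketch using the thread's notation, assuming finite domains and writing |X'_0| for the number of values the extra component can take. The first step uses the factorization of M′2, the second uses its uniform marginal on X′0 and the matching marginals on X0:

```latex
\begin{aligned}
\mathbb{E}_{X \sim M'_1(\theta)}\left[-\log P(X \mid M'_2)\right]
&= \mathbb{E}_{X \sim M'_1(\theta)}\left[-\log P(X_0 \mid M'_2)\right]
 + \mathbb{E}_{X \sim M'_1(\theta)}\left[-\log P(X'_0 \mid M'_2)\right] \\
&= \mathbb{E}_{X_0 \sim M_1(\theta)}\left[-\log P(X_0 \mid M_2)\right] + \log |X'_0|
\end{aligned}
```

The extra term log |X'_0| does not depend on θ or on what M′1(θ) says about X′0, so comparing models under M′2 gives the same ranking as applying the original M2 to the X0 component alone.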
Ah yes, that’s right. Yeah, I just wanted to make this part fully explicit to confirm my understanding. But I agree it’s equivalent to just letting M′2 ignore the extra X′0 (or whatever) component.
Thanks very much!