I think that’s a good way of phrasing it, except that I would emphasize that these are two different states of knowledge, not necessarily two different states of the world.
I didn’t think it would work out to the maximum entropy distribution even in your first case, so I worked out an example to check:
Suppose we have a three-sided die that can land on 0, 1, or 2. Then suppose we are told the die was rolled several times, and the average value was 1.5. The maximum entropy distribution is (if my math is correct) probability 0.116 for 0, 0.268 for 1, and 0.616 for 2.
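(To spell out the calculation: the maximum entropy distribution under a mean constraint has the exponential-family form p_k proportional to r^k, and the mean constraint reduces to a quadratic in r. A quick sketch of that computation:)

```python
import math

# Maximum entropy distribution over {0, 1, 2} subject to mean 1.5.
# It has the form p_k proportional to r**k for some r > 0.
# The mean constraint (r + 2*r**2) / (1 + r + r**2) = 1.5
# simplifies to r**2 - r - 3 = 0, whose positive root is:
r = (1 + math.sqrt(13)) / 2
Z = 1 + r + r**2          # normalizing constant
p = [r**k / Z for k in range(3)]
print([round(x, 3) for x in p])  # [0.116, 0.268, 0.616]
```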
Now suppose we had a prior analogous to Laplace's Rule: two parameters p0 and p1 for the "true probability" or "bias" of 0 and 1, with uniform probability density 2 dp0 dp1 over all possible values of these parameters (the region where their sum is less than 1, which has area 1⁄2). Then as the number of rolls goes to infinity, the probability that a given pair of parameter values assigns to the average being 1.5 goes to 1 if 1.5 is its expected value, and to 0 otherwise. So we can condition on "the true values give an expected value of 1.5". We get probabilities of 0.125 for 0, 0.25 for 1, and 0.625 for 2.
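(A sketch of that conditioning, assuming the uniform prior density restricted to the constraint line stays uniform in p0: substituting p2 = 1 − p0 − p1 into the constraint p1 + 2·p2 = 1.5 gives p1 = 0.5 − 2·p0, which is valid for p0 in [0, 0.25], so we can just average over an even grid of p0 values:)

```python
# Condition the uniform-on-the-simplex prior on expected value 1.5,
# i.e. on the line p1 + 2*p2 = 1.5 with p2 = 1 - p0 - p1.
# That line is p1 = 0.5 - 2*p0 for p0 in [0, 0.25], and the uniform
# density restricted to it is uniform in p0, so average over a grid:
N = 10_000
E = [0.0, 0.0, 0.0]
for i in range(N):
    p0 = 0.25 * (i + 0.5) / N     # midpoint grid on [0, 0.25]
    p1 = 0.5 - 2 * p0
    p2 = 1 - p0 - p1
    E[0] += p0; E[1] += p1; E[2] += p2
E = [x / N for x in E]
print([round(x, 3) for x in E])   # [0.125, 0.25, 0.625]
```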
That is not exactly equal to the maximum entropy distribution, but it's surprisingly close! Now I'm wondering if there's a different set of priors that gives the maximum entropy distribution exactly. I really should have worked out an actual numerical example sooner; I had previously thought of this example, assumed it would give different values than the maximum entropy distribution, and never carried the calculation to the end to notice how close the two actually are.