We’ve shown that the probability P[q|X] summarizes all the information in X relevant to q, and throws out as much irrelevant information as possible.
This seems correct.
Let's say two different points in the data configuration space, X_1 and X_2, provide equal evidence for q. Then P[q|X_1] = P[q|X_2]. The two different data possibilities are mapped to the same point in this compressed map. So far so good.
(I assume that I should interpret the object P[q|X] as a function over X, not as a point probability for a specific X.)
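To make the compression concrete, here is a minimal sketch (the toy likelihood numbers are my own assumption, not from the post): two different data points with the same 3:1 likelihood ratio for q get mapped to the same posterior, so P[q|·] collapses them to one point of the compressed map.

```python
from fractions import Fraction

def posterior(prior_q, lik_given_q, lik_given_not_q):
    # Bayes' rule: P[q|X] = P[X|q]P[q] / (P[X|q]P[q] + P[X|~q]P[~q])
    num = lik_given_q * prior_q
    den = num + lik_given_not_q * (1 - prior_q)
    return num / den

prior_q = Fraction(1, 2)

# Two different data points X_1, X_2, but each with the same
# 3:1 likelihood ratio in favor of q:
p1 = posterior(prior_q, Fraction(3, 10), Fraction(1, 10))  # X_1
p2 = posterior(prior_q, Fraction(6, 10), Fraction(2, 10))  # X_2

assert p1 == p2 == Fraction(3, 4)  # same point in the compressed map
```

Only the likelihood ratio survives the compression; everything else about X_1 vs X_2 is thrown away.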
First, hopefully this provides some intuition for interpreting a probability P[q|X] as a representation of the information in X relevant to q. In short: probabilities directly represent information. This interpretation makes sense even in the absence of “agents” with “beliefs”, or “independent experiments” repeated infinitely many times. It directly talks about maps matching territories, and the role probability plays, without invoking any of the machinery of frequentist or subjectivist interpretations. That means we can potentially apply it in a broader variety of situations—we can talk about simple mechanical processes which produce “maps” of the world, and the probabilistic calculations embedded in those processes.
I don’t think this works.
The map P[q|X] has gotten rid of all the irrelevant information in the data, but it still contains some information that never came from the data. I.e. P[q|X] is not generated only from the information in X relevant to q.
E.g. averaging P[q|X] against the data distribution recovers
P[q] = sum_X P[q|X] P[X]
i.e. the prior probability of q. And if the prior of q were different, P[q|X] would be different too.
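A toy illustration of this complaint (the binary model and its numbers are hypothetical, chosen only for the sketch): the posterior map X -> P[q|X] changes when the prior changes, and marginalizing it against P[X] hands the prior back.

```python
from fractions import Fraction

# Hypothetical toy model: binary hypothesis q, data X in {0, 1},
# with fixed likelihoods P[X|q] and P[X|~q].
lik_q    = {0: Fraction(1, 4), 1: Fraction(3, 4)}   # P[X|q]
lik_notq = {0: Fraction(3, 4), 1: Fraction(1, 4)}   # P[X|~q]

def posterior_map(prior_q):
    """Return the whole function X -> P[q|X] as a dict."""
    out = {}
    for x in (0, 1):
        num = lik_q[x] * prior_q
        den = num + lik_notq[x] * (1 - prior_q)
        out[x] = num / den
    return out

# Same likelihoods, different priors: the map P[q|X] itself differs,
# so the map carries information that never came from X.
assert posterior_map(Fraction(1, 2)) != posterior_map(Fraction(1, 10))

# And marginalizing the map against P[X] recovers the prior:
prior_q = Fraction(1, 3)
post = posterior_map(prior_q)
p_x = {x: lik_q[x] * prior_q + lik_notq[x] * (1 - prior_q)
       for x in (0, 1)}                       # P[X]
assert sum(post[x] * p_x[x] for x in (0, 1)) == prior_q
```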
The way you can’t (shouldn’t) get rid of priors here feels similar to how you can’t (shouldn’t) get rid of coordinates in physics. In this analogy, the choice of prior is analogous to the choice of origin. Your choice of origin is completely subjective (even more so than the prior). Technically you can represent position in a coordinate-free way (relative positions only), but no one does, because doing so destroys other things.
(I’m being maximally critical, because you asked for it)
Yeah, I basically buy that complaint. None of this was intended to get rid of priors, because a correct model shouldn’t get rid of priors.
In that case I’m confused about this statement
What are priors in the absence of something like agents with beliefs?
Frequencies are one example.
Here’s what you wrote:
Do you still agree with yourself?
I’m no longer sure. Thanks!