Both the original post and subsequent comments seem to have some misconceptions regarding the Ap distribution as introduced by Jaynes.
(i) @criticalpoints claims that “way to think about the proposition Ap is as a kind of limit … The proposition Ap can be thought of a shorthand for an infinite collection of evidences.” But as also noted, Jaynes wants Ap to say that “regardless of anything else you may have been told, the probability of A is p.” In general, however, this need not involve any limit, or an infinite amount of evidence. For any proposition A and probability value p, the proposition Ap is something of a formal maneuver, which merely asserts whatever would be minimally required for a rational agent possessing said information to assign a probability of p to proposition A. For a proposition A, and any suitable finite body of evidence E, there is a corresponding Ap distribution.
(ii) This may have been a typo, or a result of Jaynes’s very non-standard parenthetical notation used for probabilities, but @criticalpoints claims:
For any proposition A, the probability of A can be found by integrating over our probabilities of {Ap}
p(A) = ∫₀¹ p(Ap) dp

[emphasis added]. But p(A|E) is to be obtained by calculating the expectation of p with respect to the Ap distribution, not the bare integral. The integral by itself evaluates to unity,

∫₀¹ p(Ap) dp = 1,

because the distribution is normalized, while

p(A|E) = ∫₀¹ p · p(Ap|E) dp.
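To make the correction concrete, here is a quick numerical sanity check. The Beta(3, 7) density is purely a hypothetical stand-in for some p(Ap|E); the point is only that the bare integral gives 1 (normalization), while the expectation of p gives p(A|E).

```python
import math

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution, standing in for p(Ap|E)."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * p ** (a - 1) * (1 - p) ** (b - 1)

def integrate(f, n=10_000):
    """Midpoint rule on [0, 1] (avoids the endpoints, where the density can blow up)."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

a, b = 3.0, 7.0
norm = integrate(lambda p: beta_pdf(p, a, b))        # the bare integral of p(Ap)
prob_A = integrate(lambda p: p * beta_pdf(p, a, b))  # the expectation of p

print(round(norm, 4))    # ≈ 1.0: normalization, not p(A)
print(round(prob_A, 4))  # ≈ 0.3: the mean a/(a+b), i.e. p(A|E)
```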
(iii) @transhumanist_atom_understanderder claims that
The reason nobody else talks about the Ap distribution is because the same concept appears in standard probability expositions as a random variable representing an unknown probability
The point is that Jaynes is actually extending the idea of “probability of a probability” that arises in binomial parameter estimation, or in de Finetti’s theorem for exchangeable sequences, as a way to compress the information in a body of evidence E relevant to some other proposition A. While Jaynes admits that the trick might not always work, if it does, then rather than store every last detail of the information asserted in some compound evidentiary statement E, a rational agent would only need to store a description of a probability distribution p(Ap|E), which is equivalent to some probability distribution over a parameter p∈[0,1], and so might be approximated by, say, a beta distribution or a mixture of beta distributions. The mean of the Ap distribution reproduces the probability of A, while the shape of this distribution characterizes how labile the probability for A will tend to be upon acquisition of additional evidence. Bayesian updating can then take place at the level of the Ap distribution, rather than over some vastly complicated Boolean algebra of possible explicit conditioning information.
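A small illustration of the “shape encodes lability” point, under the (assumed, conjugate) beta approximation just mentioned: two Ap distributions with the same mean but different sharpness respond very differently to the same new observation.

```python
# Two hypothetical Ap distributions with the same mean p(A|E) = 0.5:
# a broad Beta(1, 1) (scant evidence) and a sharp Beta(500, 500)
# (abundant evidence). In this conjugate model, observing A once
# updates Beta(a, b) -> Beta(a + 1, b).
def beta_mean(a, b):
    return a / (a + b)

for name, (a, b) in [("broad", (1.0, 1.0)), ("sharp", (500.0, 500.0))]:
    before = beta_mean(a, b)
    after = beta_mean(a + 1, b)  # posterior mean after observing A once
    print(name, round(before, 3), "->", round(after, 3))
# The broad distribution jumps from 0.5 to ~0.667, while the sharp one
# barely moves (~0.5005): same p(A), very different lability.
```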
So Jaynes’s concept of the Ap distribution goes beyond the usual “probability of a probability” introduced in introductory Bayesian texts (such as Hoff, Lee, or Sivia) for binomial models. In my opinion, it is a clever construct that deserves more attention.
For the first part about “Ap being a formal maneuver”: I don’t disagree with the comment as stated, nor with what Jaynes did in a technical sense. But I’m trying to imbue the proposition with a “physical interpretation” when I identify it with an infinite collection of evidences. There is a subtlety in my original statement that I didn’t expand on, but that I’ve been thinking about ever since I read the post: “infinitude” is probably best understood as a relative term. Maybe the simplest way to see this is that, as I understand it, if you condition on two Ap propositions at the same time, you get a “do not compute”: not zero, but “do not compute”. So the Ap proposition only seems to make sense with respect to some subset {E} of all possible propositions. I interpret this subset as consisting of “finite” evidence, while the Ap’s (and other such propositions) somehow stand outside of this finite evidence class. There is also the matter that, in day-to-day life, it doesn’t really seem possible to encounter what to me seems like a “single” piece of evidence that has the dramatic effect of rendering our beliefs “deterministically indeterministic”. Can we really learn something that tells us that there is no more to learn?
Yes, I suspect that there is a typo there, though I’m a bit too lazy to check against the original text. It should be that the probability density over Ap is normalized, and that the expectation of p with respect to it is the probability of A.
This idea of compressing all the information in E relevant to A into the object p(Ap|E) is interesting, and indeed it’s perhaps a better articulation of what I find interesting about the Ap distribution than what is conveyed in the main body of the original post. One thread that I want(ed) to tug at a little further is that the Ap distribution seems to lend itself well to the first steps toward something of a dynamical model of probability theory: when you encounter a piece of evidence E, its first-order effect is to change your probability of A, but its second- and nth-order effects are to change your distribution over what future evidence you expect to encounter and how you “interpret” those pieces of evidence, where by “interpret” I mean in what way encountering that piece of evidence shifts your probability of A. This dynamical theory of probability would have folk theorems like “the variance in your Ap distribution must monotonically decrease over time”. These are shower thoughts.
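One caveat on that candidate folk theorem, again using a beta distribution as an assumed stand-in for an Ap distribution: a sufficiently surprising observation can increase the variance, so “monotonic decrease” would at best hold in expectation, or under extra conditions.

```python
# Variance of a Beta(a, b) distribution, used here as a hypothetical
# stand-in for an Ap distribution.
def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Start nearly certain that A is false: Beta(1, 100).
# Then A is observed to be true, updating to Beta(2, 100).
before = beta_var(1, 100)
after = beta_var(2, 100)
print(after > before)  # True: the surprising observation widened the distribution
```

(By the law of total variance, the *expected* posterior variance cannot exceed the prior variance, so a version of the folk theorem may survive in expectation.)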
And it’s perhaps also interesting in a more applied/agentic sense: we often casually talk about “updating” our beliefs, but what does that actually look like in practice? Empirically, we see that we can have evidence in our head that we fail to process (lack of logical omniscience). Maybe something like the Ap distribution could be helpful for understanding this better.