# [Question] Jaynes-Cox Probability: Are plausibilities objective?

Is the objectivity of plausibility assignments assumed in the Jaynes-Cox formulation of probability theory?

This is what I mean by “the objectivity of plausibility assignments”:

A and B are propositions. (A|B) is the plausibility of A given that B is true, and it is represented by a real number as a result of our desiderata. Is the quantity (A|B) uniquely determined by A and B?

If this is the case, is this one of the assumptions that we make (implicitly or explicitly) or can this be derived from our desiderata?

If not, then in what sense is the plausibility of A given B objective?

Thank you

• Is the quantity (A|B) uniquely determined by A and B?

No. p(A|B) is determined by (p(B|A) * p(A)) / p(B). All three terms on the right-hand side are independent plausibility assignments.
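That relationship can be checked with a few made-up numbers (all values below are illustrative, not derived from anything):

```python
# Bayes' rule relates four plausibility assignments; the rule itself is
# fixed, but the three inputs on the right are free. Illustrative numbers.
p_A = 0.3          # prior plausibility of A
p_B_given_A = 0.8  # plausibility of B given A
p_B = 0.5          # plausibility of B

# p(A|B) = p(B|A) * p(A) / p(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)
```

Changing any one of the three inputs changes the output, which is the sense in which p(A|B) is not fixed by the rule alone.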

> If not, then in what sense is the plausibility of A given B objective?

The relationship of the different expressions of plausibility is objective. The values are subjective. Much like “2 + 2 = 4” is objective, but whether I actually have two coins in two pockets is subjective (or at least contingent).

• Thanks!

Objective Bayesians say that “if two different people have the same information, B, then they will assign the same plausibility (A|B)”, right? If they didn’t say this, wouldn’t they just be subjective Bayesians?

So how is this possible without the plausibility being uniquely determined by A and B?

• If two different Bayesians have the same priors and the same evidence, they will agree. If they have mutual knowledge of their rationality and common priors, their posteriors will converge. Neither of these is the same as “having the same information B” when the item in question is A|B (setting B to 1, so any prior for B is irrelevant).

• two different Bayesians have the same priors and the same evidence, they will agree

And the same concept of, and weighting of, evidence.

• Yes, “same evidence” in this context implies that it is usable in the same Bayesian updates in the same way.
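For what it’s worth, the “same priors + same evidence ⇒ same posterior” point is just determinism of the update rule; a minimal sketch (the helper name and all numbers are illustrative):

```python
# Bayes updating is a deterministic function of (prior, likelihood,
# marginal), so two agents feeding in identical inputs must arrive at
# identical posteriors. Numbers are arbitrary illustrations.
def posterior(prior, likelihood, marginal):
    return likelihood * prior / marginal

agent_1 = posterior(prior=0.2, likelihood=0.9, marginal=0.45)
agent_2 = posterior(prior=0.2, likelihood=0.9, marginal=0.45)
print(agent_1 == agent_2)  # True: same inputs, same posterior
```

Disagreement between real people therefore has to come from the inputs (priors, evidence, or how the evidence is interpreted), not from the rule.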

• If two different Bayesians have the same priors and the same evidence, they will agree.

Please see this section about Professor Jaynes’ view of priors from Wikipedia: https://en.wikipedia.org/wiki/Prior_probability#Uninformative_priors

Essentially, he says that it is impossible for two people with the same information to justifiably hold different priors; they should instead use the same “objective prior”. The same idea applies to evidence as well.

• Hmmm… I see what you mean but I am not sure if that is the understanding of the Jaynes-Cox school of thought. Please see the picture below ⬇️ - it is from pages 44-45 of Professor Jaynes’ book. Have I misunderstood what Professor Jaynes is saying?

• It’s easy to get tripped up here, because authors are describing theoretical perfect agents, but saying “people” to sound somewhat accessible. My old intro to physics book started with an assertion that “in this text, we will assume that all elephants are perfectly spherical, frictionless, and uniformly dense”. This was good for calculating orbital mechanics or collisions, but very bad for understanding anything about pachyderms.

1) People NEVER have the same information. They have different experiences, and can only imperfectly communicate those experiences to each other. They don’t actually do Bayesian updates—there’s a bunch of heuristics and summaries going on in our cognition.

2) Hypotheses about universal common priors are pretty shaky. Selection bias in the universe of considered options is just one way that what you probably think of as “prior” is actually a posterior belief from very early learning.

• Ahhh… that makes a lot of sense - thank you! A couple of things that I still find a bit confusing:

1. ‘It’s easy to get tripped up here, because authors are describing uniformly-dense spherical objects, but calling them “elephants” to make it sound more accessible.’ - So what is the difference between objective Bayesianism and subjective Bayesianism? And do you have any references to show that what you describe is the view of the objective Bayesian school of thought? Although your explanation makes a lot of sense, it does seem to contradict the obvious meaning of the text that I quoted above, which is the bible of objective Bayesianism, so I would appreciate some references showing that the author is actually ‘describing uniformly-dense spherical objects, but calling them “elephants” to make it sound more accessible.’

2. Professor Jaynes says “It is ‘objectivity’ in this sense that is needed for a scientifically respectable theory of inference.”—How can scientists make claims like “everyone should prefer hypothesis 1 over hypothesis 2 because of the evidence” when they can only talk about the plausibility of the hypotheses given the information that they have which is obviously different to the information that everyone else has? Does every individual have to verify the claims of scientists independently given their own information?

3. ‘Hypotheses about universal common priors are pretty shaky.’ - Are you saying that “a priori” probability distributions don’t exist? This seems to contradict the objective Bayesian viewpoint (please see the quotation below ⬇️ from the Wikipedia page on Uninformative priors)

Some attempts have been made at finding a priori probabilities, i.e. probability distributions in some sense logically required by the nature of one’s state of uncertainty; these are a subject of philosophical controversy, with Bayesians being roughly divided into two schools: “objective Bayesians”, who believe such priors exist in many useful situations, and “subjective Bayesians” who believe that in practice priors usually represent subjective judgements of opinion that cannot be rigorously justified (Williamson 2010). Perhaps the strongest arguments for objective Bayesianism were given by Edwin T. Jaynes, based mainly on the consequences of symmetries and on the principle of maximum entropy.
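The maximum-entropy idea mentioned above can be made concrete: with no constraints beyond normalization, the uniform distribution has the highest entropy, which is why it falls out as the “objective” prior in the fully symmetric case. A small sketch (the distributions are arbitrary examples):

```python
import math

def entropy(dist):
    # Shannon entropy in nats; terms with p = 0 contribute nothing.
    return -sum(p * math.log(p) for p in dist if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

# With no constraints, the uniform distribution maximizes entropy, so
# maximum entropy selects it as the prior over 4 symmetric outcomes.
print(entropy(uniform) > entropy(skewed))  # True
```

Jaynes’ claim is that such symmetry and maximum-entropy arguments pin down the prior given a precisely stated state of knowledge, which is what makes the prior “objective” in his sense.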

• I should probably have stated earlier that I’m more interested in practical and human-level (and medium-term artificial agents, with far more calculating power than humans, but still each a tiny subset of the actual universe), than in academic or theoretical distinctions.

I am not well-positioned to explain or defend the idea of “objective” probability. There may be such a thing in toy situations, but I haven’t seen any path from micro to macro that makes me believe it’s feasible for anything real.

• I see… Thanks a lot for your help anyway. Much appreciated. I’m actually quite new to this forum so I would really appreciate it if someone could point me to the seasoned objective Bayesians here

• I would only like to note that in the conception of probability of Jaynes, Keynes and others, it makes no sense to talk about P(A). They all assume that probabilities do not happen in the void and that you are always “conditioning” on some previous knowledge, B. So they would always write P(A|B) where other authors/schools just write P(A).

• Sort of?

There is a sense in which Cox’s theorem and related formalizations of probability assume that the plausibility of (A|B) is some function F(A,B). But what they end up showing is not that F is some specific function, just that it must obey certain rules (the laws of probability).

So the objectivity is not in the results of the theorem, it’s more like there’s an assumption of some kind of objectivity (or at least self-consistency) that goes into what formalizers of probability are willing to think of as a “plausibility” in the first place.

• Thinking about it again, I am not sure if the assumption that such a function F exists is as intuitive as I first thought. We are trying to formalise the intuitive concept of the plausibility of A given B, i.e. “how true the proposition A is given that we know that the proposition B is true”, and this assumption seems to contradict some of our, or at least my, intuitions about plausibility.

For example, suppose A is some proposition and B is a proposition which tells us absolutely nothing about A. Maybe B = “1+2=3” and A = “The earth is not flat 🌎”. Intuitively, B being true tells us nothing about “how true” A is, so we should not be able to assign a plausibility, but the existence of F contradicts this and implies that there is a number x which represents how plausible A is given B.

Someone might say that the plausibility of A given B is 0.5, applying Professor Jaynes’ principle of transformation groups/indifference, but that is a result of the theory, whereas the existence of a function F s.t. [(A|B) = F(A, B) for any A and B] is an axiom of the theory. You can’t make an assumption, prove something using that assumption, and then use your result to justify the assumption, right?

I think the idea of imprecise probability was conceived to solve this exact problem. Instead of talking about a single degree of plausibility (i.e. how true something is), imprecise probability talks about an upper bound and a lower bound on the degree of plausibility, and represents those with two real numbers, upper(A|B) and lower(A|B) respectively. So instead of postulating the existence of a function F s.t. (A|B) = F(A, B), they postulate the existence of functions F_upper and F_lower s.t. upper(A|B) = F_upper(A, B) and lower(A|B) = F_lower(A, B). I think this is a weaker and philosophically better axiom because it holds even in the example that I gave above: B = “1+2=3” and A = “The earth is not flat 🌎”. Intuitively, even though B tells us nothing about A, we can still assign an upper bound and a lower bound of plausibility: 1 (true) and 0 (false), so the new axiom is in line with our intuitions.

How much our upper and lower bounds on the plausibility of A move from 1 and 0 respectively when we condition on B tells us how informative B is about A. In the example above ⬆️, B is not informative at all about A because the upper bound is still 1 and the lower bound is still 0.
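The interval picture can be sketched in code; the names `vacuous` and `informativeness` are my own illustrative choices, not a standard imprecise-probability API:

```python
# Imprecise probability: represent the plausibility of (A|B) as an
# interval [lower, upper] instead of a single real number.
def vacuous():
    # Total ignorance: the bounds are 0 (false) and 1 (true).
    return (0.0, 1.0)

def informativeness(lower, upper):
    # How far the interval has narrowed from the vacuous [0, 1].
    return 1.0 - (upper - lower)

# B = "1+2=3" tells us nothing about A = "The earth is not flat",
# so conditioning on B leaves the vacuous interval unchanged.
lower, upper = vacuous()
print(informativeness(lower, upper))  # 0.0
```

A precise probability is then just the special case where the two bounds coincide.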

Perhaps we can impose objectiveness using some other assumption instead of resorting to imprecise probability?

David Chapman, the metarationality guy, shares a similar critique here—Probability theory does not extend logic:

It is also controversial what the (fixed-up) mathematical result means philosophically. Whereas in 1946, when Cox published his Theorem, there clearly was nothing else like probability theory, there are now a variety of related mathematical systems for reasoning about uncertainty.

These share a common motivation. Probability theory doesn’t work when you have inadequate information. Implicitly, it demands that you always have complete confidence in your probability estimate, like maybe 0.5279371, whereas in fact often you just have no clue. Or you might say “well, it seems the probability is at least 0.3, and not more than 0.8, but any guess more definite than that would be meaningless.”

So various systems try to capture this intuition: sometimes a specific numerical probability is unavailable, but you can still do some reasoning anyway. These systems coincide with probability theory in cases where you are confident of your probability estimate, and extend it to handle cases where you aren’t.

Please let me know what you think about this. Thanks!

• For example, suppose A is some proposition and B is a proposition which tells us absolutely nothing about A. Maybe B = “1+2=3” and A = “The earth is not flat 🌎”. Intuitively, B being true tells us nothing about “how true” A is, so we should not be able to assign a plausibility, but the existence of F contradicts this and implies that there is a number x which represents how plausible A is given B.

I’m not sure what your objection is here. You appear to be using propositions A and B that we believe to be almost certainly true, in which case the plausibility of A given B must be very close to 1 by definition. Maybe you meant to negate B?

In general though, yes, Cox’s result is an existence proof, not a uniqueness one. It means that under the given reasonable conditions you can use probability theory, but it doesn’t tell you what probabilities to assign to which propositions.
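One way to see the existence-versus-uniqueness point: the laws of probability admit many different assignments, and Cox’s theorem cannot choose between them. A minimal sketch (values are arbitrary):

```python
# Cox's theorem constrains the form plausibilities must take (the laws
# of probability), not the values. Both assignments below are valid
# probability distributions over {A, not-A}; the axioms alone cannot
# decide which one to use.
def is_valid_distribution(dist):
    return (all(p >= 0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) < 1e-9)

assignment_1 = {"A": 0.9, "not A": 0.1}
assignment_2 = {"A": 0.2, "not A": 0.8}

print(is_valid_distribution(assignment_1))  # True
print(is_valid_distribution(assignment_2))  # True
```

Picking between such valid assignments is exactly the job Jaynes tries to give to symmetry and maximum-entropy arguments, which is the controversial step.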

Jaynes’ extension of this to objective probabilities is much more controversial and does not have anything like a mathematical proof.

• Thanks for the reply!

I’m not sure what your objection is here. You appear to be using propositions A and B that we believe to be almost certainly true, in which case the plausibility of A given B must be very close to 1 by definition. Maybe you meant to negate B?

Sorry about that. I think it would have been clearer if I had chosen a different A and B. But I believe the argument still holds, because by (A|B) I did not mean (A|B, I) where I is the background information (I should have been clearer about this as well, as I am going against the convention). So we are not conditioning on any background information, and (A|B) is not close to one by definition.

In general though, yes, Cox’s result is an existence proof, not a uniqueness one. It means that under the given reasonable conditions you can use probability theory, but it doesn’t tell you what probabilities to assign to which propositions.

Well what does it prove the existence of? Are you saying that Cox’s theorem implies the existence of a function F such that [the plausibility (A|B) = F(A, B) for any propositions A and B]?

Jaynes’ extension of this to objective probabilities is much more controversial and does not have anything like a mathematical proof.

I do not think that Professor Jaynes’ theory necessarily warrants a mathematical proof as we are only trying to formalise our intuitions about plausibilities. My contention is that Professor Jaynes’ theory contradicts our intuitions about plausibilities and hence the necessity of an “imprecise” theory of plausibility which addresses this problem.

We cannot argue that Cox’s theorem justifies Professor Jaynes’ theory if we start with axioms which are not consistent with our understanding of plausibility. This is what David Chapman argues:

Cox’s Theorem says that there is no formal system other than probability theory that is very similar to it; so if you want something like that, you’ve only got one choice. This is irrelevant unless you are considering using one of the dubious alternatives, none of which seems to work as well in practice. - What probability can’t do

Chapman discusses this in more detail here: Probability theory does not extend logic.

• Fantastic answer! Thanks a lot—I really appreciate it
