Bayesian probability theory as extended logic—a new result

I have a new paper that strengthens the case for strong Bayesianism, a.k.a. One Magisterium Bayes. The paper is entitled “From propositional logic to plausible reasoning: a uniqueness theorem.” (The preceding link will be good for a few weeks, after which only the preprint version will be available for free. I couldn’t come up with the $2500 that Elsevier makes you pay to make your paper open-access.)

Some background: E. T. Jaynes took the position that (Bayesian) probability theory is an extension of propositional logic to handle degrees of certainty—and appealed to Cox’s Theorem to argue that probability theory is the only viable such extension, “the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind.” This position is sometimes called strong Bayesianism. In a nutshell, frequentist statistics is fine for reasoning about frequencies of repeated events, but that’s a very narrow class of questions; most of the time when researchers appeal to statistics, they want to know what they can conclude with what degree of certainty, and that is an epistemic question for which Bayesian statistics is the right tool, according to Cox’s Theorem.

You can find a “guided tour” of Cox’s Theorem here (see “Constructing a logic of plausible inference”). Here’s a very brief summary. We write A | X for “the reasonable credibility” (plausibility) of proposition A when X is known to be true. Here X represents whatever information we have available. We are not at this point assuming that A | X is any sort of probability. A system of plausible reasoning is a set of rules for evaluating A | X. Cox proposed a handful of intuitively-appealing, qualitative requirements for any system of plausible reasoning, and showed that these requirements imply that any such system is just probability theory in disguise. That is, there necessarily exists an order-preserving isomorphism between plausibilities and probabilities such that A | X, after mapping from plausibilities to probabilities, respects the laws of probability.

Here is one (simplified and not 100% accurate) version of the assumptions required to obtain Cox’s result:

  1. A | X is a real number.

  2. (A | X) = (B | X) whenever A and B are logically equivalent; furthermore, (A | X) ≤ (B | X) if B is a tautology (an expression that is logically true, such as (a or not a)).

  3. We can obtain (not A | X) from A | X via some non-increasing function S. That is, (not A | X) = S(A | X).

  4. We can obtain (A and B | X) from (B | X) and (A | B and X) via some continuous function F that is strictly increasing in both arguments: (A and B | X) = F((A | B and X), (B | X)).

  5. The set of triples (x, y, z) such that x = (A | X), y = (B | A and X), and z = (C | A and B and X) for some propositions A, B, and C and some state of information X, is dense. Loosely speaking, this means that if you give me any (x’, y’, z’) in the appropriate range, I can find an (x, y, z) of the above form that is arbitrarily close to (x’, y’, z’).

The “guided tour” mentioned above gives detailed rationales for all of these requirements.
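For reference, “the laws of probability” in the conclusion of Cox’s Theorem can be read as the usual product and negation rules. The following is a sketch in standard notation, assuming P denotes the map from plausibilities to probabilities; after that rescaling, the function F of requirement 4 becomes ordinary multiplication and the function S of requirement 3 becomes x ↦ 1 − x:

```latex
\begin{align}
P(A \land B \mid X) &= P(A \mid B \land X)\, P(B \mid X) \\
P(\lnot A \mid X)   &= 1 - P(A \mid X)
\end{align}
```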
Not everyone agrees that these assumptions are reasonable. My paper proposes an alternative set of assumptions that are intended to be less disputable, as every one of them is simply a requirement that some property already true of propositional logic continue to be true in our extended logic for plausible reasoning. Here are the alternative requirements:
  1. If X and Y are logically equivalent, and A and B are logically equivalent assuming X, then (A | X) = (B | Y).

  2. We may define a new propositional symbol s without affecting the plausibility of any proposition that does not mention that symbol. Specifically, if s is a propositional symbol not appearing in A, X, or E, then (A | X) = (A | (s ↔ E) and X).

  3. Adding irrelevant background information does not alter plausibilities. Specifically, if Y is a satisfiable propositional formula that uses no propositional symbol occurring in A or X, then (A | X) = (A | Y and X).

  4. The implication ordering is preserved: if A → B is a logical consequence of X, but B → A is not, then (A | X) < (B | X); that is, A is strictly less plausible than B, assuming X.

Note that I do not assume that A | X is a real number. Item 4 above assumes only that there is some partial ordering on plausibility values: in some cases we can say that one plausibility is greater than another.
I also explicitly take the state of information X to be a propositional formula: all the background knowledge to which we have access is expressed in the form of logical statements. So, for example, if your background information is that you are tossing a six-sided die, you could express this by letting s1 mean “the die comes up 1,” s2 mean “the die comes up 2,” and so on, and your background information X would be a logical formula stating that exactly one of s1, …, s6 is true, that is,
(s1 or s2 or s3 or s4 or s5 or s6) and
not (s1 and s2) and not (s1 and s3) and not (s1 and s4) and
not (s1 and s5) and not (s1 and s6) and not (s2 and s3) and
not (s2 and s4) and not (s2 and s5) and not (s2 and s6) and
not (s3 and s4) and not (s3 and s5) and not (s3 and s6) and
not (s4 and s5) and not (s4 and s6) and not (s5 and s6).
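As a sanity check on this formula, here is a minimal Python sketch that enumerates all truth assignments to s1, …, s6 and keeps those satisfying X. The names `SYMBOLS` and `exactly_one` are hypothetical, chosen only for this illustration; the pairwise exclusions are generated with `itertools.combinations` rather than written out by hand.

```python
from itertools import combinations, product

SYMBOLS = ["s1", "s2", "s3", "s4", "s5", "s6"]


def exactly_one(assignment):
    """Evaluate X: at least one symbol is true, and no two are true together."""
    at_least_one = any(assignment[s] for s in SYMBOLS)
    no_two = all(
        not (assignment[p] and assignment[q])
        for p, q in combinations(SYMBOLS, 2)  # the 15 pairwise exclusions
    )
    return at_least_one and no_two


# Every possible truth assignment to the six symbols.
rows = [
    dict(zip(SYMBOLS, values))
    for values in product([False, True], repeat=len(SYMBOLS))
]
satisfying = [row for row in rows if exactly_one(row)]

print(len(rows))        # 64 truth assignments in total
print(len(satisfying))  # 6 of them satisfy X, one per face of the die
```

Only the six assignments in which a single symbol is true survive, which is what “exactly one of s1, …, s6 is true” should mean.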
Just like Cox, I then show that there is an order-preserving isomorphism between plausibilities and probabilities that respects the laws of probability.
My result goes further, however, in that it gives actual numeric values for the probabilities. Imagine creating a truth table containing one row for each possible combination of truth values assigned to each atomic proposition appearing in either A or X. Let n be the number of rows in this table for which X evaluates true. Let m be the number of rows in this table for which both A and X evaluate true. If P is the function that maps plausibilities to probabilities, then P(A | X) = m / n.
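The counting rule is easy to express in code. The following is a minimal sketch, not the construction used in the paper: it assumes propositions are represented as Python functions from a truth assignment (a dict) to a bool, and the function name `probability` and the symbols `p` and `q` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def probability(a, x, symbols):
    """Return m / n, where n counts truth-table rows satisfying x
    and m counts rows satisfying both a and x."""
    n = 0
    m = 0
    for values in product([False, True], repeat=len(symbols)):
        row = dict(zip(symbols, values))
        if x(row):
            n += 1
            if a(row):
                m += 1
    if n == 0:
        raise ValueError("X is unsatisfiable, so m / n is undefined")
    return Fraction(m, n)


# Illustration: the background information X is (p -> q), and we ask about p.
result = probability(
    a=lambda r: r["p"],
    x=lambda r: (not r["p"]) or r["q"],
    symbols=["p", "q"],
)
print(result)  # 1/3
```

Three of the four rows satisfy p → q, and p is true in only one of them. Using `Fraction` keeps the ratio exact rather than rounding it to a float.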
For example, suppose that a and b are atomic propositions (not decomposable in terms of more primitive propositions), and suppose that we only know that at least one of them is true; what then is the probability that a is true? Start by enumerating all possible combinations of truth values for a and b:
  1. a false, b false: (a or b) is false, a is false.

  2. a false, b true : (a or b) is true, a is false.

  3. a true, b false: (a or b) is true, a is true.

  4. a true, b true : (a or b) is true, a is true.

There are 3 cases (2, 3, and 4) in which (a or b) is true, and in 2 of these cases (3 and 4) a is also true. Therefore,
P(a | a or b) = 2/3.
This concords with the classical definition of probability, which Laplace expressed as
“The probability of an event is the ratio of the number of cases favorable to it, to the number of possible cases, when there is nothing to make us believe that one case should occur rather than any other, so that these cases are, for us, equally possible.”
This definition fell out of favor, in part because of its apparent circularity. My result validates the classical definition and sharpens it. We can now say that a “possible case” is simply a truth assignment satisfying the premise X. We can simply drop the problematic phrase “these cases are, for us, equally possible.” The phrase “there is nothing to make us believe that one case should occur rather than any other” means that we possess no additional information that, if added to X, would expand by differing multiplicities the rows of the truth table for which X evaluates true.
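One way to picture “differing multiplicities” is to add new information to X = (a or b) and watch how each satisfying row splits. The sketch below is an illustration under assumptions, not taken from the paper: the helper `probability` and the extra symbols c and d are hypothetical. Conjoining an irrelevant formula (c or d) splits every satisfying row into three, so the ratio is unchanged; conjoining (a → c) splits the rows unevenly, so the ratio changes.

```python
from fractions import Fraction
from itertools import product


def probability(a, x, symbols):
    """Ratio of rows satisfying both a and x to rows satisfying x."""
    rows = [
        dict(zip(symbols, values))
        for values in product([False, True], repeat=len(symbols))
    ]
    n = sum(1 for r in rows if x(r))
    m = sum(1 for r in rows if x(r) and a(r))
    return Fraction(m, n)  # raises ZeroDivisionError if x is unsatisfiable


def A(r):
    return r["a"]


def X(r):
    return r["a"] or r["b"]


base = probability(A, X, ["a", "b"])

# Irrelevant information: (c or d) shares no symbol with A or X.
# Each of the 3 rows satisfying X expands into 3 rows, giving 6 / 9.
with_irrelevant = probability(
    A, lambda r: (r["c"] or r["d"]) and X(r), ["a", "b", "c", "d"]
)

# Relevant information: (a -> c). Rows with a true keep only c = true
# (multiplicity 1); the row with a false keeps both values of c (multiplicity 2).
with_relevant = probability(
    A, lambda r: ((not r["a"]) or r["c"]) and X(r), ["a", "b", "c"]
)

print(base, with_irrelevant, with_relevant)  # 2/3 2/3 1/2
```

The first comparison mirrors requirement 3 above (irrelevant background information does not alter plausibilities); the second shows information that does make one case more credible than another.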
For more details, see the paper linked above.