The Two-Update Problem: Monotonicity

In his posts (1, 2) on the two-update problem, Abram Demski discussed a problem we see in existing proposals for logical priors. The existing proposals for logical priors work from a base theory $T$ and construct a probability distribution $P_T$, which represents a probability distribution on completions of that theory. The two-update problem is that it is not necessarily the case that $P_{T\cup\{\psi\}}(\phi)=P_T(\phi\mid\psi)$. This gives us two updates: one from putting sentences in the base theory, and one from performing a Bayesian update. Here, I want to talk about a weaker requirement for families of logical priors, where we only require that adding consequences of $\phi$ to the base theory does not decrease the probability of $\phi$.

We say that a family of logical priors $\{P_T\}$ is monotonic if whenever $S$ is a set of sentences with $\phi\vdash\psi$ for all $\psi\in S$, then $P_{T\cup S}(\phi)\geq P_T(\phi)$. That is to say, if we only update on assumptions which are logical consequences of $\phi$, then the probability of $\phi$ should only go up.

(I mentioned this before as one of the desirable properties in this post.)
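As a quick sanity check that monotonicity is indeed weaker than solving the two-update problem (assuming $S$ is finite, each $P_T$ is coherent, and $P_T(\bigwedge S)>0$): if we had $P_{T\cup S}(\phi)=P_T(\phi\mid\bigwedge S)$ exactly, then for $S$ consisting only of consequences of $\phi$,

$$P_{T\cup S}(\phi)=P_T\!\left(\phi\mid\bigwedge S\right)=\frac{P_T(\phi\wedge\bigwedge S)}{P_T(\bigwedge S)}=\frac{P_T(\phi)}{P_T(\bigwedge S)}\geq P_T(\phi),$$

since coherence gives $P_T(\phi\wedge\bigwedge S)=P_T(\phi)$ when $\phi\vdash\bigwedge S$.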

Theorem: The Demski prior is monotonic.

Proof: When sampling for the Demski prior with base theory $T$, for each infinite sequence of sentences sampled, either all the sentences in $S$ are accepted, meaning that they do not contradict $T$ together with the sampled sentences that were kept, or not.

In the case where all of the sentences in $S$ are accepted, if we were to consider sampling the same infinite sequence of sentences for the Demski prior with base theory $T\cup S$, then we would get the same complete theory in the end. Therefore, if we condition on the assumption that the infinite sequence causes all sentences in $S$ to be accepted when the base theory is $T$, the probability that $\phi$ is accepted when the base theory is $T$ is the same as the probability that $\phi$ is accepted when the base theory is $T\cup S$.

On the other hand, if we condition on the assumption that the infinite sequence does not cause all sentences in $S$ to be accepted when the base theory is $T$, then the probability that $\phi$ is accepted when the base theory is $T$ is 0 (since $\phi$ implies every sentence in $S$), while the probability that $\phi$ is accepted when the base theory is $T\cup S$ is non-negative.

Therefore, the probability that $\phi$ is accepted when the base theory is $T$ is less than or equal to the probability that $\phi$ is accepted when the base theory is $T\cup S$; that is, $P_T(\phi)\leq P_{T\cup S}(\phi)$.
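To make the coupling in this proof concrete, here is a minimal Python sketch of a Demski-style sampling process over a toy language. This is a simplification for illustration only: the real construction samples arbitrary sentences from a simplicity-weighted distribution, while this toy samples uniformly from eight propositional sentences over two atoms and represents each sentence by the set of truth assignments satisfying it. Reusing the same random seed for both base theories couples the sampled sequences exactly as in the argument above.

```python
import itertools
import random

# Toy propositional language with two atoms.  A "world" is a truth assignment,
# and a sentence is represented extensionally as the set of worlds where it holds.
WORLDS = list(itertools.product([False, True], repeat=2))

def atom(i):
    return frozenset(w for w in WORLDS if w[i])

def neg(s):
    return frozenset(WORLDS) - s

A, B = atom(0), atom(1)
LANGUAGE = [A, B, neg(A), neg(B), A & B, A | B, neg(A & B), neg(A | B)]

def consistent(sentences):
    """A finite set of sentences is consistent iff some world satisfies all of them."""
    worlds = set(WORLDS)
    for s in sentences:
        worlds &= s
    return bool(worlds)

def generate_theory(base, sequence):
    """Demski-style process: keep each sampled sentence that does not contradict
    the base theory together with the sentences kept so far.  Returns the set of
    worlds consistent with everything kept."""
    accepted = list(base)
    for s in sequence:
        if consistent(accepted + [s]):
            accepted.append(s)
    worlds = set(WORLDS)
    for s in accepted:
        worlds &= s
    return worlds

def estimate(base, target, trials=20000, length=12, seed=0):
    """Monte Carlo estimate of the probability that `target` is entailed by the
    generated theory.  Reusing the same seed for different base theories couples
    the sampled sequences, mirroring the proof."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sequence = [rng.choice(LANGUAGE) for _ in range(length)]
        if generate_theory(base, sequence) <= target:
            hits += 1
    return hits / trials

phi = A & B   # the sentence of interest
S = [A | B]   # a consequence of phi added to the base theory
T = []        # the original base theory (empty here)

print("P_T(phi)   ~", estimate(T, phi))
print("P_T+S(phi) ~", estimate(T + S, phi))  # never smaller, by the coupling
```

Because of the coupling, every sequence that puts $\phi$ into the theory generated from $T$ also puts it into the theory generated from $T\cup S$, so the second estimate can never come out smaller than the first.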

Theorem: The Worst Case Bayes prior is not monotonic.

Proof Sketch: Let $\phi_1$ and $\phi_2$ be two simple and independent sentences. We will consider a weighting on sentences such that $\phi_1$ and $\phi_2$ have very large weight, the four sentences $\phi_1\wedge\phi_2$, $\phi_1\wedge\neg\phi_2$, $\neg\phi_1\wedge\phi_2$, and $\neg\phi_1\wedge\neg\phi_2$ have much smaller weight, and all other sentences have negligible weight.

$T_1$ will be the empty theory, and $T_2$ will be the theory consisting of just the sentence $\phi_1\vee\phi_2$. Note that $\phi_1\wedge\phi_2\vdash\phi_1\vee\phi_2$, so monotonicity would imply that $P_{T_2}(\phi_1\wedge\phi_2)\geq P_{T_1}(\phi_1\wedge\phi_2)$.

Note that in $P_{T_1}$, $\phi_1$ and $\phi_2$ will each be given probability about $1/2$, and the four conjunctions will each be given probability about $1/4$.

However, in $P_{T_2}$, $\phi_1$ and $\phi_2$ will each be given probability about $1/2$. This is because the only two sentences with large weight are $\phi_1$ and $\phi_2$; by symmetry they should be given the same probability, and if that probability is far from $1/2$, they will have a worse score in the world where $\phi_1$ is true and $\phi_2$ is false.

Since $\phi_1$ is given probability near $1/2$, $\neg\phi_1\wedge\phi_2$ will also be given probability near $1/2$, since (given $T_2$) exactly one of $\phi_1$ and $\neg\phi_1\wedge\phi_2$ is true. Similarly, $\phi_1\wedge\neg\phi_2$ will get probability near $1/2$. However, this means that $\phi_1\wedge\phi_2$ will be left with probability near $0$, since (given $T_2$) exactly one of $\phi_1\wedge\phi_2$, $\phi_1\wedge\neg\phi_2$, and $\neg\phi_1\wedge\phi_2$ is true.

What happens here is that the probabilities assigned to $\phi_1$ and $\phi_2$ will both be slightly greater than $1/2$, so in the world where both are true, the Bayes score gets a much larger contribution from those two sentences. In maximizing the worst-case Bayes score, we therefore trade away some of our Bayes score in that world for Bayes score in the other worlds by making the probability of $\phi_1\wedge\phi_2$ small.
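Here is a rough numerical sketch of this counterexample, for illustration only: the weights of 10 and 1, the restriction to just these six weighted sentences over two independent atoms, and the crude search procedure are all simplifying choices, not the actual construction. It represents a coherent prior as a probability distribution over the four truth assignments to $(\phi_1,\phi_2)$ and searches for the distribution maximizing the worst-case weighted Bayes score, first over all four worlds (base theory $T_1$) and then over the three worlds satisfying $\phi_1\vee\phi_2$ (base theory $T_2$).

```python
import itertools
import math
import random

# Worlds are truth assignments to (phi_1, phi_2).
WORLDS = list(itertools.product([False, True], repeat=2))

# The weighted sentences: large weight on phi_1 and phi_2, small weight on the four
# conjunctions, everything else ignored.  The particular numbers are arbitrary.
SENTENCES = [
    (lambda w: w[0], 10.0),                  # phi_1
    (lambda w: w[1], 10.0),                  # phi_2
    (lambda w: w[0] and w[1], 1.0),          # phi_1 and phi_2
    (lambda w: w[0] and not w[1], 1.0),      # phi_1 and not phi_2
    (lambda w: not w[0] and w[1], 1.0),      # not phi_1 and phi_2
    (lambda w: not w[0] and not w[1], 1.0),  # not phi_1 and not phi_2
]

def sentence_prob(dist, sat):
    """Probability of a sentence under a distribution over worlds."""
    return sum(p for w, p in dist.items() if sat(w))

def bayes_score(dist, world):
    """Weighted log score of the induced sentence probabilities in one world."""
    total = 0.0
    for sat, weight in SENTENCES:
        q = min(max(sentence_prob(dist, sat), 1e-9), 1 - 1e-9)
        total += weight * math.log(q if sat(world) else 1 - q)
    return total

def worst_case_bayes(worlds, seed=0):
    """Crude search (random restarts plus local perturbation) for a distribution
    over `worlds` maximizing the worst-case Bayes score over those worlds."""
    rng = random.Random(seed)

    def evaluate(raw):
        total = sum(raw)
        dist = {w: x / total for w, x in zip(worlds, raw)}
        return min(bayes_score(dist, w) for w in worlds), dist

    best_raw = [1.0] * len(worlds)
    best_val, best_dist = evaluate(best_raw)
    for _ in range(2000):                      # random restarts
        raw = [rng.random() + 1e-6 for _ in worlds]
        val, dist = evaluate(raw)
        if val > best_val:
            best_val, best_dist, best_raw = val, dist, raw
    step = 0.2
    for _ in range(20000):                     # local hill climbing
        raw = [max(1e-6, x + rng.uniform(-step, step)) for x in best_raw]
        val, dist = evaluate(raw)
        if val > best_val:
            best_val, best_dist, best_raw = val, dist, raw
        step = max(0.002, step * 0.9995)
    return best_dist

def conjunction(w):
    return w[0] and w[1]

d1 = worst_case_bayes(WORLDS)                               # T_1: empty base theory
d2 = worst_case_bayes([w for w in WORLDS if w[0] or w[1]])  # T_2: assumes phi_1 or phi_2

print("P_T1(phi_1 and phi_2) ~", round(sentence_prob(d1, conjunction), 3))
print("P_T2(phi_1 and phi_2) ~", round(sentence_prob(d2, conjunction), 3))
```

With these particular weights the first probability comes out near $1/4$ while the second drops to roughly $0.1$ (and it approaches $0$ as the weight on $\phi_1$ and $\phi_2$ grows), violating the monotonicity inequality in exactly the way described above.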

The fact that monotonicity is achievable and the Worst Case Bayes prior does not achieve it is the reason I have abandoned the Worst Case Bayes prior and have not thought about it much for the last year.
