The Two-Update Problem: Monotonicity

In his posts (1, 2) on the two-update problem, Abram Demski discussed a problem we see in existing proposals for logical priors. The existing proposals for logical priors work from a base theory $T$ and construct a probability distribution $P_T$, which represents a probability distribution on completions of that theory. The two-update problem is that it is not necessarily the case that $P_{T\cup\{\psi\}}(\phi)=P_T(\phi\mid\psi)$. This gives us two updates: one from putting sentences in the base theory, and one from performing a Bayesian update. Here, I want to talk about a weaker requirement for families of logical priors, where we only require that adding consequences of $\phi$ to the base theory does not decrease the probability of $\phi$.

We say that a family of logical priors $\{P_T\}$ is monotonic if whenever $S$ is a set of sentences with $\phi\vdash\psi$ for all $\psi\in S$, then $P_{T\cup S}(\phi)\geq P_T(\phi)$. That is to say, if we only update on assumptions which are logical consequences of $\phi$, then the probability of $\phi$ should only go up.

(I mentioned this before as one of the desirable properties in this post.)
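As a quick sanity check that monotonicity is indeed weaker than solving the two-update problem (assuming $S$ is finite, each $P_T$ is coherent, and $P_T(\bigwedge S)>0$): if we had $P_{T\cup S}(\phi)=P_T(\phi\mid\bigwedge S)$ exactly, then for $S$ consisting only of consequences of $\phi$,

$$P_{T\cup S}(\phi)=P_T\!\left(\phi\mid\bigwedge S\right)=\frac{P_T(\phi\wedge\bigwedge S)}{P_T(\bigwedge S)}=\frac{P_T(\phi)}{P_T(\bigwedge S)}\geq P_T(\phi),$$

since coherence gives $P_T(\phi\wedge\bigwedge S)=P_T(\phi)$ when $\phi\vdash\bigwedge S$.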

Theorem: The Demski prior is monotonic.

Proof: When sampling for the Demski prior with base theory $T$, for each infinite sequence of sentences sampled, either all the sentences in $S$ are accepted, meaning that they do not contradict $T$ together with the sampled sentences that were kept, or not.

In the case where all of the sentences in $S$ are accepted, if we were to consider sampling the same infinite sequence of sentences for the Demski prior with base theory $T\cup S$, then we would get the same complete theory in the end. Therefore, if we condition on the assumption that the infinite sequence causes all sentences in $S$ to be accepted when the base theory is $T$, the probability that $\phi$ is accepted when the base theory is $T$ is the same as the probability that $\phi$ is accepted when the base theory is $T\cup S$.

On the other hand, if we condition on the assumption that the infinite sequence does not cause all sentences in $S$ to be accepted when the base theory is $T$, then the probability that $\phi$ is accepted when the base theory is $T$ is 0 (since $\phi$ implies every sentence in $S$), while the probability that $\phi$ is accepted when the base theory is $T\cup S$ is non-negative.

Therefore, the probability that $\phi$ is accepted when the base theory is $T$ is less than or equal to the probability that $\phi$ is accepted when the base theory is $T\cup S$; that is, $P_T(\phi)\leq P_{T\cup S}(\phi)$.
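To make the coupling in this proof concrete, here is a minimal Python sketch of a Demski-style sampling process over a toy language. This is a simplification for illustration only: the real construction samples arbitrary sentences from a simplicity-weighted distribution, while this toy samples uniformly from eight propositional sentences over two atoms and represents each sentence by the set of truth assignments satisfying it. Reusing the same random seed for both base theories couples the sampled sequences exactly as in the argument above.

```python
import itertools
import random

# Toy propositional language with two atoms.  A "world" is a truth assignment,
# and a sentence is represented extensionally as the set of worlds where it holds.
WORLDS = list(itertools.product([False, True], repeat=2))

def atom(i):
    return frozenset(w for w in WORLDS if w[i])

def neg(s):
    return frozenset(WORLDS) - s

A, B = atom(0), atom(1)
LANGUAGE = [A, B, neg(A), neg(B), A & B, A | B, neg(A & B), neg(A | B)]

def consistent(sentences):
    """A finite set of sentences is consistent iff some world satisfies all of them."""
    worlds = set(WORLDS)
    for s in sentences:
        worlds &= s
    return bool(worlds)

def generate_theory(base, sequence):
    """Demski-style process: keep each sampled sentence that does not contradict
    the base theory together with the sentences kept so far.  Returns the set of
    worlds consistent with everything kept."""
    accepted = list(base)
    for s in sequence:
        if consistent(accepted + [s]):
            accepted.append(s)
    worlds = set(WORLDS)
    for s in accepted:
        worlds &= s
    return worlds

def estimate(base, target, trials=20000, length=12, seed=0):
    """Monte Carlo estimate of the probability that `target` is entailed by the
    generated theory.  Reusing the same seed for different base theories couples
    the sampled sequences, mirroring the proof."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sequence = [rng.choice(LANGUAGE) for _ in range(length)]
        if generate_theory(base, sequence) <= target:
            hits += 1
    return hits / trials

phi = A & B   # the sentence of interest
S = [A | B]   # a consequence of phi added to the base theory
T = []        # the original base theory (empty here)

print("P_T(phi)   ~", estimate(T, phi))
print("P_T+S(phi) ~", estimate(T + S, phi))  # never smaller, by the coupling
```

Because of the coupling, every sequence that puts $\phi$ into the theory generated from $T$ also puts it into the theory generated from $T\cup S$, so the second estimate can never come out smaller than the first.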

Theorem: The Worst Case Bayes prior is not monotonic.

Proof Sketch: Let $\phi_1$ and $\phi_2$ be two simple and independent sentences. We will consider a weighting on sentences such that $\phi_1$ and $\phi_2$ have very large weight, the four sentences $\phi_1\wedge\phi_2$, $\phi_1\wedge\neg\phi_2$, $\neg\phi_1\wedge\phi_2$, and $\neg\phi_1\wedge\neg\phi_2$ have much smaller weight, and all other sentences have negligible weight.

$T_1$ will be the empty theory, and $T_2$ will be the theory consisting of just the sentence $\phi_1\vee\phi_2$. Note that $\phi_1\wedge\phi_2\vdash\phi_1\vee\phi_2$, so monotonicity would imply that $P_{T_2}(\phi_1\wedge\phi_2)\geq P_{T_1}(\phi_1\wedge\phi_2)$.

Note that in $P_{T_1}$, $\phi_1$ and $\phi_2$ will each be given probability about $1/2$, and the four conjunctions will each be given probability about $1/4$.

However, in $P_{T_2}$, $\phi_1$ and $\phi_2$ will each be given probability about $1/2$. This is because the only two sentences with large weight are $\phi_1$ and $\phi_2$; by symmetry they should be given the same probability, and if that probability is far from $1/2$, they will have a worse score in the world where $\phi_1$ is true and $\phi_2$ is false.

Since $\phi_1$ is given probability near $1/2$, $\neg\phi_1\wedge\phi_2$ will also be given probability near $1/2$, since (given $T_2$) exactly one of $\phi_1$ and $\neg\phi_1\wedge\phi_2$ is true. Similarly, $\phi_1\wedge\neg\phi_2$ will get probability near $1/2$. However, this means that $\phi_1\wedge\phi_2$ will be left with probability near $0$, since (given $T_2$) exactly one of $\phi_1\wedge\phi_2$, $\phi_1\wedge\neg\phi_2$, and $\neg\phi_1\wedge\phi_2$ is true.

What happens here is that the probabilities assigned to $\phi_1$ and $\phi_2$ will both be slightly greater than $1/2$, so in the world where both are true, the Bayes score gets a much larger contribution from those two sentences. In maximizing the worst-case Bayes score, we therefore trade away some of our Bayes score in that world for Bayes score in the other worlds by making the probability of $\phi_1\wedge\phi_2$ small.
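Here is a rough numerical sketch of this counterexample, for illustration only: the weights of 10 and 1, the restriction to just these six weighted sentences over two independent atoms, and the crude search procedure are all simplifying choices, not the actual construction. It represents a coherent prior as a probability distribution over the four truth assignments to $(\phi_1,\phi_2)$ and searches for the distribution maximizing the worst-case weighted Bayes score, first over all four worlds (base theory $T_1$) and then over the three worlds satisfying $\phi_1\vee\phi_2$ (base theory $T_2$).

```python
import itertools
import math
import random

# Worlds are truth assignments to (phi_1, phi_2).
WORLDS = list(itertools.product([False, True], repeat=2))

# The weighted sentences: large weight on phi_1 and phi_2, small weight on the four
# conjunctions, everything else ignored.  The particular numbers are arbitrary.
SENTENCES = [
    (lambda w: w[0], 10.0),                  # phi_1
    (lambda w: w[1], 10.0),                  # phi_2
    (lambda w: w[0] and w[1], 1.0),          # phi_1 and phi_2
    (lambda w: w[0] and not w[1], 1.0),      # phi_1 and not phi_2
    (lambda w: not w[0] and w[1], 1.0),      # not phi_1 and phi_2
    (lambda w: not w[0] and not w[1], 1.0),  # not phi_1 and not phi_2
]

def sentence_prob(dist, sat):
    """Probability of a sentence under a distribution over worlds."""
    return sum(p for w, p in dist.items() if sat(w))

def bayes_score(dist, world):
    """Weighted log score of the induced sentence probabilities in one world."""
    total = 0.0
    for sat, weight in SENTENCES:
        q = min(max(sentence_prob(dist, sat), 1e-9), 1 - 1e-9)
        total += weight * math.log(q if sat(world) else 1 - q)
    return total

def worst_case_bayes(worlds, seed=0):
    """Crude search (random restarts plus local perturbation) for a distribution
    over `worlds` maximizing the worst-case Bayes score over those worlds."""
    rng = random.Random(seed)

    def evaluate(raw):
        total = sum(raw)
        dist = {w: x / total for w, x in zip(worlds, raw)}
        return min(bayes_score(dist, w) for w in worlds), dist

    best_raw = [1.0] * len(worlds)
    best_val, best_dist = evaluate(best_raw)
    for _ in range(2000):                      # random restarts
        raw = [rng.random() + 1e-6 for _ in worlds]
        val, dist = evaluate(raw)
        if val > best_val:
            best_val, best_dist, best_raw = val, dist, raw
    step = 0.2
    for _ in range(20000):                     # local hill climbing
        raw = [max(1e-6, x + rng.uniform(-step, step)) for x in best_raw]
        val, dist = evaluate(raw)
        if val > best_val:
            best_val, best_dist, best_raw = val, dist, raw
        step = max(0.002, step * 0.9995)
    return best_dist

def conjunction(w):
    return w[0] and w[1]

d1 = worst_case_bayes(WORLDS)                               # T_1: empty base theory
d2 = worst_case_bayes([w for w in WORLDS if w[0] or w[1]])  # T_2: assumes phi_1 or phi_2

print("P_T1(phi_1 and phi_2) ~", round(sentence_prob(d1, conjunction), 3))
print("P_T2(phi_1 and phi_2) ~", round(sentence_prob(d2, conjunction), 3))
```

With these particular weights the first probability comes out near $1/4$ while the second drops to roughly $0.1$ (and it approaches $0$ as the weight on $\phi_1$ and $\phi_2$ grows), violating the monotonicity inequality in exactly the way described above.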

The fact that monotonicity is achievable and the Worst Case Bayes prior does not achieve it is the reason I have abandoned the Worst Case Bayes prior and have not thought about it much for the last year.
