Walkthrough of “Definability of Truth in Probabilistic Logic”

I recently went through the paper Definability of Truth in Probabilistic Logic for a second time. Explaining things to others often helps me solidify my own knowledge, so I’m doing a walkthrough of the paper.

Within, I explain things that I found confusing and expand on sections where the paper is somewhat brief. This is designed to be something that I could send to a copy of myself that has not read the paper, along with the paper, to help my clone learn the concepts as fast as possible. Your mileage may vary: its quite likely that your sticking points differ from my own.

I’ll assume competence in the subject area, including knowledge of Tarski’s theorem and the impossibility of adding a truth predicate to a sufficiently expressive language.

Because this paper is an early draft, I’ve included a few suggestions for edits that would have helped me out. You can find them throughout, or collected at the bottom of the post.

Motivation

Section 1 of the paper is well put together. For posterity, I’ll summarize it here:

  • Given a sufficiently expressive language of first-order logic, you cannot embed that language’s truth predicate into the language itself. For if a language has access to its own truth predicate, it can express the liar’s paradox: G ⇔ True('¬G').

  • Responses to this include:

    • Working with a tower of meta-languages, each containing the previous system’s truth predicate

    • Allowing some sentences must take on a third “undefined” value, instead of true or false

This paper explores a third alternative: assign probabilities to sentences instead of binary (or trinary) truth-values. This is potentially far stronger than a tri-valued logic: it’s all well and good to say that some sentences are “undefined”, but it turns out that many sentences of interest — not just pathological paradoxes — become undefined. A probability function, by contrast, allows you to retain much more information about sentences which cannot be assigned values “true” or “false”.

Preliminaries

Throughout the paper, we’ll fix a language L that we’re working with. It doesn’t matter what language L is, so long as it’s strong enough to perform a Gödel numbering, and that it has terms corresponding to the rational numbers.

We also consider a particular theory T (for example, ZFC) and assume that the rationals have their usual properties in T. Furthermore:

  • 'φ' (φ, in quotes) will denote the Gödel encoding of the sentence φ.

  • Q will denote the set of all rational numbers. I point this out because I lack a script font.

The paper is a little lax when throwing around variable names, especially in second section. Here’s a quick run-down:

  • P refers to both a function and a symbol. I’ll try to be explicit about which is which. In general, the function P operates on sentences, whereas the symbol P is contained in a sentence, and operates on quoted sentences. So P(φ) refers to the output of the function on the sentence φ, and P(‘φ’) refers to the symbol P applied to a Gödel encoding of the sentence φ. This is usually not ambiguous.

  • μ is used for probability measures, and does not tend to be fixed. Remember that a probability measure is defined over a set of objects. It assigns a measure to each object such that the measure of the whole set is 1, while obeying the laws of probability.

  • T generally refers to the theory under consideration (eg ZFC), but there are times where it doesn’t. I’ll point them out.

Probability Predicates

We want a probability function defined on sentences of L. What does that actually mean?

We begin by considering all functions P that map sentences of L onto real numbers. Note that P could be any function from L onto reals. It could be that P takes “x” to 100 and “x or x” to −3. Don’t let the suggestive name ‘P’ fool you: we haven’t yet narrowed down the behavior of functions under consideration.

Clearly, such functions cannot in general be viewed as measuring the “probability” of a sentence. What, precisely, do we mean when we say we want a probability function for our logical language?

Intuitively, we want a P that treats sentences in a “consistent” manner. More formally, we want there to be some underlying probability measure μ over models such that P is “talking about” μ.

Note that we do not require P be a mere probability measure over sentences of L! That would be too weak a claim: such a P could assign 1 to the sentence x ∧ ¬x while being a perfectly good probability measure over sentences, though clearly such a P (which assigns 1 to a contradiction!) is not what we had in mind when we went hunting for a probability function.

Claiming that P obeys a probability measure μ over models of L is a much stronger claim. It states that there is some way μ of assigning probabilities to models such that P(φ) is the probability of the sentence φ according to how μ weights models. For example, such a P is forced to assign 0 to every contradiction, because there are no models that model a contradiction.

Let’s make that explicit: We want P(φ) to be the proportion of all models (weighted by μ) that model φ. In other words, P(φ) = μ({M : M⊧φ}).

We call such a P “coherent”. Coherent P are quite well-behaved: They map all contradictions to 0 (as no model models a contradiction) and all tautologies to 1 (as every model models a tautology). Coherent P also act like probability functions: we know that P(φ)=P(φ∧ψ)+P(φ∧¬ψ) for any φ and ψ, because clearly the models that model φ consist of the models that model both φ and ψ, and the models that model φ but don’t model ψ.

These three properties are necessary for coherent P. Turns out, they’re also sufficient for a coherent P. This may be rather surprising: remember that P is otherwise unrestricted. So the above claim is equivalent to saying that these three axioms are sufficient to make a coherent P:

  1. For all φ and ψ, P(φ)=P(φ∧ψ)+P(φ∧¬ψ)

  2. For all tautologies φ, P(φ)=1

  3. For all contradictions φ, P(φ)=0

The paper then seems to make the assumption that these axioms constrain the range of P to [0, 1] (or, equivalently, that these axioms are sufficient to guarantee that P maps logically equivalent sentences to the same value). This is not obvious to me, and may in fact be incorrect. (See discussion, below.)
But *assuming* that the range of P is [0, 1], the rest of the proof is fairly simple. For formal details, see the paper. I will only sketch the proof.

We’re going to give a method for constructing theories. In this section, T refers to the constructed theory, NOT the T that we’re holding fixed. Also, we’re going to be assessing the probability of the theory under construction, using syntax like P(Ti). This is just the probability of the sentence constructed by conjuncting all sentences in the theory Ti. We can always assess this because each Ti will be finite.

Fix an enumeration of all sentences φi. Set T0 to be the empty set.

Note: The paper claims that “By axiom 3, P(T0)=1”. I believe this is a typo: first of all, it should read “By axiom 2″. Second of all, it’s not obvious to me that P must assign a probability to the empty sentence, nor that it should be 1. This is, however, easily worked around by defining P(φ|Ti) to be P(φ) if i=0. For all other Ti, P(φ|Ti) is defined in the usual way.

Now iterate sentences, considering only sentences independent of the theory Ti built so far. For each independent sentence φj, set Tj=Ti∪{φj} with probability P(φj|Ti) and Tj=Ti∪{¬φj} with probability P(¬φj|Ti).

Define T=∪Ti, this is a complete consistent theory. At each step i along the way, the probability that the sequence derives φ is P(φ|Ti). Thus, the sequence of Ti is a martingale. Because T is complete and consistent, P(φ|Ti) stabilizes at either 0 or 1. Thus, the probability that any given T generated by this procedure derives φ is P(φ|T0), which is just P(φ).

Note: Briefly, a martingale is a sequence where the expectation of the next value is equal to the present observed value. I’m not entirely clear on why you also need the fact that P(φ|Ti) stabalizes at either 0 or 1 before concluding that T⊢φ with probability P(φ), due to inexperience with martingales. Regardless, the point is that given a probability function P over sentences, you can generate theories in such a way that the resulting theory models φ with probability P(φ), and this is not too surprising.

We have just given a process for constructing theories at random. This process defines a probability measure over complete consistent theories: every complete consistent theory has a particular probability of being generated by this process.

The chance that the T coming out of this process derives φ is P(φ). In other words, we have defined a μ such that μ({T : T⊢φ})=P(φ). By the completeness theorem, each complete consistent theory has a model, so this process also defines a measure μ over models such that P(φ)=μ({M:M⊧φ}). This is exactly what we wanted.

We have now showed that any P following the above axioms is coherent in that it acts as if there is an underlying probability measure over models of L such that P(φ) measures the probability of selecting a model (weighted by μ) that models φ.

Reflective Probability

Let’s go back to considering our fixed theory T. Our goal is to allow T to “talk about” the “subjective probability” of its sentences: in other words, we want to extend the language L to include a coherent probability function P.

We can’t include just any P, obviously. Specifically, we want to embed a probability function that talks about the probability of sentences of T: In other words, it must map all sentences derived by T to probability 1, and then assign probability values to sentences not derived by T. (Otherwise, it clearly isn’t talking about T.) It’s easy to see that probability functions that assign probability 1 to all sentences of T correspond to probability functions derived from some probability measure over models of T.

Pick a coherent probability function P that assigns probability 1 to all sentences of T. We now want to extend the language L to include a symbol P which is intended to represent the function P. Call the extended language L’.

It’s not enough to just shove P into L and call it a day. We know that P assigns “sane” probabilities to sentences of L, but we have no idea how P reacts to sentences of L’: the fact that P was sane when it was talking about L does not mean that P will remain sane when it can start referring to itself.

For example, imagine a probability function P such that P(φ)=1, but P(P('φ')=1)=0. This is legal, so long as P is coherent and P(φ)=1 is consistent with T. However, it’s clearly not the probability function we’re looking for, for it lies about itself: we would like the probability function to interpret the symbol P as the function P itself.

We could simply require this property, and consider only P where

∀φ∈L'. ∀a,b∈Q. a<P(φ)<b ⇔ P(a<P('φ')<b)=1

This translates to “whenever the function P says that the probability of φ is between rational numbers a and b, the function P also says that the sentence a < P('φ') < b has probability 1”. In other words, this narrows our search down to functions P that treat the symbol P exactly as the function P itself acts.

Unfortunately, no such P exists. For imagine that there is such a P, and consider the sentence G ⇔ P('G')<1. Then

P(G)<1 ⇔ P(P('G')<1)=1 ⇔ P(G)=1

In other words, P(G)<1 ⇔ P(G)=1. Because P is coherent and has range [0, 1], this is a contradiction: no such P exists.

I’ll call this the “strong reflective consistency” requirement, and as shown above it is unsatisfiable for exactly the same reason that we can’t define a predicate True such that ∀φ∈L'. True(φ) ⇔ True(True('φ')).

So we must weaken our requirements. Fortunately, it turns out we don’t have to weaken them very much. Instead of requiring that P can talk about itself exactly, we allow P to talk about itself with arbitrarily small (infinitesimal) error. In other words, we require

∀φ∈L'. ∀a,b∈Q. a<P(φ)<b ⇒ P(a<P('φ')<b)=1

If there is such a P, we’re in business: While sentences of the language L’ cannot precisely assert the probability of a sentence, they can do so with arbitrarily small error. Before we discuss whether such a P exists, let’s make a few notes and see an example.

First, the above (weakened) “reflection principle” above implies the following:

∀φ∈L'. ∀a,b∈Q. ¬(a≤P('φ')≤b) ⇒ P(a≤P('φ')≤b)=0

Because if P(‘φ’) is not in [a, b] then either P(P('φ')<a)=1 or P(P('φ')>b)=1. This can be rephrased as

∀φ∈L'. ∀a,b∈Q. P(a≤P('φ')≤b)>0 ⇒ a≤P('φ')≤b

which is the “other direction” of our weak reflection principle.

Second, an example: Consider the sentence G⇔P('G')<p for some p∈(0,1]. A reflectively consistent P must assign P(G)=p: If it assigns P(G)>p then P(P('G')>p)=1 so P(G)=0 (contradiction) and if it assigns P(G)<p then P(P('G')<p)=1 so P(G)=1 (contradiction). If we required the “strong” reflection principle, then P could not exist, because P(G)=p would imply P(P('G')<p)=0 and thus P(G)=0, a contradiction. But with the weak reflection principle, no such contradiction can follow, because P(G)=p does not force P(P('G')<p)=0: the symbol P is “uncertain” about the true value of the function P, and cannot prove that P(‘G’) is precisely p.

Models of L’ with a reflectively consistent P will be able to show that P(G)∈[p-ε,p+ε] for arbitrarily small ε, but cannot prove that P(G)=p. This is the mechanism by which consistency is maintained.

Finding P

The question is, are there (weakly) reflectively consistent P? We know that there are not strongly reflectively consistent P, just as there are not consistent truth predicates. Before we can get excited about reflectively consistent P, we have to prove that there are some.

I lack the knowledge to fully understand this proof, as it uses theorems from topology that I do not yet understand. I can, however, give you a sketch:

Consider the set A of coherent probability distributions over L’ that assign probability 1 to T. View it as a subset of the space [0,1]^L’. A is convex, closed, and (by Tychonoff’s theorem) compact. A is also non-empty, because there is at least one model of T: we can make a probability distribution that assigns 1 to M and 0 to any other model.

Given any P, we can construct the set of axioms that describe that P. For every φ, we add every sentence a<P('φ')<b for every rational interval (a, b) which does indeed contain P(φ). For each P, call this set Rp. We say that a probability function “obeys” Rp if it assigns probability 1 to every sentence in Rp. We need to show that there is some P with obeys its own reflexivity axioms.

We can define a function f : A → Powerset(A) such that f(P’) = {P∈A : P(Rp’)=1}. In other words, f takes probability functions P’ to the set of probability functions that obey the reflectivity axioms of P’.

Each f(P) is convex and non-empty by the same arguments as above. We show that f has a closed graph and apply Kukatani’s fixed point theorem to show that f has a fixed point. This guarantees the existence of some P such that P∈f(P), which means P obeys its own reflexivity axioms.

Thus, there exists a reflectively consistent P.

Note: I don’t understand Kukatani’s fixed point theroem. I need to brush up on my topology. In the meantime, the point is that we can construct a function from probability functions P onto the set of probability functions obeying the reflectivity axioms of P, and that this function has a fixed point.

Since f in general has a fixed point, we can say in general that reflectively consistent P do exist.

Knowing your limits

Section 3.3 is fairly clear and worth a read. For posterity, I’ll summarize it below.

Note that a reflectively consistent P does not necessarily “know” it is reflectively consistent. Such a P is reflectively consistent, but it may not claim to be. In fact, if a reflectively consistent P assigns probability 1 to the general statement of its own reflective consistency, a contradiction can be derived.

We can weaken the reflection principle further, but it is not yet known whether there is a version that both captures the interesting behavior of reflection and which is both true of P and asserted in P.

I’ll close with the closing quote of the paper:

[this] work shows that the obstructions presented by the liar’s paradox can be overcome by tolerating an infinitesimal error, and that Tarski’s result on the undefinability of truth is in some sense an artifact of the infinite precision demanded by reasoning about complete certainty.

Discussion

I don’t have much to add at this time. I clearly need to read up on topology, and am doing so. The result is very interesting, and I’m looking forward to playing around with other reflection principles.

Next up is a walkthrough of Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.

Compiled Notes

  1. p3. I, like John Baez here, originally thought that axiom 3 was unnecessary. With a little help from Daniel Dewey, I realized that P is not restricted to [0, 1], nor even restricted to map logically equivalent sentences to the same probability. Now I’m not convinced that axioms 1, 2, and 3 are sufficient to force P to act like a probability function. If they are, I think the paper should either show that any P following the three axioms has range [0,1] or that it maps logically equivalent sentences to the same value. (There’s some discussion about this in the comments.)

  2. p3. It’s weird to “fix a theory T” at the beginning of section 2.1, and then “Define T=∪Ti” on p3. It’s not particularly confusing in context, but it irks the programmer in me.

  3. p3. I’m pretty sure that “By axiom 3, P(T0)=1” is a typo. Furthermore, it’s not clear to me that we should treat the empty set like a tautological sentence. I would have appreciated a sentence after “Let T0=∅” to the effect of “P(Ti) is the value of P on the conjunction of all Ti, which we can do because each Ti is finite”, with special treatment given to the case where Ti is empty.

  4. p4. I misparsed “which contradicts P(G)∈[0,1]” on the first pass. I took Range(P)=[0,1] as background knowledge and thought the claim should have been “so P(G)∈[0,1) and P(G)=1, a contradiction”. I concluded there was a typo. This may have been a personal fluke. The meaning was clear, regardless.