Optimal predictors and propositional calculus

This is a writeup of the stuff Benja and I thought up during the logical uncertainty workshop in May.

An optimal predictor for $(D, μ)$ assigns probabilities to logical sentences of the form $x \in D$, but in general it does not assign probabilities to propositional formulae constructed from sentences of that form. Here we outline two approaches to defining such probabilities.

Consider a language $D$. Define $B_{D} \subseteq \{0, 1\}^{*}$ to be the set of words encoding propositional formulae, with variables labeled by elements of $\{0, 1\}^{*}$, that evaluate to true when each variable labeled by $x$ is replaced by the truth value of $x \in D$. We can think of $B_{D}$ as consisting of expressions like $⌈ 0110 \in D ⌉ \land \neg ⌈ 101 \in D ⌉$ which represent true sentences.
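To make the definition concrete, here is a minimal Python sketch of membership in $B_{D}$. The nested-tuple encoding of formulae, the helper names `in_D`, `evaluate` and `in_B_D`, and the toy language (binary words starting with `0`) are all assumptions made for illustration and are not part of the construction above. For any interesting $D$ the membership test would not be cheaply computable, which is the whole reason for wanting a predictor.

```python
# Toy stand-in for D (an assumption for illustration only):
# binary words whose first symbol is "0".
def in_D(x: str) -> bool:
    return x.startswith("0")

# Hypothetical formula encoding as nested tuples:
#   ("var", x)    -> the atomic sentence "x in D"
#   ("not", f)    -> negation
#   ("and", f, g) -> conjunction
#   ("or", f, g)  -> disjunction
def evaluate(formula) -> bool:
    tag = formula[0]
    if tag == "var":
        # Substitute the truth value of "x in D" for the variable labeled x.
        return in_D(formula[1])
    if tag == "not":
        return not evaluate(formula[1])
    if tag == "and":
        return evaluate(formula[1]) and evaluate(formula[2])
    if tag == "or":
        return evaluate(formula[1]) or evaluate(formula[2])
    raise ValueError(f"unknown connective: {tag!r}")

# A formula belongs to B_D exactly when it evaluates to true.
def in_B_D(formula) -> bool:
    return evaluate(formula)

# The example from the text: "0110 in D" and not "101 in D".
example = ("and", ("var", "0110"), ("not", ("var", "101")))
```

Under this toy choice of $D$, `in_B_D(example)` holds because `0110` starts with `0` and `101` does not.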

Consider $μ$ a word ensemble and suppose $P$ is an optimal predictor for $(B_{D}, μ)$. It would be nice to show that $P$ satisfies something like the “coherence condition”

$P^{k} (ϕ) = P^{k} (ϕ \land ψ) + P^{k} (ϕ \land \neg ψ)$

Obviously we cannot expect exact equality, since $P$ is only defined up to similarity relative to $μ$. Instead, we were able to show that under certain assumptions on $μ$, the quantity $(P^{k} (ϕ) - P^{k} (ϕ \land ψ) - P^{k} (ϕ \land \neg ψ))^{2}$ is “negligible on average” in some sense.

To see this, define the languages

$D_{+} = \{(ϕ, ψ) ∣ ϕ \land ψ \in B_{D}\}$

$D_{-} = \{(ϕ, ψ) ∣ ϕ \land \neg ψ \in B_{D}\}$

$D_{0} = \{(ϕ, ψ) ∣ ϕ \in B_{D}\}$

All three languages have reductions to $B_{D}$:

$f_{+} (ϕ, ψ) := ϕ \land ψ$

$f_{-} (ϕ, ψ) := ϕ \land \neg ψ$

$f_{0} (ϕ, ψ) := ϕ$

Assume a word ensemble $ν$ exists such that $f_{+}$, $f_{-}$ and $f_{0}$ are pseudo-invertible reductions of $(D_{+}, ν)$, $(D_{-}, ν)$ and $(D_{0}, ν)$ to $(B_{D}, μ)$. Then by Theorem 6.1 of [1], $f_{+}^{-1} (P)$ is an optimal predictor for $(D_{+}, ν)$, $f_{-}^{-1} (P)$ is an optimal predictor for $(D_{-}, ν)$ and $f_{0}^{-1} (P)$ is an optimal predictor for $(D_{0}, ν)$. Since $D_{0} = D_{+} ⊔ D_{-}$, Theorems 5.1 and 4.2 imply that $f_{0}^{-1} (P) \overset{ν}{\sim} f_{+}^{-1} (P) + f_{-}^{-1} (P)$. That is, $E_{ν^{k}} [(P^{k} (ϕ) - P^{k} (ϕ \land ψ) - P^{k} (ϕ \land \neg ψ))^{2}]$ is negligible.
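The disjoint-union claim used here can be checked pointwise. The following is a short worked step, with $χ$ denoting the indicator function of a language; the symbol $v(\cdot)$ for the truth value of a formula under the substitution defining $B_{D}$ is introduced only for this illustration.

```latex
% v(\phi) \in \{0, 1\} is the truth value of \phi (illustrative notation).
% Exactly one of \psi and \neg\psi is true, so v(\neg\psi) = 1 - v(\psi).
\begin{align*}
\chi_{D_{0}}(\phi, \psi) = v(\phi)
  &= v(\phi)\, v(\psi) + v(\phi)\, \bigl(1 - v(\psi)\bigr) \\
  &= v(\phi \land \psi) + v(\phi \land \neg\psi) \\
  &= \chi_{D_{+}}(\phi, \psi) + \chi_{D_{-}}(\phi, \psi).
\end{align*}
```

At most one of the two terms on the right is nonzero, which is why the union is disjoint. The approximate coherence statement is the same identity with each indicator replaced by the corresponding predictor, holding only up to similarity relative to $ν$.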

Another approach is to consider a language $D$ equipped with a binary operation $m : \{0, 1\}^{*} \times \{0, 1\}^{*} \to \{0, 1\}^{*}$ such that $χ_{D} (m (x, y)) = χ_{D} (x) χ_{D} (y)$. If $μ$ is a word ensemble and $P$ is an optimal predictor for $(D, μ)$, then $P^{k} (m (x, y))$ can be interpreted as the probability of $⌈ x \in D ⌉ \land ⌈ y \in D ⌉$. The probability of $⌈ x \in D ⌉ \land \neg ⌈ y \in D ⌉$ can be taken to be $η (P^{k} (x) - P^{k} (m (x, y)))$. Continuing in this manner, it is possible to assign probabilities to all propositional formulae of this sort.

In this case, the coherence condition would hold automatically except for the presence of $η$. Define

$D_{+} = \{(x, y) ∣ m (x, y) \in D\}$

$D_{0} = \{(x, y) ∣ x \in D\}$

We can now apply the same method as above combined with Lemma 4.3 ( $D_{+} \subseteq D_{0}$ ) to show approximate coherence.
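As a rough illustration of the second approach, the following Python sketch builds a toy language with a multiplicative operation $m$ and assigns probabilities to conjunctions. Everything concrete in it is an assumption made for illustration: the toy language (binary words with no `11` substring), the separator-based `m`, the reading of $η$ as clamping to $[0, 1]$, and the placeholder `P`, which simply returns the exact indicator rather than a genuine optimal predictor. The probability of $⌈ x \in D ⌉ \land \neg ⌈ y \in D ⌉$ is computed as the probability of $x \in D$ minus that of the conjunction, passed through $η$.

```python
from itertools import product

# Toy language D (assumption): binary words with no "11" substring.
def chi_D(x: str) -> int:
    return 0 if "11" in x else 1

# Toy operation m (assumption): join with a "0" separator, so that
# m(x, y) contains "11" exactly when x or y does.
def m(x: str, y: str) -> str:
    return x + "0" + y

# Assumed meaning of eta: clamp a real number to the interval [0, 1].
def eta(t: float) -> float:
    return max(0.0, min(1.0, t))

# Placeholder for P^k: the exact indicator of D. A real optimal
# predictor would return an estimate in [0, 1] instead.
def P(x: str) -> float:
    return float(chi_D(x))

# Probability assigned to "x in D and y in D".
def prob_and(x: str, y: str) -> float:
    return P(m(x, y))

# Probability assigned to "x in D and not (y in D)".
def prob_and_not(x: str, y: str) -> float:
    return eta(P(x) - P(m(x, y)))

# All binary words of length at most max_len.
def words(max_len: int):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

WORDS = list(words(4))

# chi_D(m(x, y)) == chi_D(x) * chi_D(y) on all short words.
multiplicative = all(
    chi_D(m(x, y)) == chi_D(x) * chi_D(y) for x in WORDS for y in WORDS
)

# Coherence: P(x) == P(x and y) + P(x and not y) on all short words.
coherent = all(
    P(x) == prob_and(x, y) + prob_and_not(x, y) for x in WORDS for y in WORDS
)
```

With the exact indicator as `P`, the argument of `eta` is never negative because $D_{+} \subseteq D_{0}$, so the clamp has no effect and coherence holds exactly. For a genuine predictor the same inclusion is what would make the clamp matter only negligibly.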

It would be interesting to construct concrete examples in which these results are applicable.

[1] “A complexity theoretic approach to logical uncertainty (Draft)”