Proof Section for An Introduction to Credal Sets and Infra-Bayes Learnability

This post accompanies An Introduction to Credal Sets and Infra-Bayes Learnability.

Notation

We use $\Delta X$ to denote the space of probability distributions over a set $X$, which is assumed throughout to be a compact metric space. We use $\square X$ to denote the set of credal sets over $X$, i.e., non-empty closed convex subsets of $\Delta X$.

Given $\mu \in \Delta X$ and a continuous function $f : X \to \mathbb{R}$, let $\mathbb{E}_{\mu}[f] := \int_X f \, d\mu$. Given a credal set $\Theta \in \square X$, let $\mathbb{E}_{\Theta}[f] := \sup_{\mu \in \Theta} \mathbb{E}_{\mu}[f]$.

Let $C(X, \mathbb{R})$ denote the space of continuous functions from $X$ to $\mathbb{R}$, equipped with the supremum norm $\| \cdot \|_{\infty}$.

Proof of Lemma 1

Lemma 1: If the action set $\mathcal{A}$ and observation set $\mathcal{O}$ are finite, the set $(\mathcal{A} \times \mathcal{O})^{\omega}$ of countably infinite histories is a compact metric space under the metric $d(h_1, h_2) = \gamma^{t(h_1, h_2)}$, where $\gamma \in (0, 1)$ and $t(h_1, h_2)$ is the time of first difference between $h_1$ and $h_2$ (with $d(h_1, h_2) = 0$ when $h_1 = h_2$).

Proof. The space $\mathcal{A} \times \mathcal{O}$ is compact under the discrete topology since it is finite. Therefore, $(\mathcal{A} \times \mathcal{O})^{\omega}$ is compact under the product topology $\mathcal{T}_p$ by Tychonoff's theorem. The stated metric induces a topology $\mathcal{T}_d$. By the definition of compactness, it is sufficient to show that all basis elements of $\mathcal{T}_d$ are contained in $\mathcal{T}_p$: every $\mathcal{T}_d$-open cover is then a $\mathcal{T}_p$-open cover and so admits a finite subcover.

The basis elements of $\mathcal{T}_d$ have the form

$$B_{\epsilon}(h) = \{ h' \in (\mathcal{A} \times \mathcal{O})^{\omega} : d(h, h') < \epsilon \}$$

for $h \in (\mathcal{A} \times \mathcal{O})^{\omega}$ and $\epsilon > 0$.

Furthermore, sets of the form

$$C_n(h) = \{ h' \in (\mathcal{A} \times \mathcal{O})^{\omega} : h'_t = h_t \text{ for all } t < n \}$$

are basis elements of $\mathcal{T}_p$, where $h \in (\mathcal{A} \times \mathcal{O})^{\omega}$ and $n \in \mathbb{N}$.

Given $\epsilon > 0$, define $n_{\epsilon} = \min \{ n \in \mathbb{N} : \gamma^n < \epsilon \}$. We will verify that $d(h_1, h_2) < \epsilon$ if and only if $t(h_1, h_2) \geq n_{\epsilon}$. Note that if $t(h_1, h_2) \geq n_{\epsilon}$, then by construction, $d(h_1, h_2) = \gamma^{t(h_1, h_2)} \leq \gamma^{n_{\epsilon}} < \epsilon$. For the other direction, we have that if $t(h_1, h_2) < n_{\epsilon}$, then $d(h_1, h_2) = \gamma^{t(h_1, h_2)} \geq \epsilon$, since $n_{\epsilon}$ is the least $n$ with $\gamma^n < \epsilon$. By the contrapositive, if $d(h_1, h_2) < \epsilon$, then $t(h_1, h_2) \geq n_{\epsilon}$.

Since $d(h, h') < \epsilon$ if and only if $h'_t = h_t$ for all $t < n_{\epsilon}$, this proves that $B_{\epsilon}(h)$ and $C_{n_{\epsilon}}(h)$ are equal as sets. Thus all basis elements of $\mathcal{T}_d$ are contained in $\mathcal{T}_p$, and $(\mathcal{A} \times \mathcal{O})^{\omega}$ is compact in the metric topology. $\square$
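As a quick sanity check on the correspondence between metric balls and cylinder sets, here is a small Python sketch. It assumes the first-difference metric $d(h_1, h_2) = \gamma^{t(h_1, h_2)}$ with an illustrative choice $\gamma = 1/2$; the concrete histories and helper names are my own, not from the post.

```python
GAMMA = 0.5  # assumed discount base in (0, 1)

def first_difference(h1, h2):
    """Time of the first coordinate where two (finite) history prefixes differ."""
    for t, (a, b) in enumerate(zip(h1, h2)):
        if a != b:
            return t
    return len(h1)  # they agree on the whole shared prefix

def d(h1, h2):
    """The metric d(h1, h2) = GAMMA ** (time of first difference)."""
    if h1 == h2:
        return 0.0
    return GAMMA ** first_difference(h1, h2)

h       = (0, 1, 0, 1)
h_close = (0, 1, 0, 0)  # agrees with h at times 0, 1, 2
h_far   = (1, 1, 0, 1)  # differs from h at time 0

# Longer shared prefixes mean smaller distance:
assert d(h, h_close) == GAMMA ** 3
assert d(h, h_far) == 1.0
# d(h, h') < GAMMA ** 2 holds exactly when h' agrees with h at times 0, 1, 2,
# i.e. the open ball of radius GAMMA ** 2 is the cylinder over that prefix.
assert (d(h, h_close) < GAMMA ** 2) and not (d(h, h_far) < GAMMA ** 2)
```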

Proof of Proposition 1

The following lemma is used in the proofs of both Proposition 1 and Proposition 2.

Lemma 2: The set $\Pi$ of deterministic policies (functions from finite histories to actions) is first-countable under the product topology.

Proof: Let $\pi \in \Pi$. We aim to show that $\pi$ has a countable neighborhood basis. Consider the collection $\mathcal{B}_{\pi}$ of open sets in $\Pi$ of the form $\prod_h U_h$, where $U_h = \{ \pi(h) \}$ for finitely many finite histories $h$ and $U_h = \mathcal{A}$ otherwise. Each such set is determined by a finite subset of the countable set of finite histories, so there is a bijection between $\mathcal{B}_{\pi}$ and the set of finite binary sequences, and this collection is countable.

Let $V$ be a neighborhood of $\pi$. By the definition of a basis, there exists a basis element $B$ for the product topology such that $\pi \in B \subseteq V$. By definition of the product topology, the set $B$ can be written as $\prod_h B_h$, where $B_h \neq \mathcal{A}$ for only finitely many $h$. Since $\pi \in B$, the element of $\mathcal{B}_{\pi}$ that restricts exactly those finitely many coordinates to $\{ \pi(h) \}$ is contained in $B$, and thus in $V$. Therefore, $\mathcal{B}_{\pi}$ is a countable neighborhood basis. $\square$

Proposition 1: Every crisp causal law $\Lambda : \Pi \to \square ((\mathcal{A} \times \mathcal{O})^{\omega})$ is continuous with respect to the product topology on $\Pi$ and the Hausdorff topology on $\square ((\mathcal{A} \times \mathcal{O})^{\omega})$.

Proof: By Lemma 2, $\Pi$ is first-countable under the product topology. Consequently, $\Lambda$ is continuous if, given a convergent sequence $\pi_n \to \pi$, we have $\Lambda(\pi_n) \to \Lambda(\pi)$. Let such a convergent sequence be given.

By definition, there exists a set $E$ of environments that generates $\Lambda$: for each policy $\pi$, $\Lambda(\pi)$ is the closed convex hull of $\{ e \cdot \pi : e \in E \}$, where $e \cdot \pi \in \Delta ((\mathcal{A} \times \mathcal{O})^{\omega})$ denotes the distribution over destinies induced by $\pi$ interacting with $e$. We will show convergence using the weak topology, so let $f : (\mathcal{A} \times \mathcal{O})^{\omega} \to \mathbb{R}$ be a 1-Lipschitz continuous function.

Since $\pi_n \to \pi$ in the product topology, given any finite time horizon $T$, there exists $N$ such that for all $n \geq N$, $\pi_n$ and $\pi$ agree up to time $T$, i.e., on every history of length at most $T$. Furthermore, the set of destinies can be partitioned into a disjoint union $D_1 \sqcup \dots \sqcup D_k$, where each $D_i$ is a set of destinies that are equal up to time $T$ and $k$ is the (finite) number of distinct length-$T$ prefixes. Note that by construction, if $x, y \in D_i$, then $d(x, y) \leq \gamma^T$.

Then for every environment $e \in E$, every cell $D_i$, and every $n \geq N$,

$$(e \cdot \pi_n)(D_i) = (e \cdot \pi)(D_i),$$

since the probability of a length-$T$ prefix depends only on the environment and on the policy's actions on histories of length at most $T$.

For all $i$, fix a representative $x_i \in D_i$. By the 1-Lipschitz property of $f$ and the construction of $D_i$, $|f(x) - f(x_i)| \leq \gamma^T$ for all $x \in D_i$.

Therefore, for all $n \geq N$ and $e \in E$,

$$\left| \mathbb{E}_{e \cdot \pi_n}[f] - \mathbb{E}_{e \cdot \pi}[f] \right| \leq \sum_{i=1}^{k} \left( \int_{D_i} |f - f(x_i)| \, d(e \cdot \pi_n) + \int_{D_i} |f - f(x_i)| \, d(e \cdot \pi) \right) \leq 2 \gamma^T.$$

As $T$ can be taken arbitrarily large and $\gamma^T$ tends to zero as $T$ tends to infinity, and the bound above is uniform in $e$ and in the 1-Lipschitz function $f$, we conclude that $\sup_{e \in E} d_{KR}(e \cdot \pi_n, e \cdot \pi) \to 0$. The Kantorovich-Rubinstein metric induces the weak topology on $\Delta ((\mathcal{A} \times \mathcal{O})^{\omega})$, so $e \cdot \pi_n \to e \cdot \pi$ for all $e \in E$. Therefore, $\Lambda(\pi_n) \to \Lambda(\pi)$ in the Hausdorff topology. $\square$
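The weak-convergence step can be checked numerically in a toy setting. The sketch below (my own illustration; the distributions and test function are assumed, not from the post) computes the KR/Wasserstein-1 distance between two distributions on a finite subset of the real line via the standard integrated-CDF formula, and verifies the duality bound $|\mathbb{E}_{\mu}[f] - \mathbb{E}_{\nu}[f]| \leq d_{KR}(\mu, \nu)$ for a 1-Lipschitz $f$:

```python
def w1(support, mu, nu):
    """Wasserstein-1 (KR) distance between two distributions on a sorted
    finite real support, via the integrated |CDF difference| formula."""
    dist, cdf_gap = 0.0, 0.0
    for i in range(len(support) - 1):
        cdf_gap += mu[i] - nu[i]
        dist += abs(cdf_gap) * (support[i + 1] - support[i])
    return dist

def expect(support, weights, f):
    """Expectation of f under a distribution given by point masses."""
    return sum(w * f(x) for w, x in zip(weights, support))

support = [0.0, 1.0, 2.0]
mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.3, 0.3]
f = abs  # a 1-Lipschitz test function

# Duality bound: expectation differences of 1-Lipschitz functions
# are controlled by the KR distance.
assert abs(expect(support, mu, f) - expect(support, nu, f)) <= w1(support, mu, nu) + 1e-12
```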

Proof of Proposition 2 and Corollary 1

Corollary 1 below corresponds to Proposition 5 in this proof section of the original infra-Bayesianism sequence. The proof of that proposition was organized in three phases. Since we expand the ideas in more detail, we have several lemmas; "phase 1" is captured here in Lemma 5, and "phase 2" and "phase 3" are captured here in Proposition 2. The other lemmas cover prerequisite ideas that are used in the proof.

The following lemma is a special case of Theorem 6.9 of [1]. We provide a simplified proof under the assumption that $X$ is a compact metric space, using the fact that the set of all Lipschitz functions is dense in the space of continuous functions from $X$ to $\mathbb{R}$ [2].

Lemma 3: Let $X$ be a compact metric space, let $f : X \to \mathbb{R}$ be continuous, and define $E_f : \Delta X \to \mathbb{R}$ by $E_f(\mu) = \mathbb{E}_{\mu}[f]$. Then $E_f$ is continuous as a function from $\Delta X$ with the Kantorovich-Rubinstein (KR-) metric to $\mathbb{R}$.

Proof. Since $\Delta X$ with the KR-metric is a metric space, it is first-countable. Thus $E_f$ is continuous if $\mu_n \to \mu$ implies $E_f(\mu_n) \to E_f(\mu)$. Let $\mu_n \to \mu$ and $\epsilon > 0$ be given. By definition, there exists $N$ such that for all $n \geq N$,

$$d_{KR}(\mu_n, \mu) = \sup_{g} \left| \mathbb{E}_{\mu_n}[g] - \mathbb{E}_{\mu}[g] \right| < \epsilon,$$

where the supremum is taken over all 1-Lipschitz continuous functions $g : X \to \mathbb{R}$.

Suppose $f$ is 1-Lipschitz. Then the claim is immediate, since for all $n \geq N$, $\left| \mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu}[f] \right| \leq d_{KR}(\mu_n, \mu) < \epsilon$.

Suppose $f$ is $L$-Lipschitz for some $L > 0$. Then $f / L$ is 1-Lipschitz. So, by the first case, there exists $N$ such that for all $n \geq N$, $\left| \mathbb{E}_{\mu_n}[f / L] - \mathbb{E}_{\mu}[f / L] \right| < \epsilon / L$.

By linearity of expectation, for all $n \geq N$, $\left| \mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu}[f] \right| < \epsilon$.

Suppose $f$ is continuous. By the density of the set of all Lipschitz functions in $C(X, \mathbb{R})$ [2], there exists a Lipschitz function $g$ such that $\| f - g \|_{\infty} < \epsilon / 3$. By the second case, there exists $N$ such that for all $n \geq N$, $\left| \mathbb{E}_{\mu_n}[g] - \mathbb{E}_{\mu}[g] \right| < \epsilon / 3$. Applying the triangle inequality, we obtain that for all $n \geq N$,

$$\left| \mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu}[f] \right| \leq \left| \mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu_n}[g] \right| + \left| \mathbb{E}_{\mu_n}[g] - \mathbb{E}_{\mu}[g] \right| + \left| \mathbb{E}_{\mu}[g] - \mathbb{E}_{\mu}[f] \right| < \epsilon.$$

Here we have used the fact that $\| f - g \|_{\infty} < \epsilon / 3$ and thus $\left| \mathbb{E}_{\nu}[f] - \mathbb{E}_{\nu}[g] \right| \leq \mathbb{E}_{\nu}[|f - g|] < \epsilon / 3$ for any $\nu \in \Delta X$. This shows that $E_f(\mu_n) \to E_f(\mu)$, which completes the proof by the remark at the beginning. $\square$

Lemma 4: Let $\Theta$ be a credal set over a compact metric space $X$. Then for all continuous $f : X \to \mathbb{R}$, there exists $\mu^* \in \Theta$ such that $\mathbb{E}_{\mu^*}[f] = \mathbb{E}_{\Theta}[f]$.

Proof. By Lemma 3, the map $\mu \mapsto \mathbb{E}_{\mu}[f]$ is continuous. By definition, $\Theta$ is compact. A continuous function over a compact set achieves a maximum over that set, which proves the result. $\square$
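In a finite toy setting (numbers assumed purely for illustration), Lemma 4 amounts to the observation that the upper expectation over a credal set given by finitely many distributions is attained by one of its members:

```python
outcomes = [0, 1, 2]
# A toy "credal set": finitely many distributions over the outcomes.
credal_set = [
    [0.2, 0.5, 0.3],
    [0.6, 0.2, 0.2],
    [0.1, 0.1, 0.8],
]

def f(x):
    return x * x  # a continuous function on the (finite) outcome space

def expect(mu):
    """Expectation of f under a single distribution mu."""
    return sum(p * f(x) for p, x in zip(mu, outcomes))

# The maximizing distribution exists and attains the upper expectation.
best = max(credal_set, key=expect)
assert expect(best) == max(expect(mu) for mu in credal_set)
```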

The next lemma corresponds to “phase one” of the proof of Proposition 5 in this proof section of the original infra-Bayesianism sequence.

Lemma 5: Let $\Lambda$ be a crisp causal law, and let $f \in C((\mathcal{A} \times \mathcal{O})^{\omega}, \mathbb{R})$. Suppose a sequence of policies $(\pi_n)_n$ converges to $\pi$ in the product topology on $\Pi$. Then the sequence $\left( \mathbb{E}_{\Lambda(\pi_n)}[f] \right)_n$ converges in $\mathbb{R}$.

Proof. Let a convergent sequence $\pi_n \to \pi$ and $\epsilon > 0$ be given. We aim to show that there exists $N$ such that $\left| \mathbb{E}_{\Lambda(\pi_n)}[f] - \mathbb{E}_{\Lambda(\pi_m)}[f] \right| \leq \epsilon$ for all $n, m \geq N$; the sequence of expectations is then Cauchy and therefore converges.

By Lemma 1 and Lemma 4, for all $n$, there exists $\mu_n \in \Lambda(\pi_n)$ such that $\mathbb{E}_{\mu_n}[f] = \mathbb{E}_{\Lambda(\pi_n)}[f]$. Similarly, for all $m$, there exists $\mu_m \in \Lambda(\pi_m)$ such that $\mathbb{E}_{\mu_m}[f] = \mathbb{E}_{\Lambda(\pi_m)}[f]$. Then

$$\left| \mathbb{E}_{\Lambda(\pi_n)}[f] - \mathbb{E}_{\Lambda(\pi_m)}[f] \right| = \left| \mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu_m}[f] \right|. \tag{1}$$

By definition, for all $n$, $\Lambda(\pi_n)$ is the closed convex hull of $\{ e \cdot \pi_n : e \in E \}$, where $E$ is a set of environments generating $\Lambda$, and the supremum of the linear functional $\mu \mapsto \mathbb{E}_{\mu}[f]$ over a closed convex hull equals its supremum over the generating set. This implies

$$\mathbb{E}_{\mu_n}[f] = \sup_{e \in E} \mathbb{E}_{e \cdot \pi_n}[f], \tag{2}$$

$$\mathbb{E}_{\mu_m}[f] = \sup_{e \in E} \mathbb{E}_{e \cdot \pi_m}[f]. \tag{3}$$

By (2), $\mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu_m}[f] \leq \sup_{e \in E} \left( \mathbb{E}_{e \cdot \pi_n}[f] - \mathbb{E}_{e \cdot \pi_m}[f] \right)$.

By applying (3) similarly, we obtain $\mathbb{E}_{\mu_m}[f] - \mathbb{E}_{\mu_n}[f] \leq \sup_{e \in E} \left( \mathbb{E}_{e \cdot \pi_m}[f] - \mathbb{E}_{e \cdot \pi_n}[f] \right)$.

These inequalities imply that

$$\left| \mathbb{E}_{\mu_n}[f] - \mathbb{E}_{\mu_m}[f] \right| \leq \max \left\{ \sup_{e \in E} \left( \mathbb{E}_{e \cdot \pi_n}[f] - \mathbb{E}_{e \cdot \pi_m}[f] \right), \; \sup_{e \in E} \left( \mathbb{E}_{e \cdot \pi_m}[f] - \mathbb{E}_{e \cdot \pi_n}[f] \right) \right\}. \tag{4}$$

We will now consider how to bound each of the expressions in the set on the right-hand side.

Since $(\mathcal{A} \times \mathcal{O})^{\omega}$ is compact (by Lemma 2 of An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism) and $f$ is continuous, $f$ is uniformly continuous by the Heine-Cantor theorem. Recall that the metric on $(\mathcal{A} \times \mathcal{O})^{\omega}$ is the Chebyshev-style metric defined by $d(h_1, h_2) = \max_t \gamma^t \cdot \mathbb{1}[h_{1,t} \neq h_{2,t}] = \gamma^{t(h_1, h_2)}$. By the definition of uniform continuity, there exists $\delta > 0$ such that $d(h_1, h_2) < \delta$ implies that $|f(h_1) - f(h_2)| < \epsilon / 2$.

Choose $T$ such that $\gamma^T < \delta$. Since $\pi_n \to \pi$, there exists $N$ such that for all $n \geq N$, $\pi_n$ and $\pi$ (and hence any $\pi_n$ and $\pi_m$ with $n, m \geq N$) agree on every history of length at most $T$. As in the proof of Proposition 1, partition the set of destinies into cells $D_1, \dots, D_k$ of destinies that are equal up to time $T$, and fix a representative $x_i \in D_i$ for each cell; then $(e \cdot \pi_n)(D_i) = (e \cdot \pi_m)(D_i)$ for all $i$, $e \in E$, and $n, m \geq N$. Then (with steps explained below), for all $e \in E$ and $n, m \geq N$,

$$\left| \mathbb{E}_{e \cdot \pi_n}[f] - \mathbb{E}_{e \cdot \pi_m}[f] \right| = \left| \sum_{i=1}^{k} \left( \int_{D_i} f \, d(e \cdot \pi_n) - \int_{D_i} f \, d(e \cdot \pi_m) \right) \right| \leq \sum_{i=1}^{k} \left( \int_{D_i} |f - f(x_i)| \, d(e \cdot \pi_n) + \int_{D_i} |f - f(x_i)| \, d(e \cdot \pi_m) \right) \leq \frac{\epsilon}{2} \sum_{i=1}^{k} \left( (e \cdot \pi_n)(D_i) + (e \cdot \pi_m)(D_i) \right).$$

Note that the first inequality holds due to the linearity of expectation and the triangle inequality for integrals, which states that $\left| \int g \, d\mu \right| \leq \int |g| \, d\mu$ for any measure $\mu$ and integrable function $g$: since $(e \cdot \pi_n)(D_i) = (e \cdot \pi_m)(D_i)$, subtracting the constant $f(x_i)$ on each cell leaves each difference of integrals unchanged. By the monotonicity of the Lebesgue integral, $\int_{D_i} |f - f(x_i)| \, d(e \cdot \pi_n) \leq \frac{\epsilon}{2} (e \cdot \pi_n)(D_i)$, since every $x \in D_i$ satisfies $d(x, x_i) \leq \gamma^T < \delta$.

By the fact that $e \cdot \pi_n$ is a probability measure and thus $\sum_{i=1}^{k} (e \cdot \pi_n)(D_i) = 1$ (and similarly for $e \cdot \pi_m$), we obtain

$$\left| \mathbb{E}_{e \cdot \pi_n}[f] - \mathbb{E}_{e \cdot \pi_m}[f] \right| \leq \epsilon$$

for all $e \in E$ and $n, m \geq N$.

By a similar argument applied to the second expression in (4), $\sup_{e \in E} \left( \mathbb{E}_{e \cdot \pi_m}[f] - \mathbb{E}_{e \cdot \pi_n}[f] \right) \leq \epsilon$ for all $n, m \geq N$.

Returning to equations (1) and (4), we obtain that for all $n, m \geq N$, $\left| \mathbb{E}_{\Lambda(\pi_n)}[f] - \mathbb{E}_{\Lambda(\pi_m)}[f] \right| \leq \epsilon$, which completes the proof. $\square$
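The elementary fact driving the comparison of suprema above is that a supremum is 1-Lipschitz under uniform perturbation of its terms: $|\sup_i a_i - \sup_i b_i| \leq \sup_i |a_i - b_i|$. A tiny Python check with assumed toy numbers:

```python
# Toy values standing in for the expectations under the two policies.
a = [0.3, 0.9, 0.5]
b = [0.4, 0.7, 0.6]

gap = max(abs(x - y) for x, y in zip(a, b))  # uniform perturbation size

# |max a - max b| is controlled by the largest pointwise difference.
assert abs(max(a) - max(b)) <= gap
```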

The next proposition is a special case of "phase two" and "phase three" of the proof of Proposition 5 in this proof section of the original infra-Bayesianism sequence. The ideas here are the same, although we provide a direct proof.

Proposition 2: Let $\Lambda$ be a crisp causal law, and let $f \in C((\mathcal{A} \times \mathcal{O})^{\omega}, \mathbb{R})$. Then the function $\pi \mapsto \mathbb{E}_{\Lambda(\pi)}[f]$ is continuous.

Proof: By Lemma 2, $\Pi$ is first-countable, and thus it is sufficient to show that $\mathbb{E}_{\Lambda(\pi_n)}[f] \to \mathbb{E}_{\Lambda(\pi)}[f]$ for any convergent sequence $\pi_n \to \pi$. By Lemma 5, the limit $L := \lim_{n} \mathbb{E}_{\Lambda(\pi_n)}[f]$ exists, so it is furthermore sufficient to prove that $L = \mathbb{E}_{\Lambda(\pi)}[f]$.

To that end, let $\pi_n \to \pi$ be given. We will show that $L \geq \mathbb{E}_{\Lambda(\pi)}[f]$ and $L \leq \mathbb{E}_{\Lambda(\pi)}[f]$.

By Lemma 1 and Lemma 4, there exists $\mu \in \Lambda(\pi)$ such that $\mathbb{E}_{\mu}[f] = \mathbb{E}_{\Lambda(\pi)}[f]$. By the continuity of $\Lambda$ (Proposition 1), there exists a convergent sequence $(\mu_n)_n$ such that $\mu_n \to \mu$ and $\mu_n \in \Lambda(\pi_n)$ for all $n$. By assumption, $f$ is continuous, and thus Lemma 3 implies that $\mathbb{E}_{\mu_n}[f] \to \mathbb{E}_{\mu}[f]$. By construction, $\mathbb{E}_{\Lambda(\pi_n)}[f] \geq \mathbb{E}_{\mu_n}[f]$. Then

$$L = \lim_{n} \mathbb{E}_{\Lambda(\pi_n)}[f] \geq \lim_{n} \mathbb{E}_{\mu_n}[f] = \mathbb{E}_{\mu}[f] = \mathbb{E}_{\Lambda(\pi)}[f].$$

For the second inequality, choose a subsequence $(\pi_{n_k})_k$ along which the maximizing distributions constructed below converge.

Invoking Lemma 1 and Lemma 4 again, for each $n$, choose $\mu'_n \in \Lambda(\pi_n)$ such that

$$\mathbb{E}_{\mu'_n}[f] = \mathbb{E}_{\Lambda(\pi_n)}[f]. \tag{1}$$

Since $\Delta ((\mathcal{A} \times \mathcal{O})^{\omega})$ is compact, such a convergent subsequence $(\mu'_{n_k})_k$ exists; since $\Lambda$ is continuous and $\Lambda(\pi)$ is closed, its limit point $\mu^*$ lies in $\Lambda(\pi)$.

Then by (1), Lemma 3, and the definition of expectation with respect to a credal set,

$$L = \lim_{k} \mathbb{E}_{\Lambda(\pi_{n_k})}[f] = \lim_{k} \mathbb{E}_{\mu'_{n_k}}[f] = \mathbb{E}_{\mu^*}[f] \leq \mathbb{E}_{\Lambda(\pi)}[f].$$

Thus $L = \mathbb{E}_{\Lambda(\pi)}[f]$, which completes the proof. $\square$

Corollary 1 then follows from Proposition 2 using a standard continuity argument: $\Pi$ is compact (as a product of finite discrete spaces), so the continuous function $\pi \mapsto \mathbb{E}_{\Lambda(\pi)}[f]$ attains its maximum, and the set of maximizers is closed.

Corollary 1: For all crisp causal laws $\Lambda$ and for all continuous functions $f \in C((\mathcal{A} \times \mathcal{O})^{\omega}, \mathbb{R})$, the set $\arg\max_{\pi \in \Pi} \mathbb{E}_{\Lambda(\pi)}[f]$ is a closed, non-empty set.

Proof of Proposition 3

The following proposition corresponds to Proposition 12 in this proof section of the original infra-Bayesianism sequence. Equation (2) below appears there, and we explain it in more detail here. The remainder of the proof differs from the original version.

Proposition 3: For any non-dogmatic prior $\zeta$ over a learnable countable collection of crisp causal laws $H = \{ \Lambda_k \}_{k \in \mathbb{N}}$, if a family of policies $\{ \pi^{\gamma} \}_{\gamma \in (0,1)}$ is infra-Bayes optimal with respect to $\zeta$, then $\{ \pi^{\gamma} \}$ learns $H$.

Proof: Assume that $H$ is learnable. We will proceed by contrapositive and show that if a family of policies $\{ \pi^{\gamma} \}$ does not learn $H$, then for any non-dogmatic prior $\zeta$ over $H$, $\{ \pi^{\gamma} \}$ is not infra-Bayes optimal. By the definition of learnability, there exists a family of policies $\{ \pi_*^{\gamma} \}$ that learns $H$, i.e., such that for all $k$, the $\gamma$-discounted expected utility of $\pi_*^{\gamma}$ under $\Lambda_k$ approaches the optimal value as $\gamma \to 1$.

Equivalently, by the definition of infra-regret, for all $k$,

$$\lim_{\gamma \to 1} \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) = 0. \tag{1}$$

We preliminarily aim to show that

$$\lim_{\gamma \to 1} \mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \right] = 0. \tag{2}$$

Let $\epsilon > 0$ be given. Note that for all $\gamma$ and all $K$,

$$\mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \right] = \sum_{k \leq K} \zeta(k) \, \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) + \sum_{k > K} \zeta(k) \, \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}).$$

Since $H$ is countable and $\sum_k \zeta(k) = 1$, there exists $K$ such that $\sum_{k > K} \zeta(k) < \epsilon / 2$. Since $\mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \leq 1$ for all $k$ and $\gamma$, the second sum above is bounded by $\epsilon / 2$. We will now bound the first sum.

Let $k \leq K$. Applying equation (1) for each such $k$, we can obtain $\gamma_k$ such that $\mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) < \epsilon / 2$ for all $\gamma > \gamma_k$. Let $\gamma_* = \max_{k \leq K} \gamma_k$. Then $\mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) < \epsilon / 2$ holds simultaneously for all $k \leq K$ and $\gamma > \gamma_*$. Thus for $\gamma > \gamma_*$,

$$\mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \right] < \frac{\epsilon}{2} \sum_{k \leq K} \zeta(k) + \frac{\epsilon}{2} \leq \epsilon,$$

proving Equation (2).
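The head/tail split used to prove equation (2) can be sketched numerically. The prior and regret values below are toy numbers assumed purely for illustration (regrets lie in $[0, 1]$, and the prior is summable):

```python
# A summable prior over countably many laws, truncated to 20 terms for the sketch.
prior = [2.0 ** -(k + 1) for k in range(20)]
# Per-law regrets, each in [0, 1].
regret = [1.0 / (k + 2) for k in range(20)]

K = 10
head = sum(p * r for p, r in zip(prior[:K], regret[:K]))
tail = sum(p * r for p, r in zip(prior[K:], regret[K:]))
tail_mass = sum(prior[K:])

# The expected regret splits into head + tail, and since every regret is
# at most 1, the tail contributes no more than its prior mass.
total = sum(p * r for p, r in zip(prior, regret))
assert abs(total - (head + tail)) < 1e-12
assert tail <= tail_mass
```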

Since $\{ \pi^{\gamma} \}$ does not learn $H$, there exists $j$ such that $\limsup_{\gamma \to 1} \mathrm{Reg}_{\Lambda_j}(\pi^{\gamma}) > 0$. We will show that this implies

$$\limsup_{\gamma \to 1} \mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) \right] > 0, \tag{3}$$

which, when combined with (2), indicates that $\{ \pi^{\gamma} \}$ is not infra-Bayes optimal with respect to $\zeta$.

With steps explained in the following paragraph, we have that

$$\limsup_{\gamma \to 1} \mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) \right] = \limsup_{\gamma \to 1} \sum_{k} \zeta(k) \, \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) \geq \lim_{i \to \infty} \sum_{k} \zeta(k) \, \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma_i}) = \sum_{k} \zeta(k) \lim_{i \to \infty} \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma_i}) \geq \zeta(j) \limsup_{\gamma \to 1} \mathrm{Reg}_{\Lambda_j}(\pi^{\gamma}) > 0.$$

Here, $(\gamma_i)_i$ is a sequence tending to $1$ that realizes the limit superior for $k = j$ and along which $\mathrm{Reg}_{\Lambda_k}(\pi^{\gamma_i})$ converges for every $k$; such a sequence exists by a diagonal argument, since the regrets are bounded. The first equality follows from the definition of expectation with respect to a measure on a countable set. The exchange of limit and sum in the second equality is an application of the Lebesgue Dominated Convergence Theorem, using the constant one function as a dominating function. The inequalities follow from the fact that all terms in the sum are non-negative and that $\zeta(j) > 0$, which is true by the assumption that $\zeta$ is non-dogmatic.

We will now show that $\{ \pi^{\gamma} \}$ is not infra-Bayes optimal with respect to $\zeta$. By (2) and (3), there exists $\gamma$ such that

$$\mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) \right] > \mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \right].$$

Therefore,

$$\mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) \right] - \mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \right] > 0.$$

By linearity,

$$\mathbb{E}_{k \sim \zeta} \left[ \mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) - \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) \right] > 0.$$

By definition, $\mathrm{Reg}_{\Lambda_k}(\pi) = \max_{\pi'} \mathbb{E}_{\Lambda_k(\pi')}[u_{\gamma}] - \mathbb{E}_{\Lambda_k(\pi)}[u_{\gamma}]$, where $u_{\gamma}$ denotes the $\gamma$-discounted utility, so the optimal values cancel in the difference:

$$\mathrm{Reg}_{\Lambda_k}(\pi^{\gamma}) - \mathrm{Reg}_{\Lambda_k}(\pi_*^{\gamma}) = \mathbb{E}_{\Lambda_k(\pi_*^{\gamma})}[u_{\gamma}] - \mathbb{E}_{\Lambda_k(\pi^{\gamma})}[u_{\gamma}].$$

Simplifying and applying linearity of expectation, we obtain

$$\mathbb{E}_{k \sim \zeta} \left[ \mathbb{E}_{\Lambda_k(\pi_*^{\gamma})}[u_{\gamma}] \right] > \mathbb{E}_{k \sim \zeta} \left[ \mathbb{E}_{\Lambda_k(\pi^{\gamma})}[u_{\gamma}] \right].$$

Therefore, $\{ \pi^{\gamma} \}$ does not maximize expected utility with respect to the prior $\zeta$ at this $\gamma$, and so is not an infra-Bayes optimal family of policies. $\square$

Acknowledgements

Many thanks to Vanessa Kosoy and Marcus Ogren for their constructive feedback on the initial draft.

References

[1] Villani, Cédric, Optimal Transport: Old and New. Springer Berlin, Heidelberg, 2009.
[2] Carothers, N. L., “The Stone-Weierstrass Theorem.” In Real Analysis. Cambridge University Press, 2012.
