Proof Section to an Introduction to Credal Sets and Infra-Bayes Learnability
This post accompanies An Introduction to Credal Sets and Infra-Bayes Learnability.
Notation
We use ΔX to denote the space of probability distributions over a set X, which is assumed throughout to be a compact metric space. We use □X to denote the set of credal sets over X.
Given f:X→R and m∈ΔX, let m(f):=Em[f].
Let C(X,Y) denote the space of continuous functions from X to Y.
Proof of Lemma 1
Lemma 1: If A and O are finite, the set of countably infinite histories (A×O)ω is a compact metric space under the metric d(h,h′)=γt(h,h′) where γ∈(0,1) and t(h,h′) is the time of first difference between h and h′.
Proof. The space A×O is compact under the discrete topology since it is finite. Therefore, (A×O)ω is compact under the product topology P by Tychonoff's theorem. The stated metric induces a topology M. Since compactness passes from a topology to any coarser topology, it suffices to show that every basis element of M is open in P, i.e., that M⊆P.
The basis elements of M have the form
$$B_\delta(h) := \{\,h' \in (A\times O)^\omega : d(h,h') < \delta\,\}$$
for $h\in(A\times O)^\omega$ and $\delta\in(0,1]$.
Furthermore, sets of the form
$$U_{h,N} := \prod_{i=0}^{N} \{h_i\} \times \prod_{i=N+1}^{\infty} (A\times O)$$
are basis elements of P, where $h_i\in A\times O$ and $N\in\mathbb{N}$.
Given $\delta\in(0,1]$, define $N(\delta)=\max\{n\in\mathbb{N} : \gamma^n \ge \delta\}$. We will verify that $d(h,h')=\gamma^{t(h,h')}<\delta$ if and only if $t(h,h')\ge N(\delta)+1$. Note that if $t(h,h')\ge N(\delta)+1>N(\delta)$, then by construction $\gamma^{t(h,h')}<\delta$. For the other direction, if $t(h,h')\le N(\delta)$, then since $\gamma\in(0,1)$, $\gamma^{t(h,h')}\ge\gamma^{N(\delta)}\ge\delta$. By the contrapositive, if $\gamma^{t(h,h')}<\delta$, then $t(h,h')\ge N(\delta)+1$.
Since $t(h,h')\ge N(\delta)+1$ if and only if $h_i=h'_i$ for $0\le i\le N(\delta)$, this proves that $U_{h,N(\delta)}$ and $B_\delta(h)$ are equal as sets. Thus all basis elements of M are open in P, and (A×O)ω is compact in the metric topology. □
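The ball–cylinder correspondence above is easy to check numerically. The following Python sketch is purely illustrative: the encoding of histories as tuples of action–observation pairs and the helper names `first_difference` and `n_of_delta` are our own, not from the post. It verifies that $d(h,h')<\delta$ exactly when the histories agree through time $N(\delta)$.

```python
GAMMA = 0.5  # the discount factor gamma in (0, 1)

def first_difference(h1, h2):
    """Time t(h, h') of the first difference between two (truncated) histories."""
    for t, (a, b) in enumerate(zip(h1, h2)):
        if a != b:
            return t
    return len(h1)  # the truncations agree everywhere

def d(h1, h2):
    """The metric d(h, h') = gamma^t(h, h')."""
    return GAMMA ** first_difference(h1, h2)

def n_of_delta(delta):
    """N(delta) = max{n in N : gamma^n >= delta}."""
    return max(n for n in range(64) if GAMMA ** n >= delta)

# Two histories over A x O = {0, 1} x {0, 1} whose first difference is at time 3.
h = ((0, 0), (1, 0), (0, 1), (0, 0), (1, 1))
hp = ((0, 0), (1, 0), (0, 1), (1, 0), (0, 0))

delta = 0.2  # for gamma = 0.5, N(0.2) = 2 since 0.5^2 = 0.25 >= 0.2 > 0.125
assert n_of_delta(delta) == 2
# d(h, h') < delta exactly when the histories agree through time N(delta):
assert d(h, hp) < delta
assert first_difference(h, hp) >= n_of_delta(delta) + 1
```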
Proof of Proposition 1
The following lemma is used both in the proof of Proposition 1 and Proposition 2.
Lemma 2: The set of deterministic policies Π is first-countable under the product topology.
Proof: Let π∈Π. We aim to show that π has a countable neighborhood basis. Consider the collection B of open sets in Π of the form $\prod_{h_i\in(A\times O)^*}U_{h_i}$, where $U_{h_i}=\{\pi(h_i)\}$ for finitely many finite histories $h_i$ and otherwise $U_{h_i}=A$. Each element of B is determined by a finite subset of the countable set of finite histories, so B is countable.
Let N be a neighborhood of π. By the definition of a basis, there exists a basis element V for the product topology such that π∈V⊂N. By definition of the product topology, V can be written as $\prod_{h_i\in(A\times O)^*}V_{h_i}$, where $V_{h_i}\ne A$ for only finitely many $h_i$. Since π∈V, an element of B is contained in V and thus in N. Therefore, B is a countable neighborhood basis. □
Proposition 1: Every crisp causal law Λ:Π→□(A×O)ω is continuous with respect to the product topology on Π and the Hausdorff topology on □(A×O)ω.
Proof: By Lemma 2, Π is first-countable under the product topology. Consequently, Λ:Π→□(A×O)ω is continuous provided it is sequentially continuous, i.e., provided Λ(πn)→Λ(π) whenever πn→π. Let such a convergent sequence {πn}n∈N be given.
By definition, there exists a set of environments E that generates Λ. We will show convergence using the weak topology, so let g:(A×O)ω→[0,1] be a 1-Lipschitz continuous function.
Since πn→π, given any finite time horizon T<∞, there exists N such that for all n≥N, πn and π agree up to time T. Furthermore, the set of destinies (A×O)ω can be partitioned into a disjoint union $\coprod_{j=0}^{J(T)} D_j$, where each $D_j$ is a set of destinies that are equal up to time T and $J(T)<\infty$. Note that by construction, if $h,h'\in D_j$, then $d(h,h')\le\gamma^T$.
Then for every environment μ∈E,
$$\begin{aligned}
\left|\mathbb{E}_{\mu\pi_n}g-\mathbb{E}_{\mu\pi}g\right| &= \left|\sum_{j=0}^{J(T)}\int_{D_j}g\,d(\mu\pi_n)-\int_{D_j}g\,d(\mu\pi)\right| \\
&\le \sum_{j=0}^{J(T)}\left|\int_{D_j}g\,d(\mu\pi_n)-\int_{D_j}g\,d(\mu\pi)\right| \\
&\le \sum_{j=0}^{J(T)}\left|\int_{D_j}\sup_{h\in D_j}g(h)\,d(\mu\pi_n)-\int_{D_j}\inf_{h\in D_j}g(h)\,d(\mu\pi)\right| \\
&= \sum_{j=0}^{J(T)}\left|\sup_{h\in D_j}g(h)\,\mu\pi_n(D_j)-\inf_{h\in D_j}g(h)\,\mu\pi(D_j)\right|.
\end{aligned}$$
For all n≥N, $\mu\pi_n(D_j)=\mu\pi(D_j)$. By the 1-Lipschitz property of g and the construction of $D_j$,
$$\left|\sup_{h\in D_j}g(h)-\inf_{h\in D_j}g(h)\right| \le \sup_{h,h'\in D_j} d(h,h') \le \gamma^T.$$
Therefore,
$$\sum_{j=0}^{J(T)}\left|\sup_{h\in D_j}g(h)\,\mu\pi_n(D_j)-\inf_{h\in D_j}g(h)\,\mu\pi(D_j)\right| \le \sum_{j=0}^{J(T)}\gamma^T\mu\pi(D_j)=\gamma^T.$$
As T can be taken arbitrarily large and γ∈(0,1), $\left|\mathbb{E}_{\mu\pi_n}g-\mathbb{E}_{\mu\pi}g\right|$ tends to zero as n tends to infinity. The Kantorovich-Rubinstein metric induces the weak topology on Δ(A×O)ω, so for all μ∈E, μπn→μπ. Therefore, Λ(πn)→Λ(π) in the Hausdorff topology. □
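The prefix-partition estimate can be illustrated numerically. In the sketch below, the finite truncations of destinies, the particular measures, and the reference destiny `h0` are our own hypothetical choices: two measures that place the same mass on every time-T prefix cell produce expectations of a 1-Lipschitz function within $\gamma^T$ of each other.

```python
GAMMA = 0.5
T = 2  # the two measures below agree on every prefix cell up to time T

def d(h1, h2):
    """Ultrametric d(h, h') = gamma^(time of first difference); 0 if equal."""
    for t, (a, b) in enumerate(zip(h1, h2)):
        if a != b:
            return GAMMA ** t
    return 0.0

h0 = (0, 1, 0, 0, 0)          # a fixed reference destiny
g = lambda h: d(h, h0)        # 1-Lipschitz by the triangle inequality

# Two distributions over length-5 binary destinies with the SAME mass on each
# time-T prefix cell D_j (prefixes (0,1,0) and (1,0,1)), differing inside cells.
mu_pi_n = {(0, 1, 0, 0, 0): 0.5, (1, 0, 1, 1, 1): 0.5}
mu_pi   = {(0, 1, 0, 1, 1): 0.5, (1, 0, 1, 0, 0): 0.5}

E = lambda mu: sum(p * g(h) for h, p in mu.items())
gap = abs(E(mu_pi_n) - E(mu_pi))
assert gap <= GAMMA ** T  # the gamma^T bound from the proof
```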
Proof of Proposition 2 and Corollary 1
Corollary 1 below corresponds to Proposition 5 in this proof section of the original infra-Bayesianism sequence. The proof of that proposition was organized in three phases. Since we expand the ideas in more detail, we have several lemmas; “phase 1” is captured here in Lemma 5, and “phase 2” and “phase 3” are captured here in Proposition 2. The other lemmas cover prerequisite ideas that are used in the proof.
The following lemma is a special case of Theorem 6.9 of [1]. We provide a simplified proof under the assumption that X is a compact metric space using the fact that the set of all Lipschitz functions is dense in C(X,[0,1]), the space of continuous functions from X to [0,1] [2].
Lemma 3: Let X be a compact metric space, f:X→[0,1] be continuous, and m∈ΔX. Then m↦m(f) is continuous as a function from ΔX with the Kantorovich-Rubinstein (KR-) metric to [0,1].
Proof. Since ΔX is a metric space, ΔX is first-countable. Thus m↦m(f) is continuous if mn→m implies mn(f)→m(f). Let mn→m and ϵ>0. By definition, there exists N∈N such that for all n≥N,
$$\sup_{f_{\mathrm{lip}}}\left|m_n(f_{\mathrm{lip}})-m(f_{\mathrm{lip}})\right|<\epsilon,$$
where the supremum is taken over all 1-Lipschitz continuous functions $f_{\mathrm{lip}}:X\to[-1,1]$.
Suppose f is 1-Lipschitz. Then mn(f)→m(f) since
$$|m_n(f)-m(f)| \le \sup_{f_{\mathrm{lip}}}\left|m_n(f_{\mathrm{lip}})-m(f_{\mathrm{lip}})\right| < \epsilon.$$
Suppose f is K-Lipschitz. Then $\frac{1}{K}f$ is 1-Lipschitz. So, by the first case, there exists $N\in\mathbb{N}$ such that for all n≥N,
$$\left|m_n\!\left(\tfrac{1}{K}f\right)-m\!\left(\tfrac{1}{K}f\right)\right| < \tfrac{1}{K}\epsilon.$$
By linearity of expectation, for all n≥N,
$$|m_n(f)-m(f)| = K\left|m_n\!\left(\tfrac{1}{K}f\right)-m\!\left(\tfrac{1}{K}f\right)\right| < \epsilon.$$
Suppose f is continuous. By the density of the set of all Lipschitz functions in C(X,[0,1]) [2], there exists a Lipschitz function g:X→[0,1] such that $\sup_{x\in X}|f(x)-g(x)|<\frac{\epsilon}{3}$. By the second case, there exists $N\in\mathbb{N}$ such that for all n≥N, $|m_n(g)-m(g)|<\frac{\epsilon}{3}$. Applying the triangle inequality, we obtain that for all n≥N,
$$\begin{aligned}
|m_n(f)-m(f)| &= |m_n(f)-m_n(g)+m_n(g)-m(g)+m(g)-m(f)| \\
&\le |m_n(f)-m_n(g)|+|m_n(g)-m(g)|+|m(g)-m(f)| \\
&\le m_n(X)\sup_{x\in X}|f(x)-g(x)|+\tfrac{\epsilon}{3}+m(X)\sup_{x\in X}|f(x)-g(x)| \\
&< \tfrac{\epsilon}{3}+\tfrac{\epsilon}{3}+\tfrac{\epsilon}{3} = \epsilon.
\end{aligned}$$
Here we have used the fact that m,mn∈ΔX and thus m(X)=mn(X)=1. This shows that mn(f)→m(f), which completes the proof by the remark at the beginning. □
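A minimal numerical illustration of the lemma, with our own hypothetical choices of mixtures and test function: the measures $m_n=(1-\tfrac{1}{n})\delta_{1/n}+\tfrac{1}{n}\delta_1$ on $X=[0,1]$ converge to $\delta_0$ in the KR metric, and the expectations $m_n(f)$ converge to $f(0)$ for a continuous f.

```python
import math

# Mixtures m_n = (1 - 1/n) * delta_{1/n} + (1/n) * delta_1 on X = [0, 1];
# m_n converges to the point mass delta_0 in the KR metric.
f = lambda x: math.sin(math.pi * x) + x ** 2  # an arbitrary continuous f

def m_n_of_f(n):
    """The expectation m_n(f)."""
    return (1 - 1 / n) * f(1 / n) + (1 / n) * f(1.0)

limit = f(0.0)  # delta_0(f) = f(0)
errs = [abs(m_n_of_f(n) - limit) for n in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]  # m_n(f) -> f(0)
assert errs[2] < 1e-2
```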
Lemma 4: Let θ∈□X be a credal set over a compact metric space X. Then for all continuous f:X→[0,1] there exists m∗∈θ such that m∗(f)=maxm∈θ{m(f)}.
Proof. By Lemma 3, the map m↦m(f) is continuous. By definition, θ is compact. A continuous function over a compact set achieves a maximum over that set, which proves the result. □
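Since m↦m(f) is linear in m, the maximum in Lemma 4 is attained at an extreme point of the credal set. A quick sketch with a hypothetical two-vertex credal set over a three-point space (all numbers below are illustrative stand-ins):

```python
# A credal set given as the convex hull of two distributions p and q over a
# three-point space X = {0, 1, 2}. The map m -> m(f) is linear in m, so its
# maximum over the hull is attained at a vertex (an extreme point).
p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]
f = [1.0, 0.0, 2.0]

def m_of_f(lam):
    """Expectation of f under the mixture m = lam * p + (1 - lam) * q."""
    return sum((lam * pi + (1 - lam) * qi) * fi for pi, qi, fi in zip(p, q, f))

vals = [m_of_f(k / 100) for k in range(101)]  # sweep the segment between p and q
assert max(vals) == max(m_of_f(0.0), m_of_f(1.0))  # the maximizer is a vertex
```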
The next lemma corresponds to “phase one” of the proof of Proposition 5 in this proof section of the original infra-Bayesianism sequence.
Lemma 5: Let Λ:Π→□(A×O)ω be a crisp causal law, and let g:Π→C((A×O)ω,[0,1]) be continuous. Suppose a sequence of policies {πn}n∈N converges to π in the KR-metric on Π. Then limn→∞|EΛ(πn)[g(πn)]−EΛ(πn)[g(π)]|=0.
Proof. Let a convergent sequence πn→π and ϵ>0 be given. We aim to show that there exists N such that for all n≥N, |EΛ(πn)[g(πn)]−EΛ(πn)[g(π)]|<ϵ.
By Lemma 1 and Lemma 4, for all $n\in\mathbb{N}$ there exists $m_n\in\Lambda(\pi_n)$ such that $\mathbb{E}_{\Lambda(\pi_n)}[g(\pi_n)]=m_n(g(\pi_n))$. Similarly, there exists $m\in\Lambda(\pi_n)$ (which also depends on n, though we suppress this in the notation) such that $\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)]=m(g(\pi))$. Then
$$\text{(1)}\quad \left|\mathbb{E}_{\Lambda(\pi_n)}[g(\pi_n)]-\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)]\right| = |m_n(g(\pi_n))-m(g(\pi))|.$$
By definition, for all $n\in\mathbb{N}$, $m_n(g(\pi_n))\ge m(g(\pi_n))$ and $m(g(\pi))\ge m_n(g(\pi))$. This implies
$$\text{(2)}\quad m_n(g(\pi_n))-m(g(\pi_n))\ge 0,\qquad\text{and}\qquad \text{(3)}\quad m(g(\pi))-m_n(g(\pi))\ge 0.$$
By (2),
$$m(g(\pi))-m_n(g(\pi_n)) \le m(g(\pi))-m_n(g(\pi_n))+m_n(g(\pi_n))-m(g(\pi_n)) = m(g(\pi))-m(g(\pi_n)).$$
By applying (3) similarly, we obtain
$$m_n(g(\pi_n))-m(g(\pi)) \le m_n(g(\pi_n))-m_n(g(\pi)).$$
These inequalities imply that
$$\text{(4)}\quad |m_n(g(\pi_n))-m(g(\pi))| \le \max\{\,m(g(\pi))-m(g(\pi_n)),\; m_n(g(\pi_n))-m_n(g(\pi))\,\}.$$
We will now consider how to bound each of the expressions in the set on the right hand side.
Since Π is compact (by Lemma 2 of An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism) and g is continuous, g is uniformly continuous by the Heine-Cantor theorem. Recall that the metric on C((A×O)ω,[0,1]) is the Chebyshev metric defined by d(f1,f2)=suph∈(A×O)ω|f1(h)−f2(h)|. By the definition of uniform continuity, there exists δ>0 such that dKR(πn,π)<δ implies that suph∈(A×O)ω|g(πn)(h)−g(π)(h)|<ϵ.
Since πn→π, there exists N∈N such that for all n≥N, dKR(πn,π)<δ. Then (with steps explained below),
$$|m_n(g(\pi_n))-m_n(g(\pi))| \le m_n(|g(\pi_n)-g(\pi)|) = \int_{(A\times O)^\omega}|g(\pi_n)-g(\pi)|\,dm_n < \int_{(A\times O)^\omega}\epsilon\,dm_n = \epsilon.$$
Note that $|m_n(g(\pi_n))-m_n(g(\pi))|\le m_n(|g(\pi_n)-g(\pi)|)$ due to the linearity of expectation and the triangle inequality for integrals, which states that for any measure m and integrable function f, $\left|\int f\,dm\right|\le\int|f|\,dm$. By definition, $m_n(|g(\pi_n)-g(\pi)|)=\int_{(A\times O)^\omega}|g(\pi_n)-g(\pi)|\,dm_n$. By monotonicity of the Lebesgue integral,
$$\int_{(A\times O)^\omega}|g(\pi_n)-g(\pi)|\,dm_n \le \int_{(A\times O)^\omega}\sup_{h\in(A\times O)^\omega}|g(\pi_n)(h)-g(\pi)(h)|\,dm_n.$$
By the fact that $m_n$ is a probability measure and thus $m_n((A\times O)^\omega)=1$,
$$\int_{(A\times O)^\omega}\sup_{h\in(A\times O)^\omega}|g(\pi_n)(h)-g(\pi)(h)|\,dm_n < \epsilon$$
for all n≥N.
By a similar argument, for all n≥N,
$$|m(g(\pi_n))-m(g(\pi))| < \epsilon.$$
Returning to equations (1) and (4), we obtain that for all n≥N,
$$\left|\mathbb{E}_{\Lambda(\pi_n)}[g(\pi_n)]-\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)]\right| < \epsilon,$$
which completes the proof. □
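The estimate at the heart of the proof, namely that $|m_n(f_1)-m_n(f_2)|\le\sup|f_1-f_2|$ for any probability measure $m_n$, can be spot-checked numerically. Everything below (finite support, random measures) is an illustrative stand-in for the abstract objects in the lemma:

```python
import random

random.seed(0)  # deterministic demo
xs = range(10)
f1 = [random.random() for _ in xs]
f2 = [v + random.uniform(-0.05, 0.05) for v in f1]  # sup|f1 - f2| <= 0.05
sup = max(abs(a - b) for a, b in zip(f1, f2))

worst_gap = 0.0
for _ in range(100):  # 100 random probability measures m_n on the 10 points
    w = [random.random() for _ in xs]
    s = sum(w)
    m = [wi / s for wi in w]
    gap = abs(sum(mi * (a - b) for mi, a, b in zip(m, f1, f2)))
    worst_gap = max(worst_gap, gap)

assert worst_gap <= sup <= 0.05  # the expectation gap never exceeds the sup norm
```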
The next proposition is a special case of “phase 2” and “phase 3” of the proof of Proposition 5 in this proof section of the original infra-Bayesianism sequence. The ideas here are the same, although we provide a direct proof.
Proposition 2: Let Λ:Π→□(A×O)ω be a crisp causal law, and let g:Π→C((A×O)ω,[0,1]) be continuous. Then the function π↦EΛ(π)[g(π)] is continuous.
Proof: By Lemma 2, Π is first-countable, and thus it is sufficient to show that for any convergent sequence πn→π, limn→∞EΛ(πn)[g(πn)]=EΛ(π)[g(π)]. By Lemma 5, it is furthermore sufficient to prove that limn→∞EΛ(πn)[g(π)]=EΛ(π)[g(π)].
To that end, let πn→π. We will show that liminfn→∞EΛ(πn)[g(π)]≥EΛ(π)[g(π)] and limsupn→∞EΛ(πn)[g(π)]≤EΛ(π)[g(π)].
By Lemma 1 and Lemma 4, there exists m∈Λ(π) such that EΛ(π)[g(π)]=m(g(π)). By the continuity of Λ, there exists a convergent sequence {mn}n∈N such that mn→m and mn∈Λ(πn) for all n∈N. By assumption, g is continuous, and thus Lemma 3 implies that mn(g(π))→m(g(π)). By construction, EΛ(πn)[g(π)]≥mn(g(π)). Then
$$\liminf_{n\to\infty}\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)] \ge \liminf_{n\to\infty} m_n(g(\pi)) = m(g(\pi)) = \mathbb{E}_{\Lambda(\pi)}[g(\pi)].$$
For the second argument, choose a subsequence $\{n_k\}_{k\in\mathbb{N}}$ such that
$$\text{(1)}\quad \lim_{k\to\infty}\mathbb{E}_{\Lambda(\pi_{n_k})}[g(\pi)] = \limsup_{n\to\infty}\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)].$$
Invoking Lemma 1 and Lemma 4 again, for each $n_k$, choose $m_{n_k}\in\Lambda(\pi_{n_k})$ such that $\mathbb{E}_{\Lambda(\pi_{n_k})}[g(\pi)]=m_{n_k}(g(\pi))$. Since Λ is continuous and Λ(π) is closed, there exists a convergent subsequence $\{m_{n_{k_j}}\}_{j\in\mathbb{N}}$ with a limit point m∈Λ(π).
Then by (1), Lemma 3, and the definition of expectation with respect to a credal set,
$$\limsup_{n\to\infty}\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)] = \lim_{j\to\infty} m_{n_{k_j}}(g(\pi)) = m(g(\pi)) \le \mathbb{E}_{\Lambda(\pi)}[g(\pi)].$$
Thus $\lim_{n\to\infty}\mathbb{E}_{\Lambda(\pi_n)}[g(\pi)]=\mathbb{E}_{\Lambda(\pi)}[g(\pi)]$, which completes the proof. □
Corollary 1 then follows from Proposition 2 by a standard continuity argument: by Proposition 2 each map π↦EΛ(π)[g(π)] is continuous, hence so is the ζ-expectation; a continuous function on the compact set Π attains its minimum, and the argmin is closed as the preimage of the minimum value.
Corollary 1: For all crisp causal laws Λ and for all continuous functions g:Π→C((A×O)ω,[0,1]), argminπ∈ΠEΛ∼ζ[EΛ(π)[g(π)]] is a closed, non-empty set.
Proof of Proposition 3
The following proposition corresponds to Proposition 12 in this proof section of the infra-Bayesian sequence. Equation (2) below appears there, and we explain it in more detail here. The remainder of the proof differs from the original version.
Proposition 3: For any non-dogmatic prior ζ over a learnable collection of crisp causal laws {Λi}∞i=0, if a family π∗γ of policies is infra-Bayes optimal with respect to ζ, then π∗γ learns {Λi}∞i=0.
Proof: Assume that {Λi}∞i=0 is learnable. We will proceed by contrapositive and show that if a family of policies π∗γ does not learn {Λi}∞i=0, then for any non-dogmatic prior ζ over {Λi}∞i=0, π∗γ is not infra-Bayes optimal. By definition, there exists a family of policies πγ such that for all i≥0,
$$\text{(1)}\quad \lim_{\gamma\to1}\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)=0.$$
Equivalently, by the definition of regret, for all i≥0,
$$\text{(1)}\quad \lim_{\gamma\to1}\left(\mathbb{E}_{\Lambda_i(\pi_\gamma)}[L_\gamma]-\min_{\pi\in\Pi}\mathbb{E}_{\Lambda_i(\pi)}[L_\gamma]\right)=0.$$
We preliminarily aim to show that
$$\text{(2)}\quad \lim_{\gamma\to1}\mathrm{IBReg}(\pi_\gamma,\zeta,L_\gamma)=\lim_{\gamma\to1}\mathbb{E}_{\Lambda_i\sim\zeta}\,\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)=0.$$
Let ϵ>0 be given. Note that for all $N\in\mathbb{N}$,
$$\mathbb{E}_{\Lambda_i\sim\zeta}\,\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)=\sum_{i=0}^{N}\zeta(\Lambda_i)\,\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)+\sum_{i=N+1}^{\infty}\zeta(\Lambda_i)\,\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma).$$
Since $\{\Lambda_i\}_{i=0}^\infty$ is countable and $\sum_{i\ge0}\zeta(\Lambda_i)=1<\infty$, there exists N<∞ such that $\sum_{i\ge N+1}\zeta(\Lambda_i)<\epsilon/2$. Since $\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)\le1$ for all i, the second sum above is bounded by ϵ/2. We will now bound the first sum.
Let $\hat\epsilon:=\frac{\epsilon}{2N}$. Applying equation (1) for each 0≤i≤N, we can obtain $\gamma_i\in(0,1)$ such that for all $\gamma\in[\gamma_i,1)$, $\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)<\hat\epsilon$. Let $\gamma^*=\max_{0\le i\le N}\gamma_i$. Then $\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)<\hat\epsilon$ holds simultaneously for all 0≤i≤N and γ≥γ∗. Thus for γ≥γ∗, $\mathbb{E}_{\Lambda_i\sim\zeta}\,\mathrm{Reg}(\pi_\gamma,\Lambda_i,L_\gamma)<\epsilon$, proving Equation (2).
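The head/tail splitting above can be mirrored numerically. In this sketch we take a hypothetical geometric prior $\zeta(\Lambda_i)=2^{-(i+1)}$ and regrets bounded by 1; both are stand-ins for the abstract objects in the proof:

```python
eps = 0.1
zeta = lambda i: 2.0 ** -(i + 1)  # hypothetical non-dogmatic geometric prior

# Choose N so that the tail mass sum_{i > N} zeta(Lambda_i) = 2^{-(N+1)} < eps/2.
N = 0
while 2.0 ** -(N + 1) >= eps / 2:
    N += 1
assert 2.0 ** -(N + 1) < eps / 2  # tail contribution < eps/2 even if Reg = 1

# If each head regret is below eps_hat = eps/(2N), the total expectation < eps.
eps_hat = eps / (2 * N)
regs = [eps_hat / 2] * (N + 1)  # hypothetical regrets Reg(pi, Lambda_i) < eps_hat
head = sum(zeta(i) * r for i, r in enumerate(regs))
tail = sum(zeta(i) * 1.0 for i in range(N + 1, 200))  # worst case: Reg = 1
assert head + tail < eps
```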
Since π∗γ does not learn $\{\Lambda_i\}_{i=0}^\infty$, there exists $j\in\mathbb{N}$ such that $\lim_{\gamma\to1}\mathrm{Reg}(\pi^*_\gamma,\Lambda_j,L_\gamma)\ne0$. We will show that this implies $\lim_{\gamma\to1}\mathrm{IBReg}(\pi^*_\gamma,\zeta,L_\gamma)>0$, which when combined with (2) indicates that π∗γ is not infra-Bayes optimal with respect to ζ.
With steps explained in the following paragraph, we have that
$$\begin{aligned}
\lim_{\gamma\to1}\mathrm{IBReg}(\pi^*_\gamma,\zeta,L_\gamma) &= \lim_{\gamma\to1}\mathbb{E}_{\Lambda_i\sim\zeta}\,\mathrm{Reg}(\pi^*_\gamma,\Lambda_i,L_\gamma) \\
&= \lim_{\gamma\to1}\sum_{i=0}^{\infty}\zeta(\Lambda_i)\,\mathrm{Reg}(\pi^*_\gamma,\Lambda_i,L_\gamma) \\
&= \sum_{i=0}^{\infty}\zeta(\Lambda_i)\lim_{\gamma\to1}\mathrm{Reg}(\pi^*_\gamma,\Lambda_i,L_\gamma) \\
&\ge \zeta(\Lambda_j)\lim_{\gamma\to1}\mathrm{Reg}(\pi^*_\gamma,\Lambda_j,L_\gamma) > 0.
\end{aligned}$$
The first equality follows from the definition of infra-regret, and the second from the definition of expectation with respect to a measure on a countable set. The third equality is an application of the Lebesgue dominated convergence theorem, using the constant function 1 as a dominating function. The inequalities follow from the facts that all terms in the sum are non-negative, that $\lim_{\gamma\to1}\mathrm{Reg}(\pi^*_\gamma,\Lambda_j,L_\gamma)>0$, and that ζ(Λj)≠0, which holds because ζ is non-dogmatic.
We will now show that π∗γ is not infra-Bayes optimal with respect to ζ. There exists γ0∈[0,1) such that
$$\mathrm{IBReg}(\pi^*_{\gamma_0},\zeta,L_{\gamma_0}) > \mathrm{IBReg}(\pi_{\gamma_0},\zeta,L_{\gamma_0}).$$
Therefore,
$$\mathbb{E}_{\Lambda\sim\zeta}\,\mathrm{Reg}(\pi^*_{\gamma_0},\Lambda,L_{\gamma_0})-\mathbb{E}_{\Lambda\sim\zeta}\,\mathrm{Reg}(\pi_{\gamma_0},\Lambda,L_{\gamma_0})>0.$$
By linearity,
$$\mathbb{E}_{\Lambda\sim\zeta}\!\left[\mathrm{Reg}(\pi^*_{\gamma_0},\Lambda,L_{\gamma_0})-\mathrm{Reg}(\pi_{\gamma_0},\Lambda,L_{\gamma_0})\right]>0.$$
By definition,
$$\mathbb{E}_{\Lambda\sim\zeta}\!\left[\mathbb{E}_{\Lambda(\pi^*_{\gamma_0})}[L_{\gamma_0}]-\min_{\pi\in\Pi}\mathbb{E}_{\Lambda(\pi)}[L_{\gamma_0}]-\left(\mathbb{E}_{\Lambda(\pi_{\gamma_0})}[L_{\gamma_0}]-\min_{\pi\in\Pi}\mathbb{E}_{\Lambda(\pi)}[L_{\gamma_0}]\right)\right]>0.$$
Simplifying and applying linearity of expectation, we obtain
$$\mathbb{E}_{\Lambda\sim\zeta}\!\left[\mathbb{E}_{\Lambda(\pi^*_{\gamma_0})}[L_{\gamma_0}]-\mathbb{E}_{\Lambda(\pi_{\gamma_0})}[L_{\gamma_0}]\right]=\mathbb{E}_{\Lambda\sim\zeta}\,\mathbb{E}_{\Lambda(\pi^*_{\gamma_0})}[L_{\gamma_0}]-\mathbb{E}_{\Lambda\sim\zeta}\,\mathbb{E}_{\Lambda(\pi_{\gamma_0})}[L_{\gamma_0}]>0.$$
Therefore, π∗γ is not an infra-Bayes optimal family of policies. □
Acknowledgements
Many thanks to Vanessa Kosoy and Marcus Ogren for their constructive feedback on the initial draft.
References
[1] Villani, Cédric, Optimal Transport: Old and New. Springer Berlin, Heidelberg, 2009.
[2] Carothers, N. L., “The Stone-Weierstrass Theorem.” In Real Analysis. Cambridge University Press, 2012.