Proof Section to an Introduction to Reinforcement Learning for Understanding Infra-Bayesianism
Introduction
This post accompanies An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism. The goal of this introduction is to provide a high-level overview of the proofs contained in this post.
The proof of Proposition 1 is achieved through three lemmas. I believe the most insightful part of the proof is the use of the "epsilon over three" trick that lets us break the proof down into these lemmas. The first lemma uses the concept of the product topology on the space of policies, and I found this lemma very useful for better understanding what convergence in the product topology on the space of policies actually means. The text Topology by James Munkres is a standard reference for topology.
The second lemma uses the concept of a Lebesgue integral and is simply an application of the dominated convergence theorem from measure theory. One standard reference for these ideas is Real Analysis: Modern Techniques and Their Applications by Gerald Folland.
The proof of Proposition 2 is less technical than the other proofs and should be readable to those with a basic real analysis and probability background. For basic real analysis, Principles of Mathematical Analysis by Walter Rudin is a standard reference.
Proposition 1
For brevity throughout this section, we use $\mu^\pi(L_\gamma)$ to denote $\mathbb{E}_{h\sim\mu^\pi}[L_\gamma(h)]$.
Proposition 1 Lemma 1: Suppose that $\pi_n\to\pi$ in the product topology on $\Pi$. Then for any finite time-horizon $N\in\mathbb{N}$, $\mu^{\pi_n}(L_{\gamma|N})\to\mu^{\pi}(L_{\gamma|N})$.
Proof: Suppose that $\pi_n\to\pi$ in the product topology on $\Pi$. Let $\epsilon>0$ be given and fix $N\in\mathbb{N}$. To satisfy the definition of the limit, we must show that there exists $M\in\mathbb{N}$ such that for all $n\ge M$, $|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi}(L_{\gamma|N})|<\epsilon$.
Let $N(A,O)$ denote the number of histories of length $N$. Since $A$ and $O$ are finite sets and $N<\infty$, $N(A,O)<\infty$. Let the set of histories of length $N$ then be indexed by $\{1,2,\dots,N(A,O)\}$.
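Concretely, if a history of length $N$ consists of the $N+1$ action–observation pairs $(a_0,o_0),\dots,(a_N,o_N)$ (the indexing used by the product rule below), the count is explicit. A minimal sketch under that assumption:

```latex
% Number of histories of length N, assuming a history is the
% (N+1)-tuple of action-observation pairs (a_0,o_0),...,(a_N,o_N):
N(A,O) \;=\; \bigl(|A|\cdot|O|\bigr)^{N+1}
```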
Then we have:
$$\begin{aligned}
\left|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi}(L_{\gamma|N})\right| &= \left|\mathbb{E}_{h\sim\mu^{\pi_n}}[L_{\gamma|N}(h)]-\mathbb{E}_{h\sim\mu^{\pi}}[L_{\gamma|N}(h)]\right| \\
&= \left|\sum_{i=1}^{N(A,O)}\mu^{\pi_n}(h_i)L_{\gamma|N}(h_i)-\mu^{\pi}(h_i)L_{\gamma|N}(h_i)\right| \\
&= \left|\sum_{i=1}^{N(A,O)}L_{\gamma|N}(h_i)\left(\mu^{\pi_n}(h_i)-\mu^{\pi}(h_i)\right)\right| \\
&\le \sum_{i=1}^{N(A,O)}\left|\mu^{\pi_n}(h_i)-\mu^{\pi}(h_i)\right|,
\end{aligned}$$

where the last inequality uses $L_{\gamma|N}(h_i)\le 1$. At this point, we want to bound the magnitude of terms of the form $\mu^{\pi_n}(h_i)-\mu^{\pi}(h_i)$ for $1\le i\le N(A,O)$. The key idea is to show that these terms can be factored into the product of two terms: one term that is finite and one term that becomes arbitrarily small for sufficiently large $n$.
Let $(h_i)_{\le t}$ denote the sequence of the first $t$ elements of $h_i$. By the product rule,
$$\mu^{\pi}(h_i):=\prod_{t=0}^{N}\pi\left(a_t\mid (h_i)_{\le t-1}\right)\cdot\mu\left(o_t\mid (h_i)_{\le t}\right).$$

Since $\mu^{\pi_n}$ and $\mu^{\pi}$ are induced by the same environment, we can observe a common factor of $\mu^{\pi_n}(h_i)$ and $\mu^{\pi}(h_i)$. Namely, we have:
$$\mu^{\pi_n}(h_i)-\mu^{\pi}(h_i)=\prod_{t=0}^{N}\mu\left(o_t\mid (h_i)_{\le t}\right)\cdot\left[\prod_{t=0}^{N}\pi_n\left(a_t\mid (h_i)_{\le t-1}\right)-\prod_{t=0}^{N}\pi\left(a_t\mid (h_i)_{\le t-1}\right)\right].$$

Let $c=\max_{1\le i\le N(A,O)}\left\{\prod_{t=0}^{N}\mu\left(o_t\mid (h_i)_{\le t}\right)\right\}$, which is finite. Consider now the second factor of the product, which we will show becomes arbitrarily small for sufficiently large $n$ since $\pi_n\to\pi$ in the product topology.
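As a quick side illustration (not part of the proof), here is a minimal numerical sketch of this factorization on a single toy history; the arrays `env_prob`, `pi_prob`, and `pin_prob` are hypothetical probabilities, not taken from the main text.

```python
import numpy as np

# Toy history of length N=2: all probabilities below are made up.
# Along the fixed history h, pi_prob[t] plays the role of
# pi(a_t | (h)_{<=t-1}) and env_prob[t] of mu(o_t | (h)_{<=t});
# the environment factor is SHARED by both policies.
env_prob = np.array([0.5, 0.8, 0.9])                 # environment terms
pi_prob  = np.array([0.6, 0.3, 0.7])                 # policy pi terms
pin_prob = pi_prob + np.array([0.01, -0.02, 0.015])  # a nearby policy pi_n

mu_pi  = np.prod(pi_prob)  * np.prod(env_prob)   # mu^pi(h)
mu_pin = np.prod(pin_prob) * np.prod(env_prob)   # mu^{pi_n}(h)

# The difference factors: common environment term times policy-product gap.
factored = np.prod(env_prob) * (np.prod(pin_prob) - np.prod(pi_prob))
assert np.isclose(mu_pin - mu_pi, factored)
```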
First, we establish some notation. Let $J=|\{h : h\in(A\times O)^*,\ |h|\le N\}|$, the cardinality of the set of histories of length at most $N$, which is a finite set since $N<\infty$ and $A$ and $O$ are finite sets. We can enumerate the elements of this set of histories by $h_j$ with the index $1\le j\le J$.
Given $\hat{\epsilon}>0$, let $B_{\hat{\epsilon}}(\pi(h_j))$ denote the ball of radius $\hat{\epsilon}$ centered at $\pi(h_j)\in\Delta A$ under the total variation distance, which we denote by $d_{TV}$. Importantly, $B_{\hat{\epsilon}}(\pi(h_j))$ is an open set of probability distributions over actions. Let $U:=\prod_{j=1}^{J}B_{\hat{\epsilon}}(\pi(h_j))\times\prod_{j=J+1}^{\infty}\Delta A$, which is open in the product topology since it is a product of open sets in $\Delta A$, with only finitely many components not equal to $\Delta A$.
Since $\pi_n\to\pi$ in the product topology, by definition: for all $\hat{\epsilon}>0$, there exists $M$ such that for all $n\ge M$, $\pi_n\in U$. By construction, if $\pi_n\in U$, then for $1\le j\le J$, $d_{TV}(\pi(h_j),\pi_n(h_j))<\hat{\epsilon}$. By definition, then for $1\le j\le J$, $\sup_{a\in A}|\pi(a\mid h_j)-\pi_n(a\mid h_j)|<\hat{\epsilon}$.
Consider the term we’re trying to bound,
$$\left|\prod_{t=0}^{N}\pi_n\left(a_t\mid (h_i)_{\le t-1}\right)-\prod_{t=0}^{N}\pi\left(a_t\mid (h_i)_{\le t-1}\right)\right|.$$

We want to rewrite the terms $\pi_n(a_t\mid (h_i)_{\le t-1})$ using the approximation provided above. We have that
$$\pi_n(a_t\mid (h_i)_{\le t-1})=\left|\pi_n(a_t\mid (h_i)_{\le t-1})\right|=\left|\pi_n(a_t\mid (h_i)_{\le t-1})-\pi(a_t\mid (h_i)_{\le t-1})+\pi(a_t\mid (h_i)_{\le t-1})\right|.$$

Then by the triangle inequality,
$$\left|\pi_n(a_t\mid (h_i)_{\le t-1})-\pi(a_t\mid (h_i)_{\le t-1})+\pi(a_t\mid (h_i)_{\le t-1})\right|\le\left|\pi_n(a_t\mid (h_i)_{\le t-1})-\pi(a_t\mid (h_i)_{\le t-1})\right|+\left|\pi(a_t\mid (h_i)_{\le t-1})\right|\le\hat{\epsilon}+\pi(a_t\mid (h_i)_{\le t-1}).$$

Thus $\pi_n(a_t\mid (h_i)_{\le t-1})\le\hat{\epsilon}+\pi(a_t\mid (h_i)_{\le t-1})$, so we now have:
$$\left|\prod_{t=0}^{N}\pi_n\left(a_t\mid (h_i)_{\le t-1}\right)-\prod_{t=0}^{N}\pi\left(a_t\mid (h_i)_{\le t-1}\right)\right|\le\left|\prod_{t=0}^{N}\left(\hat{\epsilon}+\pi\left(a_t\mid (h_i)_{\le t-1}\right)\right)-\prod_{t=0}^{N}\pi\left(a_t\mid (h_i)_{\le t-1}\right)\right|.$$

Notice that when expanded, the only term of $\prod_{t=0}^{N}\left(\hat{\epsilon}+\pi(a_t\mid (h_i)_{\le t-1})\right)$ that isn't the product of $\hat{\epsilon}$ and some finite term is $\prod_{t=0}^{N}\pi(a_t\mid (h_i)_{\le t-1})$, which cancels out with the identical term in the original expression. (The symmetric bound $\pi_n(a_t\mid (h_i)_{\le t-1})\ge\pi(a_t\mid (h_i)_{\le t-1})-\hat{\epsilon}$ controls the difference from below in the same way.) Since $\hat{\epsilon}$ can be chosen arbitrarily small by choosing the index $n$ of $\pi_n$ to be large, this means: given any $\tilde{\epsilon}>0$, there exists $M_i$ such that for all $n\ge M_i$,
$$\left|\prod_{t=0}^{N}\pi_n\left(a_t\mid (h_i)_{\le t-1}\right)-\prod_{t=0}^{N}\pi\left(a_t\mid (h_i)_{\le t-1}\right)\right|\le\tilde{\epsilon}.$$

Then if $n\ge\max\{M_i : 1\le i\le N(A,O)\}$, for all $1\le i\le N(A,O)$, $|\mu^{\pi_n}(h_i)-\mu^{\pi}(h_i)|\le c\cdot\tilde{\epsilon}$. Then
$$\left|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi}(L_{\gamma|N})\right|\le N(A,O)\cdot c\cdot\tilde{\epsilon}.$$

Choose $\tilde{\epsilon}<\frac{\epsilon}{N(A,O)\cdot c}$. Then for $n\ge\max\{M_i : 1\le i\le N(A,O)\}$, $|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi}(L_{\gamma|N})|<\epsilon$. $\square$
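For readers who want the expansion step made quantitative, here is one explicit bound (an addition of mine, not part of the original argument), using only that the conditional probabilities lie in $[0,1]$:

```latex
% If 0 <= p_t <= 1 and |q_t - p_t| <= eps_hat for t = 0,...,N, then a
% telescoping argument gives
\left|\prod_{t=0}^{N} q_t \;-\; \prod_{t=0}^{N} p_t\right|
  \;\le\; \sum_{t=0}^{N} \hat{\epsilon}\,(1+\hat{\epsilon})^{t}
  \;=\; (1+\hat{\epsilon})^{N+1} - 1,
% which tends to 0 as eps_hat -> 0 for fixed N. Taking p_t and q_t to be
% the conditional action probabilities of pi and pi_n yields M_i.
```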
Proposition 1 Lemma 2: Define $L_{\gamma|N}$ for $N\in\mathbb{N}$ as $L_{\gamma|N}\left((a_t,o_t)_{t=0}^{N}\right)=\sum_{t=0}^{N}\gamma^t L(o_t)$. Let $m\in\mathcal{M}\left((A\times O)^\infty\right)$ be a probability measure on destinies. Then $m(L_{\gamma|N})\to m(L_\gamma)$ as $N\to\infty$.
Proof: This lemma follows directly from the dominated convergence theorem, so our proof will be a check that all the conditions of that theorem are satisfied. First note that $L_{\gamma|N}$ converges to $L_\gamma$ pointwise. Also, $\int_{(A\times O)^\infty}|L_\gamma(h)|\,dm<\infty$ since $m$ is a probability measure (and thus $m\left((A\times O)^\infty\right)=1$) and $L_\gamma(h)\le 1$ for all $h\in(A\times O)^\infty$. Also, for all $N\in\mathbb{N}$ and $h\in(A\times O)^\infty$, $|L_{\gamma|N}(h)|\le|L_\gamma(h)|$. Then, by the dominated convergence theorem, $\lim_{N\to\infty}\int L_{\gamma|N}\,dm=\int L_\gamma\,dm$. $\square$
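In fact, for this particular truncation the convergence is quantitative. Assuming $L(o)\in[0,1]$ (an assumption consistent with, but not stated in, the lemma), a direct geometric-tail estimate gives:

```latex
% Quantitative tail bound for Lemma 2, assuming L(o) in [0,1] and
% L_gamma(h) = sum_t gamma^t L(o_t) as in the statement above:
\left| m(L_{\gamma}) - m(L_{\gamma|N}) \right|
  \;\le\; \int \sum_{t=N+1}^{\infty} \gamma^{t} L(o_t)\, dm
  \;\le\; \sum_{t=N+1}^{\infty} \gamma^{t}
  \;=\; \frac{\gamma^{N+1}}{1-\gamma},
% which tends to 0 as N -> infinity for any fixed gamma in [0,1).
```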
Proposition 1 Lemma 3: Suppose that $\{\pi_n\}_{n\in\mathbb{N}}$ is a convergent sequence in $\Pi$. Then for any $\epsilon>0$, there exist $M\in\mathbb{N}$ and a finite time-horizon $N_0\in\mathbb{N}$ such that for all $m\ge M$ and $N\ge N_0$, $|\mu^{\pi_m}(L_{\gamma|N})-\mu^{\pi_m}(L_\gamma)|<\epsilon$.
Proof: Let $\epsilon>0$ be given. Note that $\Pi$ is a metric space since it is the countable product of metric spaces (namely, copies of $\Delta A$). This implies that if $\{\pi_n\}_{n\in\mathbb{N}}$ is a convergent sequence, then $\{\pi_n\}_{n\in\mathbb{N}}$ is Cauchy.
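For concreteness, one standard choice of metric inducing the product topology on a countable product of metric spaces (here, copies of $(\Delta A, d_{TV})$ indexed by an enumeration $h_1,h_2,\dots$ of histories) is the following; the specific formula is an illustration, not taken from the main text:

```latex
% A standard metrization of the countable product topology:
d(\pi,\pi') \;=\; \sum_{j=1}^{\infty} 2^{-j}\,
  \min\bigl\{1,\; d_{TV}\bigl(\pi(h_j),\,\pi'(h_j)\bigr)\bigr\}.
% Convergence in d is exactly coordinatewise (per-history) convergence,
% i.e., convergence in the product topology.
```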
For any n,m∈N, we have by the triangle inequality that
$$\begin{aligned}
\left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_m}(L_{\gamma|N})\right| &= \left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_n}(L_\gamma)+\mu^{\pi_n}(L_\gamma)-\mu^{\pi_n}(L_{\gamma|N})+\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi_m}(L_{\gamma|N})\right| \\
&\le \left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_n}(L_\gamma)\right|+\left|\mu^{\pi_n}(L_\gamma)-\mu^{\pi_n}(L_{\gamma|N})\right|+\left|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi_m}(L_{\gamma|N})\right|.
\end{aligned}$$

Since $\{\pi_n\}_{n\in\mathbb{N}}$ is Cauchy, the same technique used in Proposition 1 Lemma 1 can be used to show that there exists $M$ such that for all $n,m\ge M$, there exists $N_1$ such that for all $N\ge N_1$, $|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi_m}(L_{\gamma|N})|<\frac{\epsilon}{9}$.
Let $n,m\ge M$ be fixed. By Proposition 1 Lemma 2, there exists $N_2\in\mathbb{N}$ such that for all $N\ge N_2$, $|\mu^{\pi_n}(L_\gamma)-\mu^{\pi_n}(L_{\gamma|N})|<\frac{\epsilon}{3}$.
Note that
$$\begin{aligned}
\left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_n}(L_\gamma)\right| &= \left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_m}(L_{\gamma|N})+\mu^{\pi_m}(L_{\gamma|N})-\mu^{\pi_n}(L_{\gamma|N})+\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi_n}(L_\gamma)\right| \\
&\le \left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_m}(L_{\gamma|N})\right|+\left|\mu^{\pi_m}(L_{\gamma|N})-\mu^{\pi_n}(L_{\gamma|N})\right|+\left|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi_n}(L_\gamma)\right|.
\end{aligned}$$

By Proposition 1 Lemma 2, there exists $N_3$ such that for all $N\ge N_3$, $|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_m}(L_{\gamma|N})|<\frac{\epsilon}{9}$. By the same lemma, there exists $N_4$ such that for all $N\ge N_4$, $|\mu^{\pi_n}(L_{\gamma|N})-\mu^{\pi_n}(L_\gamma)|<\frac{\epsilon}{9}$. As previously stated, if $N\ge N_1$, then $|\mu^{\pi_m}(L_{\gamma|N})-\mu^{\pi_n}(L_{\gamma|N})|<\frac{\epsilon}{9}$. Therefore, if $N\ge\max\{N_1,N_3,N_4\}$, then $|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_n}(L_\gamma)|<\frac{\epsilon}{3}$.
Let $N_0:=\max\{N_1,N_2,N_3,N_4\}$. Then if $N\ge N_0$,
$$\left|\mu^{\pi_m}(L_\gamma)-\mu^{\pi_m}(L_{\gamma|N})\right|<\frac{\epsilon}{3}+\frac{\epsilon}{3}+\frac{\epsilon}{9}<\epsilon.\ \square$$

Proposition 1: If $A$ and $O$ are finite, the optimal policy for a given environment $\mu$ exists; namely, $\arg\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu^\pi}[L_\gamma(h)]\ne\emptyset$. Furthermore, the optimal policy can be chosen to be deterministic.
Proof: The usual technique to show that a function attains a minimum is to show that the function is continuous and that its domain is compact. The existence of a minimum then follows from the generalization of the Extreme Value Theorem to topological spaces. Recall that Lemma 2 (from the main text) states that $\Pi$ is compact.
So, it suffices to show that the map $\pi\mapsto\mathbb{E}_{h\sim\mu^\pi}[L_\gamma(h)]$ from $\Pi$ to $\mathbb{R}$ is continuous. Assume that $\{\pi_n\}_{n\in\mathbb{N}}$ is a sequence of policies converging to $\pi$ with respect to the product topology on $\Pi$. Then we want to show that $\mu^{\pi_n}(L_\gamma)\to\mu^{\pi}(L_\gamma)$. Recall that $L_{\gamma|N}$ for $N\in\mathbb{N}$ is defined as $L_{\gamma|N}\left((a_t,o_t)_{t=0}^{N}\right):=\sum_{t=0}^{N}\gamma^t L(o_t)$, which we can conceptualize as the truncation of $L_\gamma$ to a finite time-horizon. Let $\epsilon>0$ be given.
By Proposition 1 Lemma 2, there exists $N_1$ such that for all $N\ge N_1$, $|\mu^{\pi}(L_{\gamma|N})-\mu^{\pi}(L_\gamma)|<\frac{\epsilon}{3}$. By Proposition 1 Lemma 3, there exist $M_1$ and $N_2$ such that for all $n\ge M_1$ and $N\ge N_2$, $|\mu^{\pi_n}(L_\gamma)-\mu^{\pi_n}(L_{\gamma|N})|<\frac{\epsilon}{3}$.
Let $N_0=\max\{N_1,N_2\}$. By Proposition 1 Lemma 1, there exists $M_2$ such that for all $n\ge M_2$, $|\mu^{\pi}(L_{\gamma|N_0})-\mu^{\pi_n}(L_{\gamma|N_0})|<\frac{\epsilon}{3}$.
If $n\ge\max\{M_1,M_2\}$, then by the triangle inequality,
$$\begin{aligned}
\left|\mu^{\pi_n}(L_\gamma)-\mu^{\pi}(L_\gamma)\right| &= \left|\mu^{\pi_n}(L_\gamma)-\mu^{\pi_n}(L_{\gamma|N_0})+\mu^{\pi_n}(L_{\gamma|N_0})-\mu^{\pi}(L_{\gamma|N_0})+\mu^{\pi}(L_{\gamma|N_0})-\mu^{\pi}(L_\gamma)\right| \\
&\le \left|\mu^{\pi_n}(L_\gamma)-\mu^{\pi_n}(L_{\gamma|N_0})\right|+\left|\mu^{\pi}(L_{\gamma|N_0})-\mu^{\pi_n}(L_{\gamma|N_0})\right|+\left|\mu^{\pi}(L_{\gamma|N_0})-\mu^{\pi}(L_\gamma)\right| \\
&<\epsilon.
\end{aligned}$$

Therefore, $\mu^{\pi_n}(L_\gamma)\to\mu^{\pi}(L_\gamma)$.
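As a sanity check on this continuity claim, here is a toy one-step example in Python; the environment table, the loss $L(o)=o$, and the sequence $\pi_n$ are all made up for illustration:

```python
import numpy as np

# Toy one-step problem: two actions, two observations, loss L(o) = o.
# mu_o1_given_a[a] is the (hypothetical) probability of o=1 after action a.
mu_o1_given_a = np.array([0.2, 0.7])

def expected_loss(p_a1):
    """Expected one-step loss of the policy playing a=1 with prob p_a1."""
    p_a = np.array([1 - p_a1, p_a1])
    return float(p_a @ mu_o1_given_a)  # E[L] = sum_a pi(a) * P(o=1 | a)

limit = expected_loss(0.5)             # loss of the limit policy pi
for n in [10, 100, 1000, 10000]:       # pi_n plays a=1 w.p. 0.5 + 1/n
    gap = abs(expected_loss(0.5 + 1.0 / n) - limit)
    print(f"n={n:6d}  |mu^pi_n(L) - mu^pi(L)| = {gap:.6f}")
# The gap shrinks as pi_n -> pi, as the continuity argument predicts.
```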
Now we prove that the optimal policy can be chosen to be deterministic. The environment $\mu$ remains fixed. Let $\Pi_{det}\subset\Pi$ denote the set of deterministic policies, which is compact. By the same argument given more generally for $\Pi$, $d:=\min_{\pi\in\Pi_{det}}\mathbb{E}_{\mu^\pi}[L_\gamma]$ is well-defined.
Any stochastic policy $\pi_s$ is equivalent to a probabilistic mixture of deterministic policies. Therefore, any stochastic policy $\pi_s$ induces a probability measure $\nu_s$ on the countable set $\mu^{\Pi_{det}}:=\{\mu^\pi : \pi\in\Pi_{det}\}$. As a result, $\mu^{\pi_s}$ can be written as a countable mixture of measures given by $\mu^{\pi_s}=\sum_{\mu^{\pi_i}\in\mu^{\Pi_{det}}}\nu_s(\mu^{\pi_i})\,\mu^{\pi_i}$.
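The following toy Python snippet illustrates this mixture decomposition in the simplest possible case, a single decision with two actions; the environment table `mu` is hypothetical:

```python
import numpy as np

# One-step illustration: a stochastic policy over two actions is a mixture
# of the two deterministic policies "always a=0" and "always a=1", and the
# induced joint over (a, o) is the same mixture of the induced joints.
mu = np.array([[0.8, 0.2],    # P(o | a=0), made-up numbers
               [0.3, 0.7]])   # P(o | a=1)

def induced_joint(pi):
    """Joint P(a, o) induced by playing action distribution pi once."""
    return pi[:, None] * mu

pi_s = np.array([0.25, 0.75])   # stochastic policy
det0 = np.array([1.0, 0.0])     # deterministic: always a=0
det1 = np.array([0.0, 1.0])     # deterministic: always a=1

# Here nu_s puts weight pi_s(a) on the deterministic policy that plays a.
mixture = pi_s[0] * induced_joint(det0) + pi_s[1] * induced_joint(det1)
assert np.allclose(induced_joint(pi_s), mixture)
```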
This implies $\mathbb{E}_{\mu^{\pi_s}}[L_\gamma]=\mathbb{E}_{\nu_s}\left[\mathbb{E}_{\mu^\pi}[L_\gamma]\right]$. (See Problem 9.7 in reference [1] for an outline of how this can be justified, as also cited in reference [2]; see the note following the references.) By the definition of $d$, $\mathbb{E}_{\nu_s}\left[\mathbb{E}_{\mu^\pi}[L_\gamma]\right]\ge\mathbb{E}_{\nu_s}[d]=d$. Thus, the expected loss of a stochastic policy is at least the minimum expected loss over deterministic policies. $\square$
Proposition 2
Proposition 2: If a countable class of environments $\{\mu_i\}_{i\in I}$ is learnable, then the Bayes-optimal family of policies for any non-dogmatic prior $\zeta$ on $\{\mu_i\}_{i\in I}$ learns the class.
Proof: Suppose a class of hypotheses $\{\mu_i\}_{i\in I}$ is learnable, and let $\zeta$ be a non-dogmatic prior on $\{\mu_i\}_{i\in I}$. We will proceed by contrapositive and show that if the family of policies $\{\pi^\gamma_\zeta\}_{\gamma\in[0,1)}$ does not learn the class, then it is not the Bayes-optimal family for $\zeta$.
Assume that there exists $j$ such that $\lim_{\gamma\to 1}\mathrm{Reg}(\pi^\gamma_\zeta,\mu_j,\gamma)\ne 0$. By negating the definition of a limit, this means that there exists $\epsilon>0$ such that for all $\delta>0$, there exists $\gamma$ such that $|\gamma-1|<\delta$ and $|\mathrm{Reg}(\pi^\gamma_\zeta,\mu_j,\gamma)|\ge\epsilon$. By definition, $|\mathrm{Reg}(\pi^\gamma_\zeta,\mu_j,\gamma)|\ge\epsilon$ if and only if
$$\left|\min_{\pi\in\Pi}\left\{\mathbb{E}_{h\sim\mu_j^{\pi}}[L_\gamma(h)]\right\}-\mathbb{E}_{h\sim\mu_j^{\pi^\gamma_\zeta}}[L_\gamma(h)]\right|\ge\epsilon.$$

We will show that this contradicts the assumption that $\{\pi^\gamma_\zeta\}_{\gamma\in[0,1)}$ is a Bayes-optimal family for $\zeta$.
With steps explained in the following paragraphs, we have:
$$\begin{aligned}
\lim_{\gamma\to 1}\mathrm{BReg}(\pi^\gamma_\zeta,\zeta,\gamma) &= \lim_{\gamma\to 1}\mathbb{E}_{\mu\sim\zeta}\,\mathrm{Reg}(\pi^\gamma_\zeta,\mu,\gamma) \\
&= \lim_{\gamma\to 1}\sum_{i\in I}\zeta(\mu_i)\left(\mathbb{E}_{h\sim\mu_i^{\pi^\gamma_\zeta}}[L_\gamma(h)]-\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\right) \\
&= \sum_{i\in I}\zeta(\mu_i)\lim_{\gamma\to 1}\left(\mathbb{E}_{h\sim\mu_i^{\pi^\gamma_\zeta}}[L_\gamma(h)]-\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\right) \\
&\ge \zeta(\mu_j)\cdot\epsilon \\
&> 0.
\end{aligned}$$

In the above, the first and second equalities follow from definitions. To get the third equality, we switch the sum and the limit, which is justified by the dominated convergence theorem. In the language of countably infinite sums, this theorem says: Let $f_\gamma(i):I\to\mathbb{R}$ be a family of functions indexed by $\gamma\in[0,1)$ such that the pointwise limit $\lim_{\gamma\to 1}f_\gamma(i)$ exists for all $i\in I$. Suppose there exists $g(i):I\to\mathbb{R}$ such that $\sum_{i\in I}g(i)<\infty$ and, for all $\gamma$ and $i$, $|f_\gamma(i)|\le g(i)$. Then
$$\lim_{\gamma\to 1}\sum_{i\in I}f_\gamma(i)=\sum_{i\in I}\lim_{\gamma\to 1}f_\gamma(i).$$

So, this theorem essentially states that if we can find a sequence of terms $g(i)$ that "dominates" all of the $f_\gamma(i)$ pointwise in $i$, and the sum of the $g(i)$ doesn't blow up, then the limit and sum can be interchanged.
To apply the dominated convergence theorem, let
$$f_\gamma(i):=\zeta(\mu_i)\left(\mathbb{E}_{h\sim\mu_i^{\pi^\gamma_\zeta}}[L_\gamma(h)]-\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\right).$$

Note that
$$\mathbb{E}_{h\sim\mu_i^{\pi^\gamma_\zeta}}[L_\gamma(h)]-\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\le 1.$$

So $g(i):=\zeta(\mu_i)$ is an appropriate dominating function. By the definition of a prior, $\sum_{i\in I}\zeta(\mu_i)=1<\infty$. Therefore, the assumptions of the dominated convergence theorem are satisfied.
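Here is a toy numerical illustration of the interchange; the prior and the regret-like terms `r` are invented for the example and are not the actual Bayesian regret:

```python
import numpy as np

# Toy illustration of dominated convergence for countable sums.
# The prior zeta(i) = 2^{-(i+1)} sums to 1 and dominates
# f_gamma(i) = zeta(i) * r_gamma(i), where r_gamma(i) in [0, 1] is a
# stand-in for the regret term and tends to 0 as gamma -> 1.
I = np.arange(50)              # truncated index set (the tail is negligible)
zeta = 0.5 ** (I + 1)

def f(gamma):
    r = (1.0 - gamma) / (1.0 + I)   # hypothetical regret-like term -> 0
    return zeta * r                  # |f_gamma(i)| <= zeta(i) for all gamma

for gamma in [0.9, 0.99, 0.999]:
    print(f"gamma={gamma}: sum_i f_gamma(i) = {f(gamma).sum():.6f}")
# The sums tend to 0 = sum_i lim_gamma f_gamma(i), as the theorem predicts.
```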
The second-to-last inequality follows from the fact that for all $i\in I$,
$$\lim_{\gamma\to 1}\left(\mathbb{E}_{h\sim\mu_i^{\pi^\gamma_\zeta}}[L_\gamma(h)]-\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\right)\ge 0,$$

together with the observation that the $j$-th term has limit at least $\epsilon$: there are $\gamma$ arbitrarily close to $1$ with $\mathrm{Reg}(\pi^\gamma_\zeta,\mu_j,\gamma)\ge\epsilon$, so the limit of that term, when it exists, is at least $\epsilon$. On the other hand, by assumption, $\{\mu_i\}_{i\in I}$ is learnable, so there exists some family of policies $\{\pi^\gamma\}_{\gamma\in[0,1)}$ such that for all $i$, $\lim_{\gamma\to 1}\mathrm{Reg}(\pi^\gamma,\mu_i,\gamma)=0$. Consider a parallel argument to what we have above, replacing $\pi^\gamma_\zeta$ with $\pi^\gamma$:
$$\begin{aligned}
\lim_{\gamma\to 1}\mathrm{BReg}(\pi^\gamma,\zeta,\gamma) &= \lim_{\gamma\to 1}\sum_{i\in I}\zeta(\mu_i)\left(\mathbb{E}_{h\sim\mu_i^{\pi^\gamma}}[L_\gamma(h)]-\min_{\pi\in\Pi}\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\right) \\
&= \sum_{i\in I}\zeta(\mu_i)\lim_{\gamma\to 1}\left(\mathbb{E}_{h\sim\mu_i^{\pi^\gamma}}[L_\gamma(h)]-\min_{\pi\in\Pi}\left\{\mathbb{E}_{h\sim\mu_i^{\pi}}[L_\gamma(h)]\right\}\right) \\
&= \sum_{i\in I}\zeta(\mu_i)\lim_{\gamma\to 1}\mathrm{Reg}(\pi^\gamma,\mu_i,\gamma) \\
&= \sum_{i\in I}\zeta(\mu_i)\cdot 0 = 0.
\end{aligned}$$

Then there exists $\Gamma\in[0,1)$ such that for all $\gamma\ge\Gamma$, $\mathrm{BReg}(\pi^\gamma,\zeta,\gamma)<\frac{\zeta(\mu_j)\cdot\epsilon}{2}$. On the other hand, since $\lim_{\gamma\to 1}\mathrm{BReg}(\pi^\gamma_\zeta,\zeta,\gamma)\ge\zeta(\mu_j)\cdot\epsilon$, there exists $\gamma_0\ge\Gamma$ such that $\mathrm{BReg}(\pi^{\gamma_0}_\zeta,\zeta,\gamma_0)>\frac{\zeta(\mu_j)\cdot\epsilon}{2}$. Hence, $\mathrm{BReg}(\pi^{\gamma_0},\zeta,\gamma_0)<\mathrm{BReg}(\pi^{\gamma_0}_\zeta,\zeta,\gamma_0)$.
This implies
$$\mathrm{BReg}(\pi^{\gamma_0}_\zeta,\zeta,\gamma_0)-\mathrm{BReg}(\pi^{\gamma_0},\zeta,\gamma_0)>0.$$

By definition and by linearity of expectation,
$$\begin{aligned}
\mathrm{BReg}(\pi^{\gamma_0}_\zeta,\zeta,\gamma_0)-\mathrm{BReg}(\pi^{\gamma_0},\zeta,\gamma_0) &= \mathbb{E}_\zeta\left[\mathbb{E}_{\mu^{\pi^{\gamma_0}_\zeta}}[L_{\gamma_0}(h)]-\min_{\pi\in\Pi}\mathbb{E}_{\mu^{\pi}}[L_{\gamma_0}(h)]\right]-\mathbb{E}_\zeta\left[\mathbb{E}_{\mu^{\pi^{\gamma_0}}}[L_{\gamma_0}(h)]-\min_{\pi\in\Pi}\mathbb{E}_{\mu^{\pi}}[L_{\gamma_0}(h)]\right] \\
&= \mathbb{E}_\zeta\,\mathbb{E}_{\mu^{\pi^{\gamma_0}_\zeta}}[L_{\gamma_0}(h)]-\mathbb{E}_\zeta\,\mathbb{E}_{\mu^{\pi^{\gamma_0}}}[L_{\gamma_0}(h)].
\end{aligned}$$

Then $\mathbb{E}_\zeta\,\mathbb{E}_{\mu^{\pi^{\gamma_0}_\zeta}}[L_{\gamma_0}(h)]-\mathbb{E}_\zeta\,\mathbb{E}_{\mu^{\pi^{\gamma_0}}}[L_{\gamma_0}(h)]>0$. Then by definition, $\{\pi^\gamma_\zeta\}_{\gamma\in[0,1)}$ is not the Bayes-optimal family for $\zeta$. $\square$
References:
[1] Schilling, René L. 2005. Measures, Integrals and Martingales. Cambridge: Cambridge University Press.
[2] humanStampedist (https://math.stackexchange.com/users/474469/humanstampedist), How to integrate for a countable sum of measures?, URL (version: 2018-08-27): https://math.stackexchange.com/q/2896276
Note: It is a standard approximation argument using an increasing sequence of step functions. The difference between what is needed here and what is stated as the exercise in the book is that here the coefficients are determined by $\nu_s$. This is not a problem when applying the argument, since all of the coefficients are nonnegative.