LBIT Proofs 6: Propositions 39-47

Proposition 39: Given a crisp infradistribution $ζ^{□}$ over $N$ , an infrakernel $K$ from $N$ to infradistributions over $X$ , and suggestively abbreviating $K (i)$ as $H_{i}$ (hypothesis $i$ ) and $K_{*} (ζ^{□})$ as $E_{ζ^{□}} H_{i}$ (your infraprior where you have Knightian uncertainty over how to mix the hypotheses), then
$((E_{ζ^{□}} H_{i}) |^{g} L) (f) = \frac{E_{ζ^{□}} (P_{H_{i}}^{g} (L) \cdot (H_{i} |^{g} L) (f) + H_{i} (0 ★^{L} g)) - E_{ζ^{□}} (H_{i} (0 ★^{L} g))}{E_{ζ^{□}} (H_{i} (1 ★^{L} g)) - E_{ζ^{□}} (H_{i} (0 ★^{L} g))}$

Proof: Assume that $L$ and $g$ are functions of type $X \to [0, 1]$ and $X \to R$ respectively, ie, likeliehood and utility doesn’t depend on which hypothesis you’re in, just what happens. First, unpack our abbreviations and what an update means.

$((E_{ζ^{□}} H_{i}) |^{g} L) (f) = (K_{*} (ζ^{□}) |^{g} L) (f) = \frac{K_{*} (ζ^{□}) (f ★^{L} g) - K_{*} (ζ^{□}) (0 ★^{L} g)}{K_{*} (ζ^{□}) (1 ★^{L} g) - K_{*} (ζ^{□}) (0 ★^{L} g)}$

Then use the definition of an infrakernel pushforward.

$= \frac{ζ^{□} (λ i . K (i) (f ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g))}{ζ^{□} (λ i . K (i) (1 ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g))}$

For the next thing, we’re just making types a bit more explicit, $f, L, g$ only depend on $x$ , not $i$ .

$= \frac{ζ^{□} (λ i . K (i) (λ x . f ★^{L} g)) - ζ^{□} (λ i . K (i) (λ x .0 ★^{L} g))}{ζ^{□} (λ i . K (i) (λ x .1 ★^{L} g)) - ζ^{□} (λ i . K (i) (λ x .0 ★^{L} g))}$

Then we pack the semidirect product back up.

$= \frac{(ζ^{□} ⋉ K) (λ i, x . f ★^{L} g) - (ζ^{□} ⋉ K) (λ i, x .0 ★^{L} g)}{(ζ^{□} ⋉ K) (λ i, x .1 ★^{L} g) - (ζ^{□} ⋉ K) (λ i, x .0 ★^{L} g)}$

And pack the update back up.

$= ((ζ^{□} ⋉ K) |^{g} L) (f)$

At this point, we invoke the Infra-Disintegration Theorem.

$= (ζ^{□^{'}} ⋉ K^{'}) (f)$

$= ζ^{□^{'}} (λ i . K^{'} (i) (f))$

We unpack what our new modified prior is, via the Infra-Disintegration Theorem.

$= \frac{ζ^{□} (λ i . α (i) K^{'} (i) (f) + β (i)) - (ζ^{□} ⋉ K) (0 ★^{L} g)}{(ζ^{□} ⋉ K) (1 ★^{L} g) - (ζ^{□} ⋉ K) (0 ★^{L} g)}$

and unpack the semidirect product.

$= \frac{ζ^{□} (λ i . α (i) K^{'} (i) (f) + β (i)) - ζ^{□} (λ i . K (i) (0 ★^{L} g))}{ζ^{□} (λ i . K (i) (1 ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g))}$

Now we unpack $α$ and $β$ .

$= \frac{ζ^{□} (λ i . P_{K (i)}^{g} (L) \cdot K^{'} (i) (f) + K (i) (0 ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g)}{ζ^{□} (λ i . K (i) (1 ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g)}$

And unpack what $K^{'}$ is

$= \frac{ζ^{□} (λ i . P_{K (i)}^{g} (L) \cdot (K (i) |^{g} L) (f) + K (i) (0 ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g))}{ζ^{□} (λ i . K (i) (1 ★^{L} g)) - ζ^{□} (λ i . K (i) (0 ★^{L} g))}$

And reabbreviate $K (i)$ as $H_{i}$ ,

$= \frac{ζ^{□} (λ i . P_{H_{i}}^{g} (L) \cdot (H_{i} |^{g} L) (f) + H_{i} (0 ★^{L} g)) - ζ^{□} (λ i . H_{i} (0 ★^{L} g)}{ζ^{□} (λ i . H_{i} (1 ★^{L} g)) - ζ^{□} (λ i . H_{i} (0 ★^{L} g)}$

And then pack it back up into a suggestive form as a sort of expectation.

$= \frac{E_{ζ^{□}} (P_{H_{i}}^{g} (L) \cdot (H_{i} |^{g} L) (f) + H_{i} (0 ★^{L} g)) - E_{ζ^{□}} (H_{i} (0 ★^{L} g))}{E_{ζ^{□}} (H_{i} (1 ★^{L} g)) - E_{ζ^{□}} (H_{i} (0 ★^{L} g)}$

And we’re done.

Proposition 40: If a likelihood function $L : X \to [0, 1]$ is 0 when $f (x) < a$ , and $f \geq 0$ and $a > 0$ , then $h (L \cdot a) \leq h (f)$

$h (L \cdot a) = {inf}_{(λ μ, b) \in H} λ μ (L \cdot a) + b = {inf}_{(λ μ, b) \in H} a λ μ (L) + b$

And then we apply Markov’s inequality, that for any probability distribution,

$μ (1_{f (x) \geq a}) \leq \frac{μ (f)}{a}$

Also, $1_{f (x) \geq a} \geq L$ (because $L$ is 0 when $f (x) < a$ ), so monotonicity means that

$μ (L) \leq \frac{μ (f)}{a}$

So, we can get:

$\leq {inf}_{(λ μ, b) \in H} a λ \frac{μ (f)}{a} + b = {inf}_{(λ μ, b) \in H} λ μ (f) + b = h (f)$

And we’re done.

Proposition 41: The IKR-metric is a metric.

So, symmetry is obvious, as is one direction of identity of indiscernibles (that the distance from an infradistribution to itself is 0). That just leaves the triangle inequality and the other direction of identity of indiscernibles. For the triangle inequality, observe that for any particular $f$ (instead of the supremum), it would fulfill the triangle inequality, and then it’s an easy exercise for the reader to verify that the same property applies to the supremum, so the only tricky part is the reverse direction of identity of indiscernibles, that two infradistributions which have a distance of 0 are identical.

First, if $d_{I K R} (h, h^{'}) = 0$ , then $h$ and $h^{'}$ must perfectly agree on all the Lipschitz functions. And then, because uniformly continuous functions are the uniform limit of Lipschitz functions, $h$ and $h^{'}$ must perfectly agree on all the uniformly continuous functions.

Now, we’re going to need a somewhat more sophisticated argument. Let’s say that the sequence $f_{n}$ is uniformly bounded and limits to $f$ in $C_{B} (X)$ equipped with the compact-open topology (ie, we get uniform convergence of $f_{n}$ to $f$ on all compact sets). Then, for any infradistributions, $h (f_{n})$ will limit to $h (f)$ . Here’s why. For any $ϵ$ , there’s some compact set $C_{ϵ}$ that accounts for almost all of why a function inputted into an infradistribution has the value it does. Then, what we can do is realize that $h (f_{n})$ will, in the limit, be incredibly close to $h (f)$ , due to $f_{n}$ and $f$ disagreeing by a bounded amount outside the set $C_{ϵ}$ and only disagreeing by a tiny amount on the set $C_{ϵ}$ , and the Lipschitzness of $h$ .

Further, according to this mathoverflow answer, uniformly continuous functions are dense in the space of all continuous functions when $C_{B} (X)$ is equipped with the compact-open topology, so given any function $f$ , we can find a sequence of uniformly continuous functions $f_{n}$ limiting to $f$ in the compact-open topology, and then,

$h (f) = {lim}_{n \to \infty} h (f_{n}) = {lim}_{n \to \infty} h^{'} (f_{n}) = h^{'} (f)$

And so, $h$ and $h^{'}$ agree on all continuous functions, and are identical, if they have a distance of 0, giving us our last piece needed to conclude that $d_{I K R}$ is a metric.

Proposition 42: The IKR-metric for infradistributions is strongly equivalent to the Hausdorff distance (w.r.t. the KR-metric) between their corresponding infradistribution sets.

Let’s show both directions of this. For the first one, if the Hausdorff-distance between $H, H^{'}$ is $d_{h a u} (H, H^{'})$ , then for all a-measures $(m, b)$ in $H$ , there’s an a-measure $(m^{'}, b^{'})$ in $H^{'}$ that’s only $d_{h a u} (H, H^{'})$ or less distance away, according to the KR-metric (on a-measures).

Now, by LF-duality, a-measures in H correspond to hyperplanes above $h$ . Two a-measures being $d_{h a u} (H, H^{'})$ apart means, by the definition of the KR-metric for a-measures, that they will assign values at most $d_{h a u} (H, H^{'})$ distance apart for 1-Lipschitz functions in $[- 1, 1]$ .

So, translating to the concave functional view of things, $H$ and $H^{'}$ being $d_{h a u} (H, H^{'})$ apart means that every hyperplane above h has another hyperplane above $h^{'}$ that can only differ on the 1-Lipschitz 1-bounded functions by at most $d_{h a u} (H, H^{'})$ , and vice-versa.

Let’s say we’ve got a Lipschitz function $f$ . Fix an affine functional/hyperplane $ψ^{h}$ that touches the graph of $h$ at $f$ . Let’s try to set an upper bound on what $h^{'} (f)$ can be. If $f$ is 1-Lipschitz and 1-bounded, then we can craft a $ψ^{h^{'}}$ above $h^{'}$ that’s nearby, and

$h^{'} (f) \leq ψ^{h^{'}} (f) \leq ψ^{h} (f) + d_{h a u} (H, H^{'}) = h (f) + d_{h a u} (H, H^{'})$

Symmetrically, we can swap $h^{'}$ and $h$ to get $h (f) \leq h^{'} (f) + d_{h a u} (H, H^{'})$ , and put them together to get:

$| h^{'} (f) - h (f) | \leq d_{h a u} (H, H^{'})$

For the 1-Lipschitz functions.

Let’s tackle the case where $f$ is either more than 1-Lipschitz, or strays outside of $[- 1, 1]$ . In that case, $\frac{f}{max (L i (f), | | f | |)}$ is 1-Lipschitz and bounded in $[- 1, 1]$ . We can craft a $ψ^{h^{'}}$ that only differs on 1-Lipschitz functions by $d_{h a u} (H, H^{'})$ or less. Then, since, for affine functionals, $ψ (a x) = a (ψ (x) - ψ (0)) + ψ (0)$ and using that $ψ^{h^{'}}$ and $ψ^{h}$ are close on 1-Lipschitz functions, which $\frac{f}{max (L i (f), | | f | |)}$ and 0 are, we can go:

$h^{'} (f) \leq ψ^{h^{'}} (f) = ψ^{h^{'}} (max (L i (f), | | f | |) \cdot \frac{f}{max (L i (f), | | f | |)})$

$= max (L i (f), | | f | |) (ψ^{h^{'}} (\frac{f}{max (L i (f), | | f | |)}) - ψ^{h^{'}} (0)) + ψ^{h^{'}} (0)$

And then we swap out $ψ^{h^{'}}$ for $ψ^{h}$ with a known penalty in value, we’re taking an overestimate at this point.

$\leq max (L i (f), | | f | |) ((ψ^{h} (\frac{f}{max (L i (f), | | f | |)}) + d_{h a u} (H, H^{'})) - (ψ^{h} (0) - d_{h a u} (H, H^{'})))$

$+ (ψ^{h} (0) - d_{h a u} (H, H^{'}))$

$= d_{h a u} (H, H^{'}) \cdot (2 (max (L i (f), | | f | |)) - 1)$

$+ max (L i (f), | | f | |) \cdot (ψ^{h} (\frac{f}{max (L i (f), | | f | |)}) - ψ^{h} (0)) + ψ^{h} (0)$

$< 2 d_{h a u} (H, H^{'}) \cdot (max (L i (f), | | f | |)) + ψ^{h} (max (L i (f), | | f | |) \cdot \frac{f}{max (L i (f), | | f | |)})$

$= 2 d_{h a u} (H, H^{'}) \cdot (max (L i (f), | | f | |)) + ψ^{h} (f)$

$= 2 d_{h a u} (H, H^{'}) \cdot (max (L i (f), | | f | |)) + h (f)$

This argument works for all $h$ . And, even though we just got an upper bound, to rule out $h^{'} (f)$ being significantly below $h (f)$ , we could run through the same upper bound argument with $h^{'}$ instead of $h$ , to show that $h (f)$ can’t be more than $2 d_{h a u} (H, H^{'}) \cdot (max (L i (f), | | f | |))$ above $h^{'} (f)$ .

So, for all Lipschitz $f$ , $| h (f) - h^{'} (f) | \leq 2 d_{h a u} (H, H^{'}) \cdot (max (L i (f), | | f | |, 1))$ . Thus, for all Lipschitz $f$ ,

$\frac{| h (f) - h^{'} (f) |}{max (L i (f), | | f | |, 1)} \leq 2 d_{h a u} (H, H^{'})$

And therefore,

$d_{I K R} (h, h^{'}) \leq 2 d_{h a u} (H, H^{'})$

This establishes one part of our inequalities. Now for the other direction.

Here’s how things are going to work. Let’s say we know the IKR-distance between $h$ and $h^{'}$ . Our task will be to stick an upper bound on the Hausdorff-distance between $H$ and $H^{'}$ . Remember that the Hausdorff-distance being low is equivalent to “any hyperplane above $h$ has a corresponding hyperplane above $h^{'}$ that attains similar values on the 1-or-less-Lipschitz functions”.

So, let’s say we’ve got $h$ , and a $ψ^{h} \geq h$ . Our task is, knowing $h^{'}$ , to craft a hyperplane above $h^{'}$ that’s close to $ψ$ on the 1-Lipschitz functions. Then we can just swap $h^{'}$ and $h$ , and since every hyperplane above $h$ is close (on the 1-Lipschitz functions) to a hyperplane above $h^{'}$ , and vice-versa, $H$ and $H^{'}$ can be shown to be close. We’ll use Hahn-Banach separation for this one.

Accordingly, let the set $A$ be the set of $f, b$ where $(f, b) = p (f^{'}, b^{'}) + (1 - p) (f^{*}, b^{*})$ , and:

$p \in [0, 1), f^{'} \in C_{1 - l i p} (X, [- 1, 1]), f^{*} \in C_{B} (X), b^{'} < ψ^{h} (f^{'}) - d_{I K R} (h, h^{'}), b^{*} < h^{'} (f^{*})$

That’s… quite a mess. It can be thought of as the convex hull of the hypograph of $h^{'}$ , and the hypograph of $ψ^{h}$ restricted to the 1-Lipschitz functions in $[- 1, 1]$ and shifted down a bit. If there was a $ψ^{h^{'}}$ that cuts into $h^{'}$ and scores lower than it, ie $ψ^{h^{'}} (f^{*}) < h^{'} (f^{*})$ , we could have $p = 0$ , and $b^{*} = ψ^{h^{'}} (f^{*}) < h^{'} (f^{*})$ to observe that $ψ^{h^{'}}$ cuts into the set $A$ . Conversely, if an affine functional doesn’t cut into the set $A$ , then it lies on-or-above the graph of $h^{'}$ .

Similarly, if $ψ^{h^{'}}$ undershoots $ψ^{h} - d_{I K R} (h, h^{'})$ over the 1-or-less-Lipschitz functions in $[- 1, 1]$ , it’d also cut into $A$ . Conversely, if the hyperplane $ψ^{h^{'}}$ doesn’t cut into $A$ , then it sticks close to $ψ^{h}$ over the 1-or-less-Lipschitz functions.

This is pretty much what $A$ is doing. If we don’t cut into it, we’re above $h^{'}$ and not too low on the functions with a Lipschitz norm of 1 or less.

For Hahn-Banach separation, we must verify that $A$ is convex and open. Convexity is pretty easy.

$q (p_{1} (f_{1}^{'}, b_{1}^{'}) + (1 - p_{1}) (f_{1}^{*}, b_{1}^{*})) + (1 - q) (p_{2} (f_{2}^{'}, b_{2}^{'}) + (1 - p_{2}) (f_{2}^{*}, b_{2}^{*}))$

$= (q p_{1} + (1 - q) p_{2}) (\frac{q p_{1}}{q p_{1} + (1 - q) p_{2}} (f_{1}^{'}, b_{1}^{'}) + \frac{(1 - q) p_{2}}{q p_{1} + (1 - q) p_{2}} (f_{2}^{'}, b_{2}^{'}))$

$+ (q (1 - p_{1}) + (1 - q) (1 - p_{2})) (\frac{q (1 - p_{1})}{q (1 - p_{1}) + (1 - q) (1 - p_{2})} (f_{1}^{*}, b_{1}^{*}) + \frac{(1 - q) (1 - p_{2})}{(q (1 - p_{1}) + (1 - q) (1 - p_{2})} (f_{2}^{*}, b_{2}^{*}))$

First verification: Those numbers at the front add up to 1 (easy to verify), are both in $[0, 1]$ (this is trivial to verify), and $q p_{1}$ + $(1 - q) p_{2}$ isn’t 1 (this is a mix of two numbers that are both below $1$ , so this is easy). Ok, that condition is down. Next up: Is our mix of $f_{1}^{'}$ and $f_{2}^{'}$ 1-Lipschitz and in $[- 1, 1]$ ? Yes, the mix of 1-Lipschitz functions in that range is 1-Lipschitz and in that range too. Also, is our mix of $f_{1}^{*}$ and $f_{2}^{*}$ still in $C_{B} (X)$ ? Yup.

That leaves the conditions on the b terms. For the first one, just observe that mixing two points that lie strictly below $ψ^{h^{'}} - d_{I K R} (h, h^{'})$ (a hyperplane) lies strictly below it as well. For the second one, since $h^{'}$ is concave, mixing two points that lie strictly below its graph also lies strictly below its graph. Admittedly, there may be divide-by-zero errors, but only when $q p_{1} + (1 - q) p_{2}$ is 0, in which case, we can have our new $f^{'}$ and $b^{'}$ be anything we want as long as it fulfills the conditions, it still defines the same point (because that term gets multiplied by 0 anyways). So $A$ is convex.

But… is $A$ open? Well, observe that the region under the graph of $h$ on $C_{B} (X)$ is open, due to Lipschitzness of $h$ . We can wiggle $b$ and $f$ around a tiny tiny little bit in any direction without matching or exceeding the graph of $h$ . So, given a point in $A$ , fix your tiny little open ball around $(f^{*}, b^{*})$ . Since $p$ can’t be 1, when you mix with $(f^{'}, b^{'})$ , you can do the same mix with your little open ball instead of the center point, and it just gets scaled down (but doesn’t collapse to a point), making a little tiny open ball around your arbitrarily chosen point in $A$ . So $A$ is open.

Now, let’s define a $B$ that should be convex, so we can get Hahn-Banach separation going (as long as we can show that $A$ and $B$ are disjoint). It should be chosen to forbid our separating hyperplane being too much above $ψ^{h}$ over the 1-or-less Lipschitz functions. So, let $B$ be:

${(f, b) | f \in C_{1 - l i p} (X, [- 1, 1]), b \geq ψ^{h} (f) + d_{I K R} (h, h^{'})}$

Obviously, cutting into this means your hyperplane is too far above $ψ^{h}$ over the 1-or-less-Lipschitz functions in $[- 1, 1]$ . And it’s obviously convex, because 1-or-less-Lipschitz functions in $[- 1, 1]$ are a convex set, and so is the region above a hyperplane $(ψ^{h} + d_{I K R} (h, h^{'}))$ .

All we need to do now for Hahn-Banach separation is show that the two sets are disjoint. We’ll assume there’s a point in both of them and derive a contradiction. So, let’s say that $(f, b)$ is in both $A$ and $B$ . Since it’s in $B$ ,

$b \geq ψ^{h} (f) + d_{I K R} (h, h^{'})$

But also, $(f, b) = p (f^{'}, b^{'}) + (1 - p) (f^{*}, b^{*})$ with the $f$ ‘s and $b$ ’s and $p$ fulfilling the appropriate properties, because it’s in $A$ . Since $b^{*} < h^{'} (f^{*})$ and $b^{'} < ψ^{h} (f^{'}) - d_{I K R} (h, h^{'})$ , we’ll write $b^{*}$ as $h^{'} (f^{*}) - δ^{*}$ and $b^{'}$ as $ψ^{h} (f^{'}) - d_{I K R} (h, h^{'}) - δ^{'}$ , where $δ^{*}$ and $δ^{'}$ are nonzero. Thus, we rewrite as:

$p (ψ^{h} (f^{'}) - d_{K R} (h, h^{'}) - δ^{'}) + (1 - p) (h^{'} (f^{*}) - δ^{*}) \geq ψ^{h} (p f^{'} + (1 - p) f^{*}) + d_{I K R} (h, h^{'})$

We’ll be folding $- p δ^{'} - (1 - p) δ^{*}$ into a single $- δ$ term so I don’t have to write as much stuff. Also, $ψ^{h}$ is an affine function, so we can split things up with that, and make:

$p ψ^{h} (f^{'}) - p d_{I K R} (h, h^{'}) + (1 - p) h^{'} (f^{*}) - δ \geq p ψ^{h} (f^{'}) + (1 - p) ψ^{h} (f^{*}) + d_{I K R} (h, h^{'})$

$(1 - p) h^{'} (f^{*}) - δ \geq (1 - p) ψ^{h} (f^{*}) + (1 + p) d_{I K R} (h, h^{'})$

Remember, $ψ^{h} (f^{*}) \geq h (f^{*})$ because $ψ^{h} \geq h$ . So, we get:

$(1 - p) h^{'} (f^{*}) - δ \geq (1 - p) h (f^{*}) + (1 + p) d_{I K R} (h, h^{'})$

$(1 - p) (h^{'} (f^{*}) - h (f^{*})) - δ \geq (1 + p) d_{I K R} (h, h^{'})$

And, if $h (f^{*}) \geq h^{'} (f^{*})$ , we get a contradiction straightaway because the left side is negative, and the right side is nonnegative. Therefore, $h^{'} (f^{*}) > h (f^{*})$ , and we can rewrite as:

$(1 - p) | h^{'} (f^{*}) - h (f^{*}) | - δ \geq (1 + p) d_{I K R} (h, h^{'})$

And now, we should notice something really really important. Since $p$ can’t be $1$ , $f^{*}$ does consistute a nonzero part of $f$ , because $f = p f^{'} + (1 - p) f^{*}$ .

However, $f$ is a 1-or-less Lipschitz function, and bounded in $[- 1, 1]$ , due to being in $B$ ! If $f^{*}$ wasn’t Lipschitz, then given any slope, you could find areas where it’s ascending faster than that rate. This still happens when it’s scaled down, and $f^{'}$ can only ascend or descend at a rate of 1 or slower there since it’s 1-Lipschitz as well. So, in order for $f$ to be 1-or-less Lipschitz, $f^{*}$ must be Lipschitz as well. Actually, we get something stronger, if $f^{*}$ has a really high Lipschitz constant, then $p$ needs to be pretty high. Otherwise, again, $f$ wouldn’t be 1-or-less Lipschitz, since $1 - p$ of it is composed of $f^{*}$ , which has areas of big slope. Further, if $f^{*}$ has a norm sufficiently far away from 0, then $p$ needs to be pretty high, because otherwise f wouldn’t be in $[- 1, 1]$ , since $1 - p$ of it is composed of $f^{*}$ which has areas distant from 0.

Our most recent inequality (derived under the assumption that there’s a point in $A$ and $B$ ) was:

$(1 - p) | h^{'} (f^{*}) - h (f^{*}) | - δ \geq (1 + p) d_{I K R} (h, h^{'})$

Assuming hypothetically were were able to show that

$(1 - p) | h^{'} (f^{*}) - h (f^{*}) | \leq (1 + p) d_{I K R} (h, h^{'})$

then because $δ > 0$ , we’d get a contradiction, showing that $A$ and $B$ are disjoint. So let’s shift our proof target to trying to show

$(1 - p) | h^{'} (f^{*}) - h (f^{*}) | \leq (1 + p) d_{I K R} (h, h^{'})$

Let’s begin. So, our first order of business is that

$\frac{1 + p}{1 - p} \geq 1$

This should be trivial to verify, remember that $p \in [0, 1)$ .

Now, $f = p f^{'} + (1 - p) f^{*}$ , and $f$ is 1-Lipschitz, and so is $f^{'}$ . Our goal now is to impose an upper bound on the Lipschitz constant of $f^{*}$ . Let us assume that said Lipschitz constant of $f^{*}$ is above 1. We can find a pair of points where the rise of $f^{*}$ from the first point to the next, divided by the distance between the points is exceptionally close to the Lipschitz constant of $f^{*}$ , or equal. If we’re trying to have $f^{*}$ slope up as hard as it possibly can while mixing to make $f$ , which is 1-Lipschitz, then the best case for that is one where $f^{'}$ is sloping down as hard as it can, at a rate of −1. Therefore, we have that

$(1 - p) L i (f^{*}) + p \cdot (- 1) \leq 1$

Ie, mixing $f^{*}$ sloping up as hard as possible and $f^{'}$ sloping down as hard as possible had better make something that slopes up at a rate of 1 or less. Rearranging this equation, we get:

$(1 - p) L i (f^{*}) \leq (1 + p)$

$L i (f^{*}) \leq \frac{1 + p}{1 - p}$

We can run through almost the same exact argument, but with the norm of $f^{*}$ . Let us assume that said norm is above 1. We can find a point where $f^{*}$ attains its maximum/minimum, whichever is further from 0. Now, if you’re trying to have $f^{*}$ be as negative/positive as it possibly can be, while mixing to make $f$ , which lies in $[- 1, 1]$ , then the best case for that is one where $f^{'}$ is as positive/negative as it can possibly be there, ie, has a value of −1 or 1. In both cases, we have:

$(1 - p) | | f^{*} | | + p \cdot (- 1) \leq 1$

$(1 - p) | | f^{*} | | \leq (1 + p)$

$| | f^{*} | | \leq \frac{1 + p}{1 - p}$

Now we can proceed. Since we established that all three of these quantities (1, Lipschitz constant, and norm) are upper bounded by $\frac{1 + p}{1 - p}$ , we have:

$max (L i (f^{*}) | | f^{*} | |, 1) \leq \frac{1 + p}{1 - p}$

$1 - p \leq \frac{1 + p}{max (L i (f^{*}), | | f^{*} | |, 1)}$

$(1 - p) | h^{'} (f^{*}) - h (f^{*}) | \leq (1 + p) \frac{| h^{'} (f^{*}) - h (f^{*}) |}{max (L i (f^{*}), | | f^{*} | |, 1)}$

$\leq (1 + p) {sup}_{f \in C_{B - l i p} (X)} \frac{| h^{'} (f) - h (f) |}{max (L i (f), | | f | |, 1)} = (1 + p) d_{I K R} (h, h^{'})$

And we have exactly our critical

$(1 - p) | h^{'} (f^{*}) - h (f^{*}) | \leq (1 + p) d_{I K R} (h, h^{'})$

inequality necessary to force a contradiction. Therefore, $A$ and $B$ must be disjoint. Since $A$ is open and convex, and $B$ is convex, we can do Hahn-Banach separation to get something that touches $B$ and doesn’t cut into $A$ .

Therefore, we’ve crafted a $ψ^{h^{'}}$ that lies above $h^{'}$ , and is within $d_{I K R} (h, h^{'})$ of $ψ^{h}$ over the 1-or-less-Lipschitz functions in $[- 1, 1]$ , because it doesn’t cut into $A$ and touches $B$ .

This same argument works for any $ψ^{h} \geq h$ , and it works if we swap $h^{'}$ and $h$ . Thus, since hyperplanes above the graph of an infradistribution function $h$ or $h^{'}$ correspond to points in the corresponding $H$ and $H^{'}$ , and we can take any point in $H$ /affine functional above $h$ and make a point in $H^{'}$ /affine functional above $h^{'}$ (and same if the two are swapped) that approximately agree on $C_{1 - l i p} (X, [- 1, 1])$ , there’s always a point in the other infradistribution set that’s close in KR-distance and so $H$ and $H^{'}$ have

$d_{h a u} (H, H^{'}) \leq d_{I K R} (h, h^{'})$

And with that, we get

$d_{h a u} (H, H^{'}) \leq d_{I K R} (h, h^{'}) \leq 2 d_{h a u} (H, H^{'})$

And we’re done! Hausdorff distance between sets is within a factor of 2 of the IKR-distance between their corresponding infradistributions.

Proposition 43: A Cauchy sequence of infradistributions converges to an infradistribution, ie, the space $□ X$ is complete under $d_{I K R}$ .

So, the space of closed subsets of $M^{a} (X)$ is complete under the Hausdorff metric. Pretty much, by proposition 42, a Cauchy sequence of infradistributions $h_{n}$ in the IKR-distance corresponds to a Cauchy sequence of infradistribution sets $H_{n}$ converging in Hausdorff-distance, so to verify completeness, we merely need to double-check that the Hausdorff-limit of the $H_{n}$ sets fulfills the various different properties of an infradistribution. Every point in $H_{\infty}$ , the limiting set, has the property that there exists some Cauchy sequence of points from the $H_{n}$ sets that limit to it, and also every Cauchy sequence of points from the $H_{n}$ sets has its limit point be in $H_{\infty}$ .

So, for nonemptiness, you have a sequence of nonempty sets of a-measures limiting to each other in Hausdorff-distance, so the limit is going to be nonempty.

For upper completion, given any point $(m, b) \in H_{\infty}$ , and any $(0, b^{'})$ a-measure, you can fix a Cauchy sequence $(m_{n}, b_{n}) \in H_{n}$ limiting to $(m, b)$ , and then consider the sequence $(m_{n}, b_{n} + b^{'})$ , which is obviously Cauchy (you’re just adding the same amount to everything, which doesn’t affect the KR-distance), and limits to $(m, b + b^{'})$ , certifying that $(m, b) + (0, b^{'}) \in H_{\infty}$ , so $H_{\infty}$ is upper-complete.

For closure, the Hausdorff limit of a sequence of closed sets is closed.

For convexity, given any two points $(m, b)$ and $(m^{'}, b^{'})$ in $H_{\infty}$ , and any $p \in [0, 1]$ , we can fix a Cauchy sequence $(m_{n}, b_{n}) \in H_{n}$ and $(m_{n}^{'}, b_{n}^{'}) \in H_{n}$ converging to those two points, respectively, and then consider the sequence $p (m_{n}, b_{n}) + (1 - p) (m_{n}^{'}, b_{n}^{'})$ , which lies in $H_{n}$ (due to convexity of all the $H_{n}$ ), and converges to $p (m, b) + (1 - p) (m^{'}, b^{'})$ , witnessing that this point is in $H_{\infty}$ , and we’ve just shown convexity.

For normalization, it’s most convenient to work with the positive functionals, and observe that, because all the $h_{n} (0) = 0$ and all the $h_{n} (1) = 1$ because of normalization, the same property must apply to the limit, and this transfers over to get normalization for your infradistribution set.

Finally, there’s the compact-projection property. We will observe that the projection of the a-measures in $H_{n}$ to just their measure components, call the set $p r (H_{n})$ , must converge in Hausdorff-distance. The reason for this is because if they didn’t, then you could find some $ϵ$ and arbitrarily late pairs of inframeasures where $p r (H_{n})$ and $p r (H_{m})$ have Hausdorff-distance $> ϵ$ , and then pick a point in $p r (H_{n})$ (or $p r (H_{m})$ ) that’s $> ϵ$ KR-distance away from the other projection. Then you can pair that measure with some gigantic $b$ term to get a point in $H_{n}$ (or $H_{m}$ , depending on which one you’re picking from), and there’d be no point in $H_{m}$ (or $H_{n}$ ) within $ϵ$ distance of it, because the measure component would only be able to change by $ϵ$ if you moved that far, and you need to change the measure component by $> ϵ$ to land within $H_{m}$ (or $H_{n}$ ).

Because this situation occurs infinitely often, it contradicts the Cauchy-sequence-ness of the $H_{n}$ sequence, so the projections $p r (H_{n})$ must converge in Hausdorff distance on the space of measures over $X$ . Further, they’re precompact by the compact-projection property for the $H_{n}$ (which are infradistributions), so their closures are compact. Further, the Hausdorff-limit of a series of compact sets is compact, so the Hausdorff limit of the projections $p r (H_{n})$ (technically, their closures) is a compact set of measures. Further, any sequence $(m_{n}, b_{n})$ which converges to some $(m, b) \in H_{\infty}$ , has its projection being $m_{n} \in p r (H_{n})$ , which limits to show that $m$ is in this Hausdorff limit. Thus, all points in $H_{\infty}$ project down to be in a compact set of measures, and we have compact-projection for $H_{\infty}$ , which is the last condition we need to check to see if it’s an infradistribution.

So, the Hausdorff-limit of a Cauchy sequence of infradistribution sets is an infradistribution set, and by the strong equivalence of the infra-KR metric and Hausdorff-distance, a Cauchy limit of the infra-KR metric must be an infradistribution, and the space $□ X$ is complete under the infra-KR metric.

Proposition 44: If a sequence of infradistributions converges in the IKR distance for one complete metric that $X$ is equipped with, it will converge in the IKR distance for all complete metrics that $X$ could be equipped with.

So, as a brief recap, $X$ could be equipped with many different complete metrics that produce the relevant topology. Each choice of metric affects what counts as a Lipschitz function, affecting the infra-KR metric on infradistributions, as well as the KR-distance between a-measures, and the Hausdorff-distance. So, we need to show that regardless of the metric on $X$ , a sequence of convergent infradistributions will still converge. Use $d_{1}$ for the original metric on $X$ and $d_{2}$ for the modified metric on $X$ , and similarly, $d_{K R 1}$ and $d_{K R 2}$ for the KR-metrics on measures, and $d_{h a u s 1}, d_{h a u s 2}$ for the Hausdorff distance induced by the two measures.

Remember, our infradistribution sets are closed under adding $+ b$ to them, and converge according to $d_{h a u s 1}$ to the set $H_{\infty}$ .

What we’ll be doing is slicing up the sets in a particular way. In order to do this, the first result we’ll need is that, for all $b^{*} \geq 1$ , the set

${(m_{n}, b_{n}) \in H_{n} | b_{n} \geq b^{*}}$

converges, according to $d_{h a u s 1}$ , to the set

${(m, b) \in H_{\infty} | b \geq b^{*}}$

So, here’s the argument for this. We know that the projection sets

${m_{n} | \exists b_{n} : (m_{n}, b_{n}) \in H_{n}}$

are precompact, ie, have compact closure, and Hausdorff-limit according to $d_{h a u 1}$ to the set

${m | \exists b : (m, b) \in H_{\infty}}$

(well, actually, they limit to the closure of that set)

According to our Lemma 3, this means that the set

${m | \exists b \geq 0, n \in N \cup {\infty} : (m, b) \in H_{n}}$

(well, actually, its closure) is a compact set in the space of measures. Thus, it must have some maximal amount of measure present, call that quantity $λ^{⊙}$ , the maximal Lipschitz constant of any of the infradistributions in the sequence. It doesn’t depend on the distance metric $X$ is equipped with.

Now, fix any $ϵ$ . There’s some timestep $n$ where, for all greater timesteps, $d_{h a u 1} (H_{n}, H_{\infty}) \leq ϵ$ .

Now, picking a point $(m_{n}, b_{n})$ in $H_{n}$ with $b_{n} \leq b^{*} - ϵ$ , we can travel $ϵ$ distance according to $d_{K R 1}$ and get a point in $H_{\infty}$ , and the $b$ term can only change by $ϵ$ or less when we move our a-measure a little bit, so we know that our nearby point lies in

${(m, b) \in H_{\infty} | b \leq b^{*}}$

But, what if our point $(m_{n}, b_{n})$ in $H_{n}$ has $b^{*} - ϵ \leq b_{n} \leq b^{*}$ ? Well then, we can pick some arbitrary point $(m_{n}^{l o}, 0) \in H_{n}$ (by normalization for $H_{n}$ ), and go:

$d_{K R 1} ((m_{n}, b_{n}), \frac{ϵ}{b^{*}} (m_{n}^{l o}, 0) + (1 - \frac{ϵ}{b^{*}}) (m_{n}, b_{n}))$

$= d_{K R 1} (\frac{ϵ}{b^{*}} (m_{n}, b_{n}) + (1 - \frac{ϵ}{b^{*}}) (m_{n}, b_{n}), \frac{ϵ}{b^{*}} (m_{n}^{l o}, 0) + (1 - \frac{ϵ}{b^{*}}) (m_{n}, b_{n}))$

$\leq d_{K R 1} (\frac{ϵ}{b^{*}} (m_{n}, b_{n}), \frac{ϵ}{b^{*}} (m_{n}^{l o}, 0)) + d_{K R 1} ((1 - \frac{ϵ}{b^{*}}) (m_{n}, b_{n}), (1 - \frac{ϵ}{b^{*}}) (m_{n}, b_{n}))$

$= \frac{ϵ}{b^{*}} d_{K R 1} ((m_{n}, b_{n}), (m_{n}^{l o}, 0)) + (1 - \frac{ϵ}{b^{*}}) d_{K R 1} ((m_{n}, b_{n}), (m_{n}, b_{n}))$

$= \frac{ϵ}{b^{*}} d_{K R 1} ((m_{n}, b_{n}), (m_{n}^{l o}, 0)) = \frac{ϵ}{b^{*}} (d_{K R 1} (m_{n}, m_{n}^{l o}) + | b_{n} - 0 |)$

And then we have to be a little careful. $b_{n} \leq b^{*}$ by assumption. Also, we can unpack the distance to get

$\leq \frac{ϵ}{b^{*}} ({sup}_{f \in C_{1 - L i p} (X, [- 1, 1])} | m_{n} (f) - m_{n}^{l o} (f) | + b^{*})$

And the worst-case for distance, since all the measures have their total amount of measure bounded above by $λ^{⊙}$ , would be $f$ being 1 on one of the measures and −1 on another one of the measures, producing:

$\leq \frac{ϵ}{b^{*}} (2 λ^{⊙} + b^{*})$

So, the distance from $(m_{n}, b_{n})$ to

$\frac{ϵ}{b^{*}} (m_{n}^{l o}, 0) + (1 - \frac{ϵ}{b^{*}}) (m_{n}, b_{n})$

according to $d_{K R 1}$ is at most $\frac{2 ϵ λ^{⊙}}{b^{*}} + ϵ$

And then, because this point has a $b$ value of at most

$(1 - \frac{ϵ}{b^{*}}) b^{*}$

Because $b_{n} \leq b^{*}$ , the $b$ value upper bound turns into $b^{*} - ϵ$

Which is a sufficient condition for that mix of two points to be only $ϵ$ distance from a point in $H_{\infty}$ with a $b^{*}$ upper bound on the $b$ term, so we have that the distance from

${(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}$

${(m, b) \in H_{\infty} | b \leq b^{*}}$

is at most

$\frac{2 ϵ λ^{⊙}}{b^{*}} + ϵ + ϵ = 2 ϵ (\frac{λ^{⊙}}{b^{*}} + 1)$

Conversely, we can flip $H_{n}$ and $H_{\infty}$ , to get this upper bound on the Hausdorff distance between these two sets according to $d_{h a u 1}$ .

And, since $b^{*}$ and $λ^{⊙}$ are fixed, and for any $ϵ$ , we can find some time where the distance between these two “lower parts” of the $H_{n}$ and $H_{\infty}$ sets is upper-bounded by $2 ϵ (\frac{λ^{⊙}}{b^{*}} + 1)$

We can have this quantity limit to 0, showing that

${lim}_{n \to \infty} d_{h a u 1} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) = 0$

For any $b^{*} \geq 1$ .

Ok, this is part of our result. No matter which $b^{*}$ we chop off the infradistribution sets at, we get convergence of those chopped pieces according to $d_{h a u 1}$ .

Now, we’ll need a second important result, that:

${lim}_{b^{*} \to \infty} d_{h a u 1} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) = 0$

Now, we only have to establish one direction of low Hausdorff distance in the limit, that any point in the latter set is close to a point in the former set, because the former set is a subset of the latter set and has distance 0 to it.

What we can do is, because $H_{\infty}$ has the compact-projection property, the set ${m | (m, b) \in H_{\infty}}$ is precompact, so for any $ϵ$ , we can select finitely many points in it such that every point in ${m | (m, b) \in H_{\infty}}$ is within $ϵ$ distance of our finite subset according to $d_{K R 1}$ . For these finitely many measures, there must be some $b$ term associated with them where $(m, b) \in H_{\infty}$ , so you can just take the largest one of them, and let that be your $b^{*}$ . Then, all your finitely many measures, when paired with $b^{*}$ or any larger number, will be present in $H_{\infty}$ , so

$d_{h a u 1} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) < ϵ$

Because all points in the latter set are close to one of finitely many points, which are all present in the former set, so the Hausdorff-1 distance must be low.

At this point, we can truly begin. We have produced the dual results:

$\forall b^{*} \geq 1 : {lim}_{n \to \infty} d_{h a u 1} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) = 0$

And

${lim}_{b^{*} \to \infty} d_{h a u 1} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) = 0$

And we also know that, because $H_{n}$ limits to $H$ according to 1-Hausdorff distance, and projection is 1-Lipschitz,

${lim}_{n \to \infty} d_{h a u 1} ({m_{n} | (m_{n}, b_{n}) \in H_{n}}, {m | (m, b) \in H_{\infty}}) = 0$

Now, here’s the thing. (The closure of) all of these sets are compact. For instance,

${(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}$

will always be compact, because any sequence in here must have a subsequence where its measure converges according to $d_{K R 1}$ (due to the compact-projection property applied to $H_{n}$ ), and then because $b_{n}$ is bounded in $[0, b^{*}]$ , we can pick out another convergent subsequence for that. Plus, it’s the intersection of a closed set ( $H_{n}$ ) and another closed set ${(m, b | b \leq b^{*}}$ , so it’s closed. All sequences have a convergent subsequence and it’s closed, so this set is compact. By identical arguments,

${(m, b) \in H_{\infty} | b \leq b^{*}}$

is compact. And for

${m | (m, b) \in H_{\infty}, b \leq b^{*}}$

it’s the projection of a compact set from earlier arguments, and

${m | (m, b) \in H_{\infty}}$

must be precompact by the compact-projection property, so it has compact closure. The exact same argument applies to

${m_{n} | (m_{n}, b_{n}) \in H_{n}}$

as well.

Now, for compact sets, convergence in Hausdorff-distance only depends on the topology of the underlying space, not the specific metric it’s equipped with. Just as long as the metrics induce the same topology. And the weak topology on the space of measures, or on the space of a-measures, doesn’t depend one bit on the metric that $X$ is equipped with, just with the topology. So, the properties of these sets limiting to each other still works when $X$ has its metric changed. Because, for measures/a-measures, we end up using the $d_{K R 2}$ metric, but that induces the same topology on the space of a-measures, so the compact sets still converge in the $d_{h a u 2}$ metric. So, we still have our triple results of:

$\forall b^{*} \geq 1 : {lim}_{n \to \infty} d_{h a u 2} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) = 0$

And

${lim}_{b^{*} \to \infty} d_{h a u 2} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) = 0$

And

${lim}_{n \to \infty} d_{h a u 2} ({m_{n} | (m_{n}, b_{n}) \in H_{n}}, {m | (m, b) \in H_{\infty}}) = 0$

Now, here’s how to argue that $H_{n}$ limits to $H_{\infty}$ in $d_{h a u 2}$ . Fix some $ϵ$ . From our limits above, there’s some value of $b^{*}$ where

$d_{h a u 2} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) \leq ϵ$

And for that value of $b^{*}$ , and that $ϵ$ , we have that there’s some value of $n$ where, for all greater numbers,

$d_{h a u 2} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) \leq ϵ$

And

$d_{h a u 2} ({m_{n} | (m_{n}, b_{n}) \in H_{n}}, {m | (m, b) \in H_{\infty}}) \leq ϵ$

Now, we’re going to need to go in two directions for this. First, we pick a point in $H_{n}$ and show that it’s close to a point in $H_{\infty}$ . Second, we pick a point in $H_{\infty}$ and show it’s close to a point in $H_{n}$ .

Let $(m_{n}, b_{n}) \in H_{n}$ . We have two possibilities. One possibility is that $b_{n} \leq b^{*}$ . Then, because

$d_{h a u 2} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) \leq ϵ$

we only have to go $ϵ$ distance to get to $H_{\infty}$ . The second possibility is that $b_{n} > b^{*}$ .

In this case, $(m_{n}, b_{n})$ lies in the set

$[b^{*}, \infty) \times {m_{n} | \exists b_{n} : (m_{n}, b_{n}) \in H_{n}}$

Which has distance $\leq ϵ$ from

$[b^{*}, \infty) \times {m | \exists b : (m, b) \in H_{\infty}}$

Because we have that

$d_{h a u 2} ({m_{n} | (m_{n}, b_{n}) \in H_{n}}, {m | (m, b) \in H_{\infty}}) \leq ϵ$

Just scooch over and keep the $b$ term the same. Additionally, the set

$[b^{*}, \infty) \times {m | \exists b : (m, b) \in H_{\infty}}$

has distance $\leq ϵ$ from the set

$[b^{*}, \infty) \times {m | \exists b \leq b^{*} : (m, b) \in H_{\infty}}$

Because we have:

$d_{h a u 2} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) \leq ϵ$

Further, the set

$[b^{*}, \infty) \times {m | \exists b \leq b^{*} : (m, b) \in H_{\infty}}$

is a subset of $H_{\infty}$ , because $H_{\infty}$ is upper-closed. So, either way, we only have to travel $2 ϵ$ 2-distance from $H_{n}$ to get to $H_{\infty}$

Now for the reverse direction, starting with a point $(m, b) \in H_{\infty}$ and getting to a nearby point in $H_{n}$ . Again, we can split into two cases. In our first case, $b \leq b^{*}$ , and because

$d_{h a u 2} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) \leq ϵ$

we only have to go $ϵ$ distance to get to $H_{n}$ . The second possibility is that $b > b^{*}$ . In such a case, $(m, b)$ would be guaranteed to lie in the set

$[b^{*}, \infty) \times {m | \exists b : (m, b) \in H_{\infty}}$

which has distance $\leq ϵ$ from the set

$[b^{*}, \infty) \times {m | \exists b \leq b^{*} : (m, b) \in H_{\infty}}$

Because we have:

$d_{h a u 2} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) \leq ϵ$

Further, the set

$[b^{*}, \infty) \times {m | \exists b \leq b^{*} : (m, b) \in H_{\infty}}$

has distance $\leq ϵ$ according to $d_{h a u 2}$ from the set

$[b^{*}, \infty) \times {m_{n} | \exists b_{n} \leq b^{*} : (m_{n}, b_{n}) \in H_{n}}$

Because the latter components are the projection of the sets

${(m, b) \in H_{\infty} | b \leq b^{*}}$

and

${(m_{n}, b_{n}) \in H_{\infty} | b_{n} \leq b^{*}}$

And we already know that

$d_{h a u 2} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) \leq ϵ$

So, given our point $(m, b) \in H_{\infty}$ , we just have to go $2 ϵ$ distance to get to the set

$[b^{*}, \infty) \times {m_{n} | \exists b_{n} \leq b^{*} : (m_{n}, b_{n}) \in H_{n}}$

And all points in this set lie in $H_{n}$ because of upper completion.

Thus, given any $ϵ$ , there’s a tail of the $H_{n}$ sequence where the $H_{n}$ are all within $2 ϵ$ distance (according to $d_{h a u 2}$ ) of $H_{\infty}$ , so if $d_{h a u 1}$ thinks that $H_{n}$ converge to $H_{\infty}$ , $d_{h a u 2}$ will think that as well. Further, the metric on $X$ which induces $d_{h a u 1}$ and $d_{h a u 2}$ are arbitrary, so a sequence of infradistributions converging happens regardless of which complete metric $X$ is equipped with.

Proposition 45: If a sequence of infradistributions $h_{n}$ converges to $h$ in the infra-KR distance, then for all bounded continuous functions $f$ , ${lim}_{n \to \infty} h_{n} (f) = h (f)$ .

Now, the infra-KR metric is:

$d_{I K R} (h, h^{'}) = {sup}_{f \in C_{l i p} (X)} \frac{| h (f) - h^{'} (f) |}{max (L i (f), | | f | |, 1)}$

So, to begin with, if $h_{n}$ converges to $h$ , all bounded Lipschitz functions must have ${lim}_{n \to \infty} h_{n} (f) = h (f)$ or else the infra-KR distance wouldn’t converge.

For the next two, since the infra-KR distance is strongly equivalent to Hausdorff distance, and we know that

$\forall n : {m_{n} | \exists b_{n} : (m_{n}, b_{n}) \in H_{n}}$

is always precompact, and they Hausdorff-limit to

${m | \exists b : (m, b) \in H_{\infty}}$

And we have our Lemma 3 that the union of compact-sets which Hausdorff-limit to something is compact, so the set

${m | \exists b, n : (m, b) \in H_{n}}$

is compact (well, actually precompact, but just take the closure).

Because compactness of a set of measures implies that the amount of measure doesn’t run off to infinity, there’s some $λ^{⊙} < \infty$ that’s a shared Lipschitz constant for all the $h_{n}$ .

Also, any uniformly continuous function can be built as the uniform limit of Lipschitz-continuous functions from above and below, so given some uniformly continuous $f$ , we can make a $f_{m}^{h i}$ sequence limiting to it from above, and a $f_{m}^{l o}$ sequence limiting to it from below. Then, we have:

${lim}_{n \to \infty} h_{n} (f) \leq {lim}_{n \to \infty} h_{n} (f_{m}^{h i}) = h (f_{m}^{h i})$

And similarly, we can get:

${lim}_{n \to \infty} h_{n} (f) \geq h (f_{m}^{l o})$

Now, regardless of $m$ and $n$ ,

$| h_{n} (f_{m}^{h i}) - h_{n} (f_{m}^{l o}) | \leq λ^{⊙} \cdot d (f_{m}^{h i}, f_{m}^{l o})$

So, even though we don’t necessarily know that the limit actually exists for $h_{n} (f)$ , we at least know that all the values are bounded in an interval of known maximum size, which converges to the interval

$[h (f_{m}^{l o}), h (f_{m}^{h i})]$

Which, by monotonicity for $h$ , $h (f)$ lies in that interval.

So, all the limit points of the $h_{n} (f)$ sequence are in that interval. Now, as $m$ gets unboundedly high, the difference between $f_{m}^{h i}$ and $f_{m}^{l o}$ gets unboundedly small, so for gigantic $m$ , we have that any limit points of the $h_{n}$ sequence must be in a really tiny interval. Taking the limit, we have that the interval crunches down to a single point, and $h_{n} (f)$ actually limits to $h (f)$ . We’ve shown it now for uniformly continuous functions.

Time to expand this to continuous functions in full generality. Again,

\{m|\exists b,n:(m,b)\in H_n\}

is precompact, so this implies that for all $ϵ$ , there is a compact set $C_{ϵ}$ where all minimal points of $H_{n}$ (regardless of the $n$ ! Even for the final infradistribution set $H_{\infty}$ !) have $< ϵ$ measure outside of that compact set.

Transferring to functionals, this means that for all the h_n (and h), C_{\eps} is an \eps-almost-support, and any two functions that differ on that set have expectations correspondingly close together.

Given some arbitrary $f$ , let $f_{m}$ be identical to $f$ on $C_{\frac{1}{m}}$ , (ie, uniformly continuous on that compact set), and extend it in an arbitrary uniformly continuous way to all of $X$ while staying in $[- | | f | |, | | f | |]$ , by the Tietze Extension Theorem.

Regardless of the $n$ , since $C_{\frac{1}{m}}$ is a $\frac{1}{m}$ -almost-support for $h_{n}$ , we have that

$| h_{n} (f) - h_{n} (f_{m}) | \leq \frac{2 | | f | |}{m}$

Why? Well, $f$ and $f_{m}$ are identical on a $\frac{1}{m}$ -almost support for $h_{n}$ , so the magnitude of their difference is proportional to $\frac{1}{m}$ , and the maximum level of difference between the two, and $f$ and $f_{m}$ are both in $[- | | f | |, | | f | |]$ , so they can differ by at most twice that much. The same result extends to the limit $h$ itself.

Because $| | f | |$ is bounded, and $n$ is arbitrary, we have that $h_{n} (f_{m})$ limits to $h_{n} (f)$ uniformly in $n$ .

Now, we can go:

${lim}_{n \to \infty} h_{n} (f) = {lim}_{n \to \infty} {lim}_{m \to \infty} h_{n} (f_{m})$

And now, to invoke the Moore-Osgood theorem to swap the two limits, we need two results. One is that, for all $m$ ,

${lim}_{n \to \infty} h_{n} (f_{m}) = h (f_{m})$

(which is true because $f_{m}$ was selected to be uniformly continuous).

The second result we need is that for all $n$ ,

${lim}_{m \to \infty} h_{n} (f_{m}) = h_{n} (f)$

uniformly in $n$ . Which is true. So, we can invoke the Moore-Osgood theorem and swap the two results, to get

$= {lim}_{m \to \infty} {lim}_{n \to \infty} h_{n} (f_{m})$

$= {lim}_{m \to \infty} h (f_{m}) = h (f)$

So, we have our final result that

${lim}_{n \to \infty} h_{n} (f) = h (f)$

For all continuous bounded functions $f$ , and we’re done.

Proposition 46: A set of infradistributions ${h_{i}}_{i \in I}$ is precompact in the topology induced by the IKR distance iff:
1:There’s an upper bound on the Lipschitz constant of all the infradistributions in the set
2: There’s a sequence of compact sets $C_{ϵ}$ , one for each $ϵ$ , that are compact $ϵ$ -almost-supports for all infradistributions in the set.
3: The set of infradistributions is b-uniform.

This proof will proceed in three phases. The first phase is showing that compactness implies conditions 1 and 2. The second phase is showing that a failure of condition 3 permits you to construct a sequence with no convergent subsequence, so a failure of condition 3 implies non-precompactness, and taking the contrapositive, precompactness implies condition 3. That gets us one half of the iff implication, that precompactness implies the three conditions. For the second half of the iff implication, we assume the three conditions, and construct a convergent subsequence.

So, for our first step, due to working in Hausdorff spaces, we can characterize precompactness as “is a subset of a compact set”

Also, the projection mapping of type

$C (M^{+} (X) \times R^{\geq 0}) \to K (M^{+} (X))$

Which takes a closed set of a-measures (an infradistribution) and projects it down (and takes the closure) to make a compact set of measures (by the compact-projection property), is Lipschitz (projection of sets down to one coordinate keeps their Hausdorff-distance the same or contracts it), so it’s continuous. So, a compact set of infradistributions (because the infra-KR metric is strongly equivalent to the Hausdorff-distance), would get mapped to a compact set of sets of measures (because the image of a compact set is compact), which by Lemma 3, unions together to make a compact set of measures.

Doing the same process (taking your precompact set of infradistributions, mapping it through the projection, unioning together all the sets) makes a subset of that compact set of measures, so it’s precompact.

Also, the necessary-and-sufficient condition for precompactness of a set of measures is that: There be a maximum amount of measure present, and for all $ϵ$ there is a compact set $C_{ϵ} \subseteq X$ where all the measures assign $\leq ϵ$ measure outside of that compact set.

So, if you take a precompact set of infradistributions, all the measure components of points in any of them have a uniform upper bound on the amount of measure present, and we also have the shared compact almost-support property. So, precompactness implies conditions 1 and 2.

Time for phase 2 of our proof, showing that a failure of condition 3 implies that there’s a sequence from it with no convergent subsequence in the KR-metric.

Assume, for contradiction, that we indeed have a precompact set which fails condition 3. Using I to index your set of infradistributions, Condition 3 is:

$\forall ϵ > 0 \exists b^{*} \forall i : d_{h a u} (H_{i}, H_{i}^{b^{*}}) \leq ϵ$

Where $H_{i}^{b^{*}}$ is the set formed from the set $H_{i}$ by deleting all points with $b > b^{*}$ and taking the upper completion again. Negating this, we see that the set of infradistribution sets $H_{i}$ failing this condition is stated as:

$\exists ϵ > 0 \forall b^{*} \exists i : d_{h a u} (H_{i}, H_{i}^{b^{*}}) > ϵ$

So, let $ϵ_{0}$ be your $ϵ$ of choice, and let $H_{n}$ be the infradistribution $H_{i}$ such that $d_{h a u} (H_{i}, H_{i}^{n}) \geq ϵ_{0}$ .

Because we’re assuming that this sequence of infradistributions was selected from a precompact set, we have a guarantee that the sequence $H_{n}$ has a convergent subsequence limiting to some $H_{\infty}$ . We’ll still be using n as our limiting variable, hopefully this doesn’t cause too much confusion.

Now, from our earlier proof of Proposition 44, we can crib two results from the proof. From this proof, we know that because $H_{n}$ limits to $H_{\infty}$ in Hausdorff-distance,

${lim}_{b^{*} \to \infty} d_{h a u} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) = 0$

and also,

${lim}_{n \to \infty} d_{h a u} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) = 0$

For any $b^{*} \geq 1$ . To craft this into a more usable form, we can realize that for all $b^{*}$ , $H_{\infty}^{b^{*}} \subseteq H_{\infty}$

So the distance from the former set to the latter set is 0. Also, any point in $H_{\infty}$ can be written as $(m, b)$ . Either $b \leq b^{*}$ , in which case the same point is present in $H_{\infty}^{b^{*}}$ and the distance to enter that set is 0, or $b > b^{*}$ , in which case the m component is present in ${m | (m, b) \in H_{\infty}}$ , and from

${lim}_{b^{*} \to \infty} d_{h a u} ({m | (m, b) \in H_{\infty}, b \leq b^{*}}, {m | (m, b) \in H_{\infty}}) = 0$

For large $b^{*}$ , you just have to adjust the $m$ component a little bit to $m^{'}$ and then you know there’s some $(m^{'}, b^{'}) \in H_{\infty}, b^{'} \leq b^{*}$ , so by upper completion, $(m^{'}, b) \in H_{\infty}^{b^{*}}$ , and this point is close to $(m, b)$ .

We took a point in $H_{\infty}^{b^{*}}$ and showed it’s in $H_{\infty}$ (trivially), and took a point in $H_{\infty}$ and showed there’s a nearby point in $H_{\infty}^{b^{*}}$ , so we have our modified result that:

${lim}_{b^{*} \to \infty} d_{h a u} (H_{\infty}^{b^{*}}, H_{\infty}) = 0$

For another modified result, due to the fact that we know

${lim}_{n \to \infty} d_{h a u} ({(m_{n}, b_{n}) \in H_{n} | b_{n} \leq b^{*}}, {(m, b) \in H_{\infty} | b \leq b^{*}}) = 0$

We can take any point in $H_{n}^{b^{*}}$ , descend to a point in $H_{n}$ (but cut off at $b^{*}$ ), shift over a bit to get to $H_{\infty}$ (but cut off at $b^{*}$ ), and add the same amount of $b$ value to this point as you took off, to make a point in $H_{\infty}^{b^{*}}$ that’s nearby to the point you started with, and flip the two sets, to argue that

$\forall b^{*} : {lim}_{n \to \infty} d_{h a u} (H_{n}^{b^{*}}, H_{\infty}^{b^{*}}) = 0$

Now, here’s what you do from here. We know our $ϵ_{0}$ value. Because of the fact that

${lim}_{b^{*} \to \infty} d_{h a u} (H_{\infty}^{b^{*}}, H_{\infty}) = 0$

we can identify some finite $b^{*}$ value (call it $b_{0}$ ) where, for it and all greater values,

$d_{h a u} (H_{\infty}^{b_{0}}, H_{\infty}) < \frac{ϵ_{0}}{3}$

Locking this value in, and because of

$\forall b^{*} : {lim}_{n \to \infty} d_{h a u} (H_{n}^{b^{*}}, H_{\infty}^{b^{*}}) = 0$

and $H_{n}$ limiting to $H_{\infty}$ , so

${lim}_{n \to \infty} d_{h a u} (H_{n}, H_{\infty}) = 0$

We can find some finite $n$ where, for all greater values,

$d_{h a u} (H_{n}^{b_{0}}, H_{\infty}^{b_{0}}) < \frac{ϵ_{0}}{3}$

and

$d_{h a u} (H_{n}, H_{\infty}) < \frac{ϵ_{0}}{3}$

There’s one last thing to note. The sequence $H_{n}$ was selected as a subsequence of a sequence of infradistributions selected so that the Hausdorff-distance between an infradistribution and its truncation of minimal points at a certain $b$ value was always $ϵ_{0}$ or more.

Accordingly let $ρ (n)$ be the value of the cutoff for $H_{n}$ (ie, the index of $H_{n}$ before we did the reindexing when we passed to a subsequence). Due to our construction process for the $H_{n}$ , we have that:

$\forall n : d_{h a u} (H_{n}, H_{n}^{ρ (n)}) \geq ϵ_{0}$

Further, $ρ (n)$ diverges to infinity, so there’s some $n$ where $ρ (n) \geq b_{0}$ . Because, for that $n$ , $H_{n}^{b_{0}} \subseteq H_{n}^{ρ (n)} \subseteq H_{n}$ , we have that $d_{h a u} (H_{n}^{b_{0}}, H_{n}) \geq d_{h a u} (H_{n}^{ρ (n)}, H_{n}) .$

Taking stock of all we have, we know that there is some n where:

$d_{h a u} (H_{\infty}^{b_{0}}, H_{\infty}}) < \frac{ϵ_{0}}{3}$

and

$d_{h a u} (H_{n}^{b_{0}}, H_{\infty}^{b_{0}}) < \frac{ϵ_{0}}{3}$

and

$d_{h a u} (H_{\infty}, H_{n}) < \frac{ϵ_{0}}{3}$

and

$d_{h a u} (H_{n}^{b_{0}}, H_{n}) \geq d_{h a u} (H_{n}^{ρ (n)}, H_{n})$

and, by our construction process for the $H_{n}$ sequence,

$d_{h a u} (H_{n}^{ρ (n)}, H_{n}) \geq ϵ_{0}$

So now we can go:

$ϵ_{0} \leq d_{h a u} (H_{n}^{ρ (n)}, H_{n}) \leq d_{h a u} (H_{n}^{b_{0}}, H_{n})$

$\leq d_{h a u} (H_{n}^{b_{0}}, H_{\infty}^{b_{0}}) + d_{h a u} (H_{\infty}^{b_{0}}, H_{\infty}) + d_{h a u} (H_{\infty}, H_{n}) < \frac{ϵ_{0}}{3} + \frac{ϵ_{0}}{3} + \frac{ϵ_{0}}{3} = ϵ_{0}$

But we just showed $ϵ_{0} > ϵ_{0}$ , a contradiction. Our one assumption that we made was that there could be a set of infradistributions that was both precompact and that failed to meet the shared b-uniformity condition. Therefore, if a set of infradistributions is precompact, it must fulfill the shared b-uniformity condition.

Because we’ve shown that precompactness implies a Lipschitz bound and shared compact-almost-support in part 1 of the proof, and that precompactness implies the shared b-uniformity condition, we have one direction of our iff statement. Precompactness implies these three properties.

Now we’ll go in the other direction and establish that if these three properties are fulfilled, then every sequence of infradistributions has a convergent subsequence.

So, let’s say we have some set of infradistributions $H_{i}$ that fulfills the following three properties:

$\exists λ^{⊙} \forall i, (m, b) \in H_{i} : m (1) \leq λ^{⊙}$

(this is bounded Lipschitz constant)

$\forall ϵ \exists C_{ϵ} \in K (X) \forall i, (m, b) \in H_{i} : m (X / C_{ϵ}) \leq ϵ$

(this is shared almost-compact-support)

$\forall ϵ \exists b^{*} \forall i : d_{h a u} (H_{i}, H_{i}^{b^{*}}) \leq ϵ$

(this is the b-uniformity condition)

Note that $H_{i}^{b^{*}}$ is $H_{i}$ but you chop off all the points in it with $b \geq b^{*}$ and regenerate it via upper-completion.

First, the compact almost-support condition and bounded amount of measure (and closure) are necessary-and-sufficient conditions for a set of measures to be compact. Thus, letting $Δ_{C, λ}$ be defined as:

${m \in M^{+} (X) | \forall ϵ : m (X / C_{ϵ}) \leq ϵ \land m (1) \leq λ^{⊙}}$

(ie, measures where the measure outside of the compact set $C_{ϵ}$ is $ϵ$ or less, for all $ϵ$ , and the amount of measure is upper-bounded by $λ^{⊙}$ , where that sequence of compact sets and measure upper bound came from the relevant sequence of compact sets and measure upper bound on the set ${H_{i} | i \in I}$ , from the fact that we assumed a Lipschitz upper bound and shared compact-almost-support for it).

We know that $Δ_{C, λ}$ is a compact set. All the measure components of all the points in all the $H_{i}$ lie in this set. Thus, all sets $H_{i}$ can be thought of as being a subset of the space $Δ_{C, λ} \times R^{\geq 0}$

In particular, all our $H_{n}$ (from our arbitrily selected sequence) are a subset of this space.

Now, here’s what we do. Fix any $m \geq 1$ . From the b-uniformity condition on the $H_{i}$ , there is some quantity $b_{m}$ where

$\forall i : d_{h a u} (H_{i}, H_{i}^{b_{m}}) \leq \frac{1}{m}$

What we’re going to do is find a subsequence of the $H_{n}$ sequence where the $H_{n}^{b_{m}}$ sequence converges in Hausdorff-distance.

Here’s how to do it. We can take each $H_{n}$ and chop it off at a $b$ value of $b_{m}$ , to make a closed set ${(H_{n}^{b_{m}})}^{'}$ which is a subset of $Δ_{C, λ} \times [0, b_{m}]$

Which, being a product of two compact sets, is compact. Further, the space of compact subsets of a compact space (equipped with a Hausdorff distance-metric) is compact. So, we can isolate some subsequence where the ${(H_{n}^{b_{m}})}^{'}$ sets converge in Hausdorff-distance. If sets converge in Hausdorff-distance, their upper completions do too, so we have isolated a subsequence of our $H_{n}$ sequence where the sets $H_{n}^{b_{m}}$ converge in Hausdorff-distance. Also, each $H_{n}^{b_{m}}$ infradistribution set is only $\frac{1}{m}$ Hausdorff-distance away, at most, from the corresponding $H_{n}$ . So, for sufficiently large $n$ , the $H_{n}$ subsequence we picked out is all wandering around in a ball of size $\frac{2}{m}$ .

Now, here’s what we do. Start with your $H_{n}$ sequence. Use the argument we described above for $m = 1$ to isolate a subsequence where the Hausdorff-distance of the subsequence eventually is wandering around in a ball (w.r.t. Hausdorff-distance) of size 2 in the tail. Now, use the argument for $m = 2$ to isolate a subsequence of that wandering around in a ball (w.r.t. Hausdorff-distance) of size 1 in the tail. And, y’know, repeat for all finite $m$ , to get a subsequence embedded in all previous subsequences which, in the tail, is wandering around in a ball of size $\frac{2}{m}$ in the tail.

Now build one final subsequence, which takes the first element of the $m = 1$ subsequence, the second element of the $m = 2$ subsequence, the third element of the $m = 3$ subsequence, and so on. It eventually enters the tail of the sequence for all finite $m$ , so, regardless of $m$ , the tail of that sequence starts wandering around in a ball of size $\frac{2}{m}$ . Thus, the sequence is actually Cauchy, and must converge, as we’ve previously shown that the space $□ X$ is complete in the KR/Hausdorff metric.

Assuming the three conditions on a set of infradistributions has let us show that every sequence has a convergent subsequence, and thus must be precompact, so we have the reverse direction of our iff statement and we’re done.

Proposition 47: When $X$ is a compact Polish space, the spaces of cohomogenous, crisp, and sharp infradistributions are all compact in $□ X$ equipped with the infra-KR metric.

So, from Proposition 46, necessary-and-sufficient conditions for a set of infradistributions to be compact is:

1: Bounded Lipschitz constant/bounded amount of measure on minimal points. 1-Lipschitz, C-additive, cohomogenous, crisp, and sharp infradistributions fulfill this because of their iff minimal point characterizations.

2: Shared compact almost-supports. $X$ is compact by assumption, and it’s the whole space so it must be a support of everything, and thus an $ϵ$ -almost-support of everything, so this is trivially fulfilled for all infradistributions when $X$ is compact.

3: b-uniformity. Homogenous, cohomogenous, crisp, and sharp infradistributions fulfill this because they all have their minimal points having $b \leq 1$ , and the condition is “there’s gotta be some $b$ value you can go up to in order to have a guarantee of being within $ϵ$ of the full $H$ set in Hausdorff-distance if you delete all the minimal points with a higher $b$ value, for all $ϵ$ ”.

Thus, cohomogenous, crisp, and sharp infradistributions fulfill the necessary-and-sufficient conditions for precompactness, and all we need is to check that the set of them is closed in the KR-metric.

To do this, we’ll invoke Proposition 45, that: If a sequence of infradistributions $h_{n}$ converges to $h$ in the infra-KR distance, then for all bounded continuous functions $f$ , ${lim}_{n \to \infty} h_{n} (f) = h (f)$ .

The characterization for cohomogenity was that $h (1 + a f) = 1 - a + a h (1 + f)$ So, we can go:

$h (1 + a f) = {lim}_{n \to \infty} h_{n} (1 + a f) = {lim}_{n \to \infty} 1 - a + a h_{n} (1 + f)$

$= 1 - a + a {lim}_{n \to \infty} h_{n} (1 + f) = 1 - a + a h (1 + f)$

Showing that the limit of cohomogenous infradistributions is cohomogenous, and we’ve verified closure, which is the last property we needed for cohomogenity.

The characterization for crispness was that: $h (c + a f) = c + a h (f)$ for $c \in R, a \geq 0$ . To show it’s preserved under limits, we can go:

$h (c + a f) = {lim}_{n \to \infty} h_{n} (c + a f) = {lim}_{n \to \infty} c + a h_{n} (f) = c + a {lim}_{n \to \infty} h_{n} (f) = c + a h (f)$

Showing that the limit of crisp infradistributions is crisp, and we’ve verified closure. Sharpness is a bit more tricky.

Let’s say a sequence of sharp infradistributions $h_{n}$ limits to $h$ , and all the $h_{n}$ are associated with the compact set $C_{n} \subseteq X$ . The minimal points of the $h_{n}$ consist of all probability distributions supported over $C_{n}$ , with a $b$ value of 0. Thus, all the $H_{n}$ sets can be written as $Δ C_{n} \times R^{\geq 0}$ , and so, if they converge in Hausdorff-distance, then the sets of probability distributions $Δ C_{n}$ must converge in Hausdorff-distance, which is impossible if $C_{n}$ don’t converge in Hausdorff-distance, because the dirac-delta distributions on points in the $C_{n}$ sets can transport a failure of Hausdorff-convergence of the $C_{n}$ sets up to a failure of Hausdorff-convergence of the $Δ C_{n}$ sets of probability distributions.

Thus, the $C_{n}$ converge to a compact set $C_{\infty}$ in Hausdorff-distance.

We also know that, because sharp infradistributions are crisp infradistributions, and crisp infradistributions are preserved under limits, all we have to check is if the minimal points of $H_{\infty}$ consist exactly of all probability distributions supported over $C_{\infty}$ . Now, $Δ C_{\infty}$ is the closed convex hull of all the dirac-delta distributions on points in $C_{\infty}$ , and all those points have a sequence from the $C_{n}$ that converge to them, so the associated dirac-delta distributions converge and witness that all the dirac-delta distributions on points in $C_{\infty}$ are present in the set $H_{\infty}$ . So, because infradistribution sets are closed and convex, all of $Δ C_{\infty}$ must be present as minimal points in $H_{\infty}$ . Now we just need to rule out the presence of additional points.

Let’s say we’ve got some probability distribution $μ \in H_{\infty}$ which is not supported entirely on $C_{\infty}$ , there’s $ϵ$ probability mass outside that set. Because probability distributions in Polish spaces have the property that the probability on open supersets of the set of interest can be shrunk down to have arbitrarily similar measure, we can find some open superset of $C_{\infty}$ , call it $O$ , which has $\frac{ϵ}{2}$ probability mass outside of it. Any point outside of $O$ must be some $δ$ distance away from $C_{\infty}$ , because otherwise, you could pick a sequence of points in $C_{\infty}$ which gets arbitrarily close to the (closed) complement of $O$ , find a convergent subsequence since $C_{\infty}$ is compact, and you’d have a limit point which is in $C_{\infty}$ (due to closure) and also in the complement of $O$ (due to getting arbitrarily close to said closed set), disproving that the two sets are disjoint (because $O$ is a superset of $C_{\infty}$ )

Ok, so our hypothetical “bad” probability distribution has $\frac{ϵ}{2}$ probability measure at a distance of $δ$ or more from our set of interest, $C_{\infty}$ . The KR distance is equivalent to the earthmover distance, which is “how much effort would it take to move this pile of dirt/pile of probability mass into the other distribution/pile of dirt”.

All minimal points in $H_{\infty}$ must have a sequence of minimal points in $H_{n}$ limiting to them, because it’s the Hausdorff-limit of those infradistributions. So, we’ve got some sequence $μ_{n}$ limiting to our hypothetical bad distribution $μ \notin C_{\infty}$ , but all the $μ_{n}$ lie in $Δ C_{n}$ .

There is some $n$ value where $d_{K R} (μ, μ_{n}) < \frac{δ ϵ}{4}$ , and also where $d_{h a u} (C_{\infty}, C_{n}) < \frac{δ ϵ}{4}$ . Now, we can get something really interesting.

So, we agree that $μ$ has $\frac{ϵ}{2}$ probability mass a distance of $δ$ or more away from the set $C_{\infty}$ , right? This means that the earthmover distance from $μ$ to any point in $Δ C_{\infty}$ must be $\frac{δ ϵ}{2}$ or more, because you’ve gotta move $\frac{ϵ}{2}$ measure a distance of $δ$ at the very least.

However, the earthmover distance from $μ$ to $μ_{n}$ is strictly below $\frac{δ ϵ}{4}$ , and because $μ_{n} \in Δ C_{n}$ , it’s only got an earthmover distance of less than $\frac{δ ϵ}{4}$ to go to arrive at a probability distribution in $Δ C_{\infty}$ , because all dirt piled up in $C_{n}$ is only $\frac{δ ϵ}{4}$ distance away from $C_{\infty}$ . So, the distance from $μ$ to $Δ C_{\infty}$ is only

$< \frac{δ ϵ}{4} + \frac{δ ϵ}{4} = \frac{δ ϵ}{2}$

distance. But we know it’s impossible for it to be any closer than $\frac{δ ϵ}{2}$ distance from that set, so we have a contradiction, and no such $μ$ can exist in $H_{\infty}$ . Thus, $H_{\infty}^{min}$ has all the probability distributions over $C_{\infty}$ and nothing else, so the limit of sharp infradistributions is sharp, and we’re done.