Infra-Bayesian physicalism: proofs part I

This post is an appendix to “Infra-Bayesian physicalism: a formal theory of naturalized induction”.

Lemma 1: $\mathrm{pr}_{\Gamma\times\Phi}(\chi_{y\in\alpha}(s_*(\theta)))\in\Theta$ holds for all $s:\Gamma\to\Gamma$ iff, for all $g:\Gamma\times\Phi\to[0,1]$ and $s:\Gamma\to\Gamma$, $\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$
This lemma will be implicitly used all over the place, in order to deal with the “presence in the bridge transform” condition algebraically, in terms of function expectations. The bulk of the conditions for a contribution to lie in the bridge transform is this endofunction condition (the support condition is trivial most of the time), so it’s advantageous to reformulate it. First off, we have that
$$\mathrm{pr}_{\Gamma\times\Phi}(\chi_{y\in\alpha}(s_*(\theta)))\in\Theta$$
iff, for all $g:\Gamma\times\Phi\to[0,1]$, we have
$$\mathrm{pr}_{\Gamma\times\Phi}(\chi_{y\in\alpha}(s_*(\theta)))(\lambda yx.\,g(y,x))\le\Theta(\lambda yx.\,g(y,x))$$
This holds by LF-duality for ultradistributions, proved in “Less Basic Inframeasure Theory”: a contribution lies in an ultracontribution set iff its expectations, w.r.t. all functions, are less than or equal to the ultracontribution’s expectations.
Now, we just need to unpack the left-hand side into our desired form. Start off with
$$\mathrm{pr}_{\Gamma\times\Phi}(\chi_{y\in\alpha}(s_*(\theta)))(\lambda yx.\,g(y,x))$$
Apply how projections work:
$$=\chi_{y\in\alpha}(s_*(\theta))(\lambda y\alpha x.\,g(y,x))$$
Now, we can move the indicator function into the function we’re taking the expectation of, because there’s no difference between deleting all measure outside of an event, and taking the expectation of a function that’s 0 outside of that event. So, we get
$$=s_*(\theta)(\lambda y\alpha x.\,\chi_{y\in\alpha}g(y,x))$$
Then we use how pushforwards are defined in probability theory.
$$=\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))$$
And that’s our desired form. So, we’ve shown that for any particular $s:\Gamma\to\Gamma$, we have $\mathrm{pr}_{\Gamma\times\Phi}(\chi_{y\in\alpha}(s_*(\theta)))\in\Theta$ iff, for all $g:\Gamma\times\Phi\to[0,1]$,
$$\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$$
And so, we get our desired iff statement by going from the iff for one s to the iff for all s. ■
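To make this reformulation concrete, here is a minimal finite sanity check in Python (the sets, masses, and function choices are all illustrative; contributions are represented as plain dictionaries from points to masses). It checks that the literal pushforward-restrict-project computation on the left agrees with the single expectation the proof unpacks it into.

```python
import itertools, random

# Toy model: contributions are dicts mapping (y, alpha, x) -> nonnegative mass.
Gamma = [0, 1]
Phi = ["a", "b"]
subsets = [frozenset(S) for r in range(len(Gamma) + 1)
           for S in itertools.combinations(Gamma, r)]

random.seed(0)
theta = {(y, al, x): random.random() / 20 for y in Gamma for al in subsets for x in Phi}

s = lambda y: 0                                  # an endofunction Gamma -> Gamma
g = lambda y, x: 0.7 if x == "a" else 0.2        # arbitrary g: Gamma x Phi -> [0,1]

# Left side, computed literally: pushforward along s, restrict to y in alpha, project.
pushed = {}
for (y, al, x), m in theta.items():
    pushed[(s(y), al, x)] = pushed.get((s(y), al, x), 0.0) + m
restricted = {k: m for k, m in pushed.items() if k[0] in k[1]}
projected = {}
for (y, al, x), m in restricted.items():
    projected[(y, x)] = projected.get((y, x), 0.0) + m
lhs = sum(m * g(y, x) for (y, x), m in projected.items())

# Right side: the "endofunction condition" expectation from the lemma.
rhs = sum(m * (1.0 if s(y) in al else 0.0) * g(s(y), x)
          for (y, al, x), m in theta.items())
assert abs(lhs - rhs) < 1e-12
```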
Proposition 2.1: For any Γ, Φ and Θ∈□c(Γ×Φ),
Br(Θ) exists and satisfies prΓ×ΦBr(Θ)=Θ.
In particular, if Θ∈□(Γ×Φ) then Br(Θ)∈□(elΓ×Φ).
Proof sketch: We’ll show that for any particular contribution θ∈Θ, there’s a contribution θ∗ which lies within Br(Θ) and projects down to equal θ. And then, show the other direction, that any contribution in Br(Θ) lands within Θ when you project it down. Thus, the projection of Br(Θ) must be Θ exactly.
For the first direction, given some θ∈Θ, let θ∗:=θ⋉(λy.{y}). Note that since θ∈Δc(Γ×Φ), this means that θ∗∈Δc(Γ×2Γ×Φ), so the type signatures line up. Clearly, projecting θ∗ down to Γ×Φ makes θ again. So that leaves showing that θ∗∈Br(Θ). Applying Lemma 1, an equivalent way of stating the bridge transform is that it consists precisely of all the θ′∈Δc(Γ×2Γ×Φ) s.t. for all s:Γ→Γ and g:Γ×Φ→[0,1],
$$\theta'(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$$
and also, supp θ′⊆elΓ×Φ.
Clearly, for our given θ∗, everything works out with the support condition, so that leaves the endofunction condition. Let s,g be arbitrary.
$$\theta^*(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))=(\theta\ltimes(\lambda y.\{y\}))(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))$$
$$=\theta(\lambda yx.\,\delta_{\{y\}}(\lambda\alpha.\,\chi_{s(y)\in\alpha}g(s(y),x)))=\theta(\lambda yx.\,\chi_{s(y)\in\{y\}}g(s(y),x))$$
$$=\theta(\lambda yx.\,\chi_{s(y)=y}g(s(y),x))=\theta(\lambda yx.\,\chi_{s(y)=y}g(y,x))\le\theta(\lambda yx.\,g(y,x))$$
$$\le\max_{\theta\in\Theta}\theta(\lambda yx.\,g(y,x))=\Theta(\lambda yx.\,g(y,x))$$
In order, the equalities were: unpacking the semidirect product, substituting the Dirac delta in, reexpressing the condition for the indicator function, using that s(y)=y inside the indicator function, applying monotonicity of θ, using that θ∈Θ, and then packing up the definition of Θ.
And that inequality has been fulfilled, so, yes, θ∗ lies within Br(Θ). Since θ was arbitrary within Θ, this shows that the projection set is as-big-or-bigger than Θ.
Now to show the reverse direction, that anything in the projection set lies within Θ. Any particular $\theta'\in\mathrm{Br}(\Theta)$ must fulfill, for all $g,s$, that
$$\theta'(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$$
So, in particular, we can let $s:\Gamma\to\Gamma$ be the identity function, and $g$ be anything, and we’d get
$$\theta'(\lambda y\alpha x.\,\chi_{y\in\alpha}g(y,x))\le\Theta(\lambda yx.\,g(y,x))$$
And, since $\theta'$ is in $\mathrm{Br}(\Theta)$, it’s supported on the $(y,\alpha)$ pairs s.t. $y\in\alpha$, so that indicator function drops out of existence, and we get
$$\theta'(\lambda y\alpha x.\,g(y,x))\le\Theta(\lambda yx.\,g(y,x))$$
And then this can be written as a projection:
$$\mathrm{pr}_{\Gamma\times\Phi}(\theta')(\lambda yx.\,g(y,x))\le\Theta(\lambda yx.\,g(y,x))$$
And since this holds for all the various $g$, this means that
$$\mathrm{pr}_{\Gamma\times\Phi}(\theta')\in\Theta$$
And we’re done with the other direction: the projection must be exactly equal to Θ, since projections of contributions in Br(Θ) always land in Θ, and we can surject onto Θ. ■
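Here’s a small Python sketch of the first direction, under the same toy representation as before (again, all names and numbers are illustrative): it builds θ∗=θ⋉(λy.{y}), checks that it projects back to θ, and checks the endofunction condition when Θ is the singleton {θ}.

```python
import itertools, random

Gamma = [0, 1]
Phi = ["a", "b"]

random.seed(1)
theta = {(y, x): random.random() / 10 for y in Gamma for x in Phi}

# theta* lives on Gamma x 2^Gamma x Phi, concentrated on alpha = {y}.
theta_star = {(y, frozenset({y}), x): m for (y, x), m in theta.items()}

# Projection back down to Gamma x Phi recovers theta exactly.
proj = {}
for (y, al, x), m in theta_star.items():
    proj[(y, x)] = proj.get((y, x), 0.0) + m
assert proj == theta

# Endofunction condition: for every s and a sample of g's,
# theta*(chi_{s(y) in alpha} g(s(y),x)) <= max over Theta of g's expectation.
Theta_set = [theta]  # here Theta is just {theta}; any superset works too
for s_tuple in itertools.product(Gamma, repeat=len(Gamma)):
    s = lambda y, t=s_tuple: t[y]
    for _ in range(100):
        g = {(y, x): random.random() for y in Gamma for x in Phi}
        lhs = sum(m * (s(y) in al) * g[(s(y), x)]
                  for (y, al, x), m in theta_star.items())
        rhs = max(sum(m * g[k] for k, m in th.items()) for th in Theta_set)
        assert lhs <= rhs + 1e-12
```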
Proposition 2.2: Let X be a poset, θ,η∈ΔcX.
Then, θ⪯η if and only if there exists κ:X→ΔX
s.t.:
For all x∈X, y∈supp κ(x): x≤y.
κ∗θ≤η
Vanessa proved this one, with a very nice proof involving the max-flow min-cut theorem.
Make the following directed graph: the nodes are the elements of $\{s\}\cup\{t\}\cup X\times\{0,1\}$. Basically, two copies of the finite poset X, a source node s, and a sink node t. x will be used to denote a variable from X, while a subscript of 0 or 1 denotes the copy of that x in $X_0$ or $X_1$, respectively.
As for the edges: there is an edge from s to every point in $X_0$, and an edge from every point in $X_1$ to t. And, for $x_0\in X_0$ and $y_1\in X_1$, there’s an edge iff $x\le y$. The capacities of the edges are: for all $x\in X$, $c_{s,x_0}:=\theta(x)$; for all $x\in X$, $c_{x_1,t}:=\eta(x)$; and for $x,y\in X$ s.t. $x\le y$, $c_{x_0,y_1}:=\infty$.
To work up to applying min-cut max-flow, we must show that the cut of all the edges between s and X0 is a min-cut of the network.
Remember, all cuts are induced by partitions of the nodes into two sets, S and T, where S includes the source s and T includes the sink t. And the value of a cut is
$$c(S,T)=\sum_{a\in S,b\in T}c_{a,b}=\sum_{x_0\in T}c_{s,x_0}+\sum_{x_0\in S,y_1\in T:x\le y}c_{x_0,y_1}+\sum_{y_1\in S}c_{y_1,t}$$
Implicitly converting both of the following sets to be subsets of X, we have $(S\cap X_1)\supseteq(S\cap X_0)^\uparrow$ for any finite-value cut. Why? Well, if it were false, then there’d be some $y\ge x$ where $x_0\in S$, and yet $y_1\notin S$. So $y_1\in T$. Then, the cost of the cut would include $c_{x_0,y_1}=\infty$, contradiction. So, now, we have that for any finite-value cut,
$$c(S,T)=\sum_{a\in S,b\in T}c_{a,b}=\sum_{x_0\in T}c_{s,x_0}+\sum_{y_1\in S}c_{y_1,t}$$
and $(S\cap X_1)\supseteq(S\cap X_0)^\uparrow$ holds.
Now, we’ll change our cut. Given some cut (S,T), let our new cut (S′,T′) be defined by $S':=\{s\}\cup((S\cap X_0)^\uparrow\times\{0,1\})$ (which still respects the superset property listed above, since both sets would be the same), and let T′ be the complement. Letting some stuff cancel out, we get
$$c(S,T)-c(S',T')=\sum_{x_0\in T}c_{s,x_0}+\sum_{y_1\in S}c_{y_1,t}-\sum_{x_0\in T'}c_{s,x_0}-\sum_{y_1\in S'}c_{y_1,t}$$
$$=\sum_{x_0\in T\setminus T'}c_{s,x_0}+\sum_{y_1\in S\setminus S'}c_{y_1,t}-\sum_{x_0\in T'\setminus T}c_{s,x_0}-\sum_{y_1\in S'\setminus S}c_{y_1,t}$$
Now, because $S'\cap X_1=(S\cap X_0)^\uparrow\times\{1\}\subseteq S\cap X_1$ by that subset property, there’s no $y_1\in S'\setminus S$, so we get
$$=\sum_{x_0\in T\setminus T'}c_{s,x_0}+\sum_{y_1\in S\setminus S'}c_{y_1,t}-\sum_{x_0\in T'\setminus T}c_{s,x_0}$$
As for that other minus term, we have $S'\cap X_0=(S\cap X_0)^\uparrow\times\{0\}\supseteq S\cap X_0$ by how it was defined, so $T'\cap X_0\subseteq T\cap X_0$, so that other minus term is 0, and we get
$$=\sum_{x_0\in T\setminus T'}c_{s,x_0}+\sum_{y_1\in S\setminus S'}c_{y_1,t}\ge 0$$
Rearranging this a bit, we have $c(S,T)\ge c(S',T')$.
And now we’ll show that $c(S',T')$ is bounded below by $c(\{s\},(X\times\{0,1\})\cup\{t\})$. Let’s go. We have
$$c(S',T')-c(\{s\},(X\times\{0,1\})\cup\{t\})=\sum_{x_0\in T'}c_{s,x_0}+\sum_{y_1\in S'}c_{y_1,t}-\sum_{x_0\in(X\times\{0,1\})\cup\{t\}}c_{s,x_0}-\sum_{y_1\in\{s\}}c_{y_1,t}$$
This simplifies a bit as
$$=\sum_{x_0\in T'}c_{s,x_0}+\sum_{y_1\in S'}c_{y_1,t}-\sum_{x_0\in X_0}c_{s,x_0}$$
This can be rewritten a bit as
$$=\left(\sum_{x_0\in X_0}c_{s,x_0}-\sum_{x_0\in S'}c_{s,x_0}\right)+\sum_{y_1\in S'}c_{y_1,t}-\sum_{x_0\in X_0}c_{s,x_0}=\sum_{y_1\in S'}c_{y_1,t}-\sum_{x_0\in S'}c_{s,x_0}$$
And then, using what S′ is defined as, we can get
$$=\sum_{y\in(S\cap X_0)^\uparrow}c_{y_1,t}-\sum_{x\in(S\cap X_0)^\uparrow}c_{s,x_0}$$
and using what the costs are, it’s
$$=\sum_{y\in(S\cap X_0)^\uparrow}\eta(y)-\sum_{x\in(S\cap X_0)^\uparrow}\theta(x)=\eta((S\cap X_0)^\uparrow)-\theta((S\cap X_0)^\uparrow)\ge 0$$
This holds because θ⪯η and the indicator function for $(S\cap X_0)^\uparrow$ is monotone. And so, we get $c(S',T')\ge c(\{s\},(X\times\{0,1\})\cup\{t\})$. And we previously showed that $c(S,T)\ge c(S',T')$. So the cut around s is a minimal cut: its value is at most that of any other cut.
By the max-flow min-cut theorem, there’s a way of having stuff flow from s to t that saturates the capacities of all the edges that are cut. $f_{x_0,y_1}$ will be used to denote the flow from $x_0$ to $y_1$ according to this max-flow plan. Let’s finish things up.
Define κ:X→ΔX as follows. For some x, if $f_{s,x_0}>0$, then
$$\kappa(x)(y):=\frac{f_{x_0,y_1}}{f_{s,x_0}}$$
If $f_{s,x_0}=0$, let κ(x) be any probability distribution on $x^\uparrow$. Note that κ(x) is always a probability distribution supported on $x^\uparrow$: by fiat if $f_{s,x_0}=0$, and otherwise,
$$\sum_{y\in X}\kappa(x)(y)=\sum_{y\in X}\frac{f_{x_0,y_1}}{f_{s,x_0}}=\frac{f_{s,x_0}}{f_{s,x_0}}=1$$
This is because the flow out of $x_0$ must equal the flow into $x_0$ from the source. And κ(x) is supported on $x^\uparrow$ because the only edges out of $x_0$ go to $y_1$ where $y\ge x$. Now that we’ve got this demonstrated, we’ll show our desired inequality. Fix an arbitrary y. We have
$$\kappa_*(\theta)(y)=\sum_{x\in X}\theta(x)\cdot\kappa(x)(y)=\sum_{x\in X}c_{s,x_0}\cdot\frac{f_{x_0,y_1}}{f_{s,x_0}}$$
And then we use that all the capacities of the edges cut in the min-cut are saturated according to the max-flow plan, so $c_{s,x_0}=f_{s,x_0}$, and we have
$$=\sum_{x\in X}f_{s,x_0}\cdot\frac{f_{x_0,y_1}}{f_{s,x_0}}$$
Now, if $f_{s,x_0}=0$, then because the flow in equals the flow out, $f_{x_0,y_1}=0$ too, and otherwise we can cancel, so we can rewrite as
$$=\sum_{x\in X}f_{x_0,y_1}=f_{y_1,t}\le c_{y_1,t}=\eta(y)$$
where the first equality came from flow-in-equals-flow-out, the inequality came from the flow plan respecting the capacities of the edges, and the final equality came from how the capacities were defined. So, for all y∈X, we have $\kappa_*(\theta)(y)\le\eta(y)$, so $\kappa_*(\theta)\le\eta$.
For the reverse direction, if there’s some κ:X→ΔX s.t. κ(x) is supported on $x^\uparrow$ and $\kappa_*(\theta)\le\eta$, then for any monotone function f, we have
$$\eta(f)\ge\kappa_*(\theta)(f)=\theta(\lambda x.\,\kappa(x)(f))$$
And then we use that, since κ(x) is supported on $x^\uparrow$ and f is monotone, f(x) is less than or equal to the expectation of f w.r.t. κ(x) (remember, for κ(x) you have a 100 percent chance of drawing something at-or-above x, which guarantees that f of whatever you picked is at least f(x)). And so, we get
$$\ge\theta(f)$$
And since this holds for every monotone f, we have θ⪯η. ■
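The whole construction can be exercised numerically. Below is a sketch using networkx’s maximum_flow on a 3-element toy poset (the poset and the values of θ and η are made up, but satisfy θ⪯η): it builds the two-layer network from the proof, checks that the cut around s is minimal, extracts κ from the flow, and verifies κ∗θ≤η.

```python
import networkx as nx

# Toy poset 0 <= 1, 0 <= 2, with contributions theta (source side), eta (sink side).
X = [0, 1, 2]
leq = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)}
theta = {0: 0.5, 1: 0.2, 2: 0.1}
eta = {0: 0.1, 1: 0.4, 2: 0.3}

G = nx.DiGraph()
for x in X:
    G.add_edge("s", ("x0", x), capacity=theta[x])
    G.add_edge(("x1", x), "t", capacity=eta[x])
for x, y in leq:
    G.add_edge(("x0", x), ("x1", y))  # no capacity attribute = infinite capacity

flow_value, flow = nx.maximum_flow(G, "s", "t")
assert abs(flow_value - sum(theta.values())) < 1e-9  # the cut around s is minimal

# Kernel kappa(x)(y) := f_{x0,y1} / f_{s,x0}; then check kappa_* theta <= eta.
kappa = {}
for x in X:
    out = flow["s"][("x0", x)]
    kappa[x] = {y: (flow[("x0", x)][("x1", y)] / out if out > 0 else 0.0)
                for y in X if (x, y) in leq}
pushed = {y: sum(theta[x] * kappa[x].get(y, 0.0) for x in X) for y in X}
assert all(pushed[y] <= eta[y] + 1e-9 for y in X)
```

Leaving the middle edges without a capacity attribute makes networkx treat them as infinite-capacity, matching the proof’s $c_{x_0,y_1}=\infty$.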
Proposition 2.3: Let X be a poset, θ,η∈ΔcX.
Then, θ⪯η if and only if there exists κ:X→ΔX
s.t.
For all x∈X, y∈supp κ(x): x≥y.
θ≤κ∗η
Proof: Because θ⪯η, we can get a maximum flow in exactly the same way as in Proposition 2.2. Then, just flip the direction of all flows, which will be denoted by swapping the order of the subscripts. Now, define
$$\kappa(y)(x):=\frac{f_{y_1,x_0}}{f_{t,y_1}}$$
And, again, if $f_{t,y_1}=0$, let κ(y) be an arbitrary probability distribution supported on $y^\downarrow$. Note that κ(y) is always a probability distribution supported on $y^\downarrow$: by fiat if $f_{t,y_1}=0$, and otherwise,
$$\sum_{x\in X}\kappa(y)(x)=\sum_{x\in X}\frac{f_{y_1,x_0}}{f_{t,y_1}}=\frac{f_{t,y_1}}{f_{t,y_1}}=1$$
This is because the flow out of $y_1$ must equal the flow into $y_1$ from the source t (the old sink). And κ(y) is supported on $y^\downarrow$ because the only edges out of $y_1$ go to $x_0$ where $y\ge x$. Now that we’ve got this demonstrated, we’ll show our desired inequality. Fix an arbitrary x. We have
$$\kappa_*(\eta)(x)=\sum_{y\in X}\eta(y)\cdot\kappa(y)(x)=\sum_{y\in X}c_{y_1,t}\cdot\frac{f_{y_1,x_0}}{f_{t,y_1}}$$
Then use that $c_{y_1,t}\ge f_{t,y_1}$ (because even with the flow reversed, the flow through an edge must be at most its capacity). Accordingly, we get
$$\ge\sum_{y\in X}f_{y_1,x_0}=f_{x_0,s}=c_{s,x_0}=\theta(x)$$
And we’re done, inequality established, using definitions and the fact that the flow saturates all the edges out of s. So, for all x∈X, we have $\kappa_*(\eta)(x)\ge\theta(x)$, so $\kappa_*(\eta)\ge\theta$.
For the reverse direction, if there’s some κ:X→ΔX s.t. κ(y) is supported on $y^\downarrow$ and $\kappa_*(\eta)\ge\theta$, then for any monotone function f, we have
$$\theta(f)\le\kappa_*(\eta)(f)=\eta(\lambda y.\,\kappa(y)(f))$$
And then we use that, since κ(y) is supported on $y^\downarrow$ and f is monotone, f(y) is greater than or equal to the expectation of f w.r.t. κ(y) (remember, for κ(y) you have a 100 percent chance of drawing something at-or-below y, which guarantees that f of whatever you picked is at most f(y)). And so, we get
$$\le\eta(f)$$
And since this holds for every monotone f, we have θ⪯η. ■
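A companion sketch for this dual direction, with the same toy data as the Proposition 2.2 sketch (all numbers illustrative): read the kernel off the flow in the other direction, κ(y)(x)=f_{x_0,y_1}/f_{y_1,t}, and check θ≤κ∗η.

```python
import networkx as nx

X = [0, 1, 2]
leq = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)}
theta = {0: 0.5, 1: 0.2, 2: 0.1}
eta = {0: 0.1, 1: 0.4, 2: 0.3}

G = nx.DiGraph()
for x in X:
    G.add_edge("s", ("x0", x), capacity=theta[x])
    G.add_edge(("x1", x), "t", capacity=eta[x])
for x, y in leq:
    G.add_edge(("x0", x), ("x1", y))
_, flow = nx.maximum_flow(G, "s", "t")

# Downward-supported kernel: normalize each y-column of the flow by the flow into t.
kappa = {}
for y in X:
    into_t = flow[("x1", y)]["t"]
    kappa[y] = {x: (flow[("x0", x)][("x1", y)] / into_t if into_t > 0 else 0.0)
                for x in X if (x, y) in leq}
pulled = {x: sum(eta[y] * kappa[y].get(x, 0.0) for y in X) for x in X}
assert all(theta[x] <= pulled[x] + 1e-9 for x in X)
```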
Proposition 2.4: For any Γ, Φ and Θ∈□c(Γ×Φ), Br(Θ)
is downwards closed w.r.t. the induced order on Δc(elΓ×Φ).
That is, if θ∈Br(Θ) and η⪯θ then η∈Br(Θ).
Remember, the condition for some θ to lie in Br(Θ) is that it be supported on elΓ, and that, for all s:Γ→Γ and g:Γ×Φ→[0,1],
$$\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$$
So, given that η⪯θ (lower score for all monotone functions), we’ll show that η fulfills both conditions. The support condition is taken care of by η∈Δc(elΓ×Φ). As for the other one, we have
$$\eta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$$
The first inequality holds because as you make α larger, i.e., go up in the partial ordering, $\chi_{s(y)\in\alpha}$ can only flip from 0 to 1, so the value of the function can only increase. So it’s a monotone function, and we then use that η⪯θ. ■
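The monotonicity claim is easy to spot-check by brute force; here’s a tiny illustrative Python check that (y,α,x)↦χ_{s(y)∈α}g(s(y),x) is monotone in α under inclusion (the particular s and g are made up).

```python
import itertools

Gamma = [0, 1]
s = {0: 1, 1: 1}
g = {(0, "a"): 0.3, (1, "a"): 0.9}
subsets = [frozenset(S) for r in range(len(Gamma) + 1)
           for S in itertools.combinations(Gamma, r)]

def h(y, alpha, x):
    # chi_{s(y) in alpha} * g(s(y), x)
    return (1.0 if s[y] in alpha else 0.0) * g[(s[y], x)]

for y in Gamma:
    for a1, a2 in itertools.product(subsets, repeat=2):
        if a1 <= a2:  # frozenset inclusion
            assert h(y, a1, "a") <= h(y, a2, "a")
```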
Proposition 2.5: Consider a finite set X, ϕ?∈ΔX,
ϕ!:{0,1}→ΔX, p∈[0,1] and ϕ:=(1−p)ϕ?+pϕ!.
Then, p≥dTV(ϕ(0),ϕ(1)). Conversely, consider any ϕ:{0,1}→ΔX.
Then, there exist ϕ?∈ΔX and ϕ!:{0,1}→ΔX
s.t. ϕ=(1−p)ϕ?+pϕ! for p:=dTV(ϕ(0),ϕ(1)).
Ok, so let f be some arbitrary function X→[0,1]. We have an alternate characterization of the total variation distance as
$$d_{TV}(\phi(0),\phi(1)):=\sup_{f:X\to[0,1]}|\phi(0)(f)-\phi(1)(f)|$$
And then from there we can go
$$=\sup_{f:X\to[0,1]}|((1-p)\phi_?(f)+p\phi_!(0)(f))-((1-p)\phi_?(f)+p\phi_!(1)(f))|$$
$$=\sup_{f:X\to[0,1]}|p\phi_!(0)(f)-p\phi_!(1)(f)|=p\sup_{f:X\to[0,1]}|\phi_!(0)(f)-\phi_!(1)(f)|=p\,d_{TV}(\phi_!(0),\phi_!(1))$$
And since $p\ge p\,d_{TV}(\phi_!(0),\phi_!(1))$, we have $p\ge d_{TV}(\phi(0),\phi(1))$ and are done.
Conversely, let $p:=1-(\phi(0)\wedge\phi(1))(1)$, and $\phi_?:=\frac{1}{1-p}(\phi(0)\wedge\phi(1))$, where the ∧ is the minimum/common overlap of the two probability distributions (and if p=1, let $\phi_?$ be arbitrary). It is clear that this is a probability distribution because of how p was defined. Then, let $\phi_!(0):=\frac{1}{p}(\phi(0)-(\phi(0)\wedge\phi(1)))$, and similarly for $\phi_!(1)$. Well, as long as p>0; if p=0, it can be any probability distribution you want. It’s a probability distribution because
$$\phi_!(0)(1)=\frac{1}{p}(\phi(0)(1)-(\phi(0)\wedge\phi(1))(1))=\frac{1}{p}(1-(\phi(0)\wedge\phi(1))(1))=\frac{1-(\phi(0)\wedge\phi(1))(1)}{1-(\phi(0)\wedge\phi(1))(1)}=1$$
And since p>0, everything works out. Now we just need to show that these add up to make ϕ, and that p is the same as the total variation distance.
$$\phi(0)=(\phi(0)\wedge\phi(1))+\phi(0)-(\phi(0)\wedge\phi(1))=(1-p)\cdot\frac{1}{1-p}(\phi(0)\wedge\phi(1))+p\cdot\frac{1}{p}(\phi(0)-(\phi(0)\wedge\phi(1)))=(1-p)\phi_?+p\phi_!(0)$$
And this works symmetrically for ϕ(1), showing that we indeed have equality. As for showing that p is the total variation distance, referring back to what we’ve already proved, we have
$$d_{TV}(\phi(0),\phi(1))=p\,d_{TV}(\phi_!(0),\phi_!(1))$$
And now, since $\phi_!(0)=\frac{1}{p}(\phi(0)-(\phi(0)\wedge\phi(1)))$ and $\phi_!(1)=\frac{1}{p}(\phi(1)-(\phi(0)\wedge\phi(1)))$, the supports of these two probability distributions are disjoint, which implies that their total variation distance is 1, so we have $d_{TV}(\phi(0),\phi(1))=p$. ■
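Here’s a short numerical sketch of this decomposition (the two distributions are made up): compute the overlap, read off p, build ϕ?, ϕ!(0), ϕ!(1), and confirm the mixture identity, the disjoint supports, and p=dTV.

```python
import numpy as np

phi0 = np.array([0.5, 0.3, 0.2])
phi1 = np.array([0.2, 0.3, 0.5])

overlap = np.minimum(phi0, phi1)          # phi(0) ∧ phi(1)
p = 1.0 - overlap.sum()                   # = d_TV(phi(0), phi(1))
assert np.isclose(p, 0.5 * np.abs(phi0 - phi1).sum())  # the usual TV formula

phi_q = overlap / (1.0 - p)               # phi_? (assumes p < 1)
phi_b0 = (phi0 - overlap) / p             # phi_!(0) (assumes p > 0)
phi_b1 = (phi1 - overlap) / p             # phi_!(1)

# Each piece is a probability distribution and the mixture reconstructs phi.
for d in (phi_q, phi_b0, phi_b1):
    assert np.all(d >= -1e-12) and np.isclose(d.sum(), 1.0)
assert np.allclose((1 - p) * phi_q + p * phi_b0, phi0)
assert np.allclose((1 - p) * phi_q + p * phi_b1, phi1)
# phi_!(0) and phi_!(1) have disjoint supports, so their TV distance is 1.
assert np.isclose(np.minimum(phi_b0, phi_b1).sum(), 0.0)
```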
Proposition 2.6: Consider any Φ and ϕ:{0,1}→ΔΦ.
Denote U:={0,1}×{{0,1}}×Φ (the event “program is unrealized”).
Let Λ:=Br(⊤{0,1}⋉ϕ).
Then, Λ(χU)=1−dTV(ϕ(0),ϕ(1))
This will take some work to establish. One direction, showing that the bridge transform value exceeds the total variation distance value, is fairly easy. As for the other direction, it’ll take some work. Let’s begin.
$$\Lambda(\chi_U)=\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)(\chi_U)=\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})=\max_{\theta\in\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)}\theta(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
We’ll make a particular contribution θ∗, defined as
$$\theta^*:=\delta_0\times(\phi(0)-\phi(0)\wedge\phi(1))\times\delta_{\{0\}}+\delta_0\times(\phi(0)\wedge\phi(1))\times\delta_{\{0,1\}}$$
Or, put another way: restricting to $y=0,\alpha=\{0\}$, it’s $\phi(0)-\phi(0)\wedge\phi(1)$, and restricting to $y=0,\alpha=\{0,1\}$, it’s $\phi(0)\wedge\phi(1)$. So, for a given s:{0,1}→{0,1}, we have
$$\theta^*(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))=(\delta_0\times(\phi(0)-\phi(0)\wedge\phi(1))\times\delta_{\{0\}}+\delta_0\times(\phi(0)\wedge\phi(1))\times\delta_{\{0,1\}})(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))$$
$$=(\delta_0\times(\phi(0)-\phi(0)\wedge\phi(1))\times\delta_{\{0\}})(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))+(\delta_0\times(\phi(0)\wedge\phi(1))\times\delta_{\{0,1\}})(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))$$
$$=(\phi(0)-\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{s(0)\in\{0\}}g(s(0),x))+(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{s(0)\in\{0,1\}}g(s(0),x))$$
Now, we’ll go through two exhaustive possibilities for what s could be. If it maps 0 to 0, then it’s
$$=(\phi(0)-\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{0\in\{0\}}g(0,x))+(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{0\in\{0,1\}}g(0,x))$$
$$=(\phi(0)-\phi(0)\wedge\phi(1))(\lambda x.\,g(0,x))+(\phi(0)\wedge\phi(1))(\lambda x.\,g(0,x))=\phi(0)(\lambda x.\,g(0,x))$$
$$\le\max_{y\in\{0,1\}}\phi(y)(\lambda x.\,g(y,x))=\top_{\{0,1\}}(\lambda y.\,\phi(y)(\lambda x.\,g(y,x)))=(\top_{\{0,1\}}\ltimes\phi)(\lambda yx.\,g(y,x))$$
And our desired inequality is established. If s maps 0 to 1, then it’s
$$=(\phi(0)-\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{1\in\{0\}}g(1,x))+(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{1\in\{0,1\}}g(1,x))$$
$$=(\phi(0)\wedge\phi(1))(\lambda x.\,g(1,x))\le\phi(1)(\lambda x.\,g(1,x))\le\max_{y\in\{0,1\}}\phi(y)(\lambda x.\,g(y,x))=(\top_{\{0,1\}}\ltimes\phi)(\lambda yx.\,g(y,x))$$
And that is taken care of. So, our θ∗ lies in Br(⊤{0,1}⋉ϕ).
Now, where were we? Ah right, we were at
$$\Lambda(\chi_U)=\max_{\theta\in\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)}\theta(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
But now we can continue with
$$\ge\theta^*(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
Which unpacks as
$$=(\delta_0\times(\phi(0)-\phi(0)\wedge\phi(1))\times\delta_{\{0\}}+\delta_0\times(\phi(0)\wedge\phi(1))\times\delta_{\{0,1\}})(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
$$=(\delta_0\times(\phi(0)-\phi(0)\wedge\phi(1))\times\delta_{\{0\}})(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})+(\delta_0\times(\phi(0)\wedge\phi(1))\times\delta_{\{0,1\}})(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
$$=(\phi(0)-\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{\{0\}=\{0,1\}})+(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{\{0,1\}=\{0,1\}})=(\phi(0)\wedge\phi(1))(1)$$
And then, the value of the overlap between two distributions is 1−dTV(ϕ(0),ϕ(1)). So, we’ve shown one direction, we have
$$\Lambda(\chi_U)\ge 1-d_{TV}(\phi(0),\phi(1))$$
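The construction above can be checked numerically. The sketch below (toy ϕ, illustrative names) builds θ∗ as a dictionary, confirms its mass on the event α={0,1} is exactly 1−dTV(ϕ(0),ϕ(1)), and spot-checks the endofunction condition against ⊤{0,1}⋉ϕ for all four endofunctions s and random g.

```python
import itertools
import numpy as np

phi = {0: np.array([0.5, 0.3, 0.2]), 1: np.array([0.2, 0.3, 0.5])}
overlap = np.minimum(phi[0], phi[1])
dtv = 0.5 * np.abs(phi[0] - phi[1]).sum()

# theta* as a dict (y, alpha, x) -> mass, following the definition in the proof.
theta_star = {}
for x in range(3):
    theta_star[(0, frozenset({0}), x)] = (phi[0] - overlap)[x]
    theta_star[(0, frozenset({0, 1}), x)] = overlap[x]

# Mass on the event alpha = {0,1} is exactly the overlap mass, i.e. 1 - d_TV.
mass_U = sum(m for (y, al, x), m in theta_star.items() if al == frozenset({0, 1}))
assert np.isclose(mass_U, 1.0 - dtv)

# Endofunction condition, for all four s and random g's.
np.random.seed(2)
for s0, s1 in itertools.product([0, 1], repeat=2):
    s = {0: s0, 1: s1}
    for _ in range(200):
        g = np.random.rand(2, 3)
        lhs = sum(m * (s[y] in al) * g[s[y], x]
                  for (y, al, x), m in theta_star.items())
        rhs = max(phi[y_] @ g[y_] for y_ in (0, 1))  # (⊤ ⋉ phi)(g)
        assert lhs <= rhs + 1e-9
```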
In the other direction of the equality we’re trying to prove, we need to constrain what θ might be. Starting off with what we definitely know, we have
$$\Lambda(\chi_U)=\max_{\theta\in\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)}\theta(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
Now it’s time to show that for any $\theta\in\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)$, the measure on α={0,1} can’t be too high. Specifically, to proceed further, we’ll need to show that
$$\forall x'\in\Phi:\ \theta(\lambda y\alpha x.\,\chi_{x=x'\wedge\alpha=\{0,1\}})\le(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{x=x'})$$
Time to establish it. On x′, either ϕ(0)(x′)≥ϕ(1)(x′) or vice versa. Without loss of generality, assume that it’s ϕ(0)(x′) that’s lower.
Let s be the constant-0 function, and let $g(y,x):=\chi_{y=0\wedge x=x'}$ (so g(0,x′)=1, and g is 0 everywhere else).
Then, we have
$$\theta(\lambda y\alpha x.\,\chi_{x=x'\wedge\alpha=\{0,1\}})\le\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le(\top_{\{0,1\}}\ltimes\phi)(\lambda yx.\,g(y,x))$$
$$=(\top_{\{0,1\}}\ltimes\phi)(\lambda yx.\,\chi_{y=0\wedge x=x'})=\max_{y\in\{0,1\}}\phi(y)(\lambda x.\,\chi_{y=0\wedge x=x'})=\phi(0)(\lambda x.\,\chi_{x=x'})=\phi(0)(x')=(\phi(0)\wedge\phi(1))(x')$$
Just abbreviating, using that θ lies in the bridge transform, deabbreviating with what g is, unpacking things a little bit and canceling out. At the end we used that ϕ(0)(x′)≤ϕ(1)(x′).
Ok, so, now that we’ve established the key fact that
$$\forall x'\in\Phi:\ \theta(\lambda y\alpha x.\,\chi_{x=x'\wedge\alpha=\{0,1\}})\le(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{x=x'})$$
We can resume our work. We were previously at
$$\Lambda(\chi_U)=\max_{\theta\in\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)}\theta(\lambda y\alpha x.\,\chi_{\alpha=\{0,1\}})$$
and we can proceed to
$$=\max_{\theta\in\mathrm{Br}(\top_{\{0,1\}}\ltimes\phi)}\sum_{x'\in\Phi}\theta(\lambda y\alpha x.\,\chi_{x=x'\wedge\alpha=\{0,1\}})$$
and from there, using the inequality we proved, proceed to
$$\le\sum_{x'\in\Phi}(\phi(0)\wedge\phi(1))(\lambda x.\,\chi_{x=x'})=(\phi(0)\wedge\phi(1))(1)=1-d_{TV}(\phi(0),\phi(1))$$
And we’re done, we established that it’s an upper bound as well as a lower bound, so we have equality. ■
Lemma 2: If there’s a function h:Φ→2Γ s.t. Θ(λyx.χy∉h(x))=0, then every θ∈Br(Θ) is supported on the set {(y,α,x) | α⊆h(x)}.
Assume the conclusion is false, i.e., there is some nonzero probability of drawing an $(x^*,\alpha^*)$ pair where $\alpha^*\not\subseteq h(x^*)$. In particular, there must be some special value $y^*\in\Gamma$ witnessing that $\alpha^*$ isn’t a subset, i.e., $y^*\in\alpha^*$ and $y^*\notin h(x^*)$. Remember that, since $\theta\in\mathrm{Br}(\Theta)$, for all $s:\Gamma\to\Gamma$ and $g:\Gamma\times\Phi\to[0,1]$ we have
$$\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}g(s(y),x))\le\Theta(\lambda yx.\,g(y,x))$$
Now, in particular, let $g:=\lambda yx.\,\chi_{y\notin h(x)}$, and let s be the constant function that maps everything to $y^*$. Then, this turns into
$$0<\theta(\lambda y\alpha x.\,\chi_{y^*\in\alpha}\chi_{y^*\notin h(x)})\le\Theta(\lambda yx.\,\chi_{y\notin h(x)})=0$$
which is impossible. The equality at the end was our starting assumption. The middle inequality was just specializing our inequality to a particular pair of functions. And the left side is greater than 0 because there’s a nonzero probability of drawing $(x^*,\alpha^*)$, and $y^*$ was selected to lie in $\alpha^*$ and outside of $h(x^*)$, so there’s a nonzero probability of getting a nonzero value. Therefore, the result follows. ■
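For a concrete feel for the argument, here’s a tiny illustrative check of the contrapositive (all data made up): a contribution with mass on some α⊈h(x) fails the endofunction condition for the proof’s choice of s and g.

```python
Gamma = [0, 1]
Phi = ["a"]
h = {"a": frozenset({0})}            # h : Phi -> 2^Gamma

# Theta is represented by a single contribution that gives expectation 0 to
# chi_{y not in h(x)}: all its mass sits on y in h(x).
Theta = [{(0, "a"): 1.0}]
assert sum(m for (y, x), m in Theta[0].items() if y not in h[x]) == 0.0

# A bad theta: mass on alpha = {0,1}, not a subset of h("a") = {0}; witness y* = 1.
bad_theta = {(0, frozenset({0, 1}), "a"): 0.5}
y_star = 1
s = lambda y: y_star                  # constant function at the witness
g = lambda y, x: 1.0 if y not in h[x] else 0.0

lhs = sum(m * (s(y) in al) * g(s(y), x) for (y, al, x), m in bad_theta.items())
rhs = max(sum(m * g(y, x) for (y, x), m in th.items()) for th in Theta)
assert lhs > rhs                      # endofunction condition fails: not in Br(Theta)
```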
Lemma 3: For any s:Γ→Γ and Θ∈□c(Γ×Φ), if θ∈Br(Θ), then χelΓ(s∗(θ))∈Br(Θ).
So, the support condition is trivially fulfilled, because we’re restricting to the event y∈α. That just leaves the endofunction condition. Let $s':\Gamma\to\Gamma$ be arbitrary, and $g:\Gamma\times\Phi\to[0,1]$ be arbitrary. Then we have
$$\chi_{\mathrm{el}_\Gamma}(s_*(\theta))(\lambda y\alpha x.\,\chi_{s'(y)\in\alpha}g(s'(y),x))=s_*(\theta)(\lambda y\alpha x.\,\chi_{y\in\alpha}\chi_{s'(y)\in\alpha}g(s'(y),x))$$
$$=\theta(\lambda y\alpha x.\,\chi_{s(y)\in\alpha}\chi_{s'(s(y))\in\alpha}g(s'(s(y)),x))\le\theta(\lambda y\alpha x.\,\chi_{(s'\circ s)(y)\in\alpha}g((s'\circ s)(y),x))$$
$$\le\Theta(\lambda yx.\,g(y,x))$$
The first equality moves the restriction into the function being integrated, the second unpacks the pushforward, the first inequality drops the $\chi_{s(y)\in\alpha}$ indicator by monotonicity, and the last step is the endofunction condition for θ applied to the endofunction $s'\circ s$. So the endofunction condition holds for χelΓ(s∗(θ)), and it lies in Br(Θ). ■