(Update 7)
After some back and forth last night with an LLM[1], we now have a proof of “chainability” for the redundancy diagrams in particular. (And have some hope that this will be most of what we need to rescue the stochastic->deterministic nat lat proof.)
(Theorem) Chainability of Redunds
Let P be a distribution over X1, X2, and Λ.
Define:
Q[X,Λ]:=P[X]P[Λ|X1]
S[X,Λ] := P[X] P[Λ|X2] = P[X] ∑_X1 P[X1|X2] P[Λ|X]
R[X,Λ] := P[X] Q[Λ|X2] = P[X] ∑_X1 P[X1|X2] P[Λ|X1]
You can think of Q as ‘forcing’ P to factorize per one redundancy pattern (X2→X1→Λ), S as forcing the other pattern (X1→X2→Λ), and R as forcing one after the other: first X2→X1→Λ, then X1→X2→Λ.
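For concreteness, these definitions can be built numerically. The following numpy snippet (our own toy example, not from the original derivation; the binary alphabets and seed are arbitrary choices) constructs Q, S, and R from a random positive P, and confirms that the two expressions given above for S and for R agree:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random strictly positive joint distribution P[x1, x2, lam] (all binary).
P = rng.random((2, 2, 2)) + 0.1
P /= P.sum()

PX = P.sum(axis=2)                                      # P[X1, X2]
P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
P_lam_x2 = P.sum(axis=0) / P.sum(axis=(0, 2))[:, None]  # P[lam | x2]

Q = PX[:, :, None] * P_lam_x1[:, None, :]               # Q = P[X] P[lam|x1]
S = PX[:, :, None] * P_lam_x2[None, :, :]               # S = P[X] P[lam|x2]
Q_lam_x2 = Q.sum(axis=0) / Q.sum(axis=(0, 2))[:, None]  # Q[lam | x2]
R = PX[:, :, None] * Q_lam_x2[None, :, :]               # R = P[X] Q[lam|x2]

# The equivalent summed-out forms from the definitions:
P_x1_x2 = PX / PX.sum(axis=0)                           # P[x1 | x2]
P_lam_x = P / PX[:, :, None]                            # P[lam | x1, x2]
S_alt = PX[:, :, None] * np.einsum('ab,abl->bl', P_x1_x2, P_lam_x)[None, :, :]
R_alt = PX[:, :, None] * np.einsum('ab,al->bl', P_x1_x2, P_lam_x1)[None, :, :]
```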
The theorem states,
DKL(P||R)≤DKL(P||Q)+DKL(P||S),
Or in words: the error (in DKL from P) accrued by applying both factorizations to P is bounded by the sum of the errors accrued by applying each factorization to P separately.
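As a numerical sanity check (our addition; the array shapes and number of trials are arbitrary), the inequality can be tested on random positive distributions:

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

def factorizations(P):
    """Return (Q, S, R) for a strictly positive joint P[x1, x2, lam]."""
    PX = P.sum(axis=2)                                      # P[X1, X2]
    P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
    P_lam_x2 = P.sum(axis=0) / P.sum(axis=(0, 2))[:, None]  # P[lam | x2]
    Q = PX[:, :, None] * P_lam_x1[:, None, :]               # P[X] P[lam|x1]
    S = PX[:, :, None] * P_lam_x2[None, :, :]               # P[X] P[lam|x2]
    Q_lam_x2 = Q.sum(axis=0) / Q.sum(axis=(0, 2))[:, None]  # Q[lam | x2]
    R = PX[:, :, None] * Q_lam_x2[None, :, :]               # P[X] Q[lam|x2]
    return Q, S, R

rng = np.random.default_rng(0)
worst_gap = -np.inf  # largest D(P||R) - D(P||Q) - D(P||S) observed
for _ in range(200):
    P = rng.random((3, 3, 3)) + 0.05
    P /= P.sum()
    Q, S, R = factorizations(P)
    worst_gap = max(worst_gap, kl(P, R) - kl(P, Q) - kl(P, S))
```

If the theorem holds, `worst_gap` should never exceed floating-point noise.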
Proof
The proof proceeds in 3 steps.
Step 1: DKL(P||Q) ≥ DKL(S||R)
Pf.
Let a_X1 := P[X1|X2] P[Λ|X] ≥ 0
Let b_X1 := P[X1|X2] P[Λ|X1] ≥ 0
By the log-sum inequality:
∑_X1 a_X1 ln(a_X1/b_X1) ≥ (∑_X1 a_X1) ln((∑_X1 a_X1)/(∑_X1 b_X1))
Multiplying both sides by P[X2] and summing over X2 and Λ, the left-hand side becomes DKL(P||Q), while (since ∑_X1 a_X1 = S[Λ|X2] and ∑_X1 b_X1 = R[Λ|X2]) the right-hand side becomes DKL(S||R).
⟹ DKL(P||Q) ≥ DKL(S||R), as desired.
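This aggregation can be checked numerically: with the a's and b's above, weighting the log-sum inequality by P[X2] and summing over X2 and Λ reproduces DKL(P||Q) on the left and DKL(S||R) on the right. A sketch (our toy example; shapes and seed arbitrary):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
P = rng.random((2, 2, 2)) + 0.1     # strictly positive joint P[x1, x2, lam]
P /= P.sum()

PX = P.sum(axis=2)                                      # P[X1, X2]
Px2 = PX.sum(axis=0)                                    # P[X2]
P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
P_x1_x2 = PX / Px2                                      # P[x1 | x2]
P_lam_x = P / PX[:, :, None]                            # P[lam | x1, x2]

a = P_x1_x2[:, :, None] * P_lam_x                       # a_{x1} = P[x1|x2] P[lam|x]
b = P_x1_x2[:, :, None] * P_lam_x1[:, None, :]          # b_{x1} = P[x1|x2] P[lam|x1]

# Left side of the log-sum inequality, weighted by P[x2] and summed:
lhs = float((Px2[:, None] * (a * np.log(a / b)).sum(axis=0)).sum())
# Right side: sum_a = S[lam|x2] and sum_b = R[lam|x2].
sum_a, sum_b = a.sum(axis=0), b.sum(axis=0)
rhs = float((Px2[:, None] * sum_a * np.log(sum_a / sum_b)).sum())

# For comparison, D(P||Q) and D(S||R) computed directly:
Q = PX[:, :, None] * P_lam_x1[:, None, :]
S = PX[:, :, None] * sum_a[None, :, :]
R = PX[:, :, None] * sum_b[None, :, :]
```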
Step 2: DKL(P||R) = DKL(P||S) + DKL(S||R)
Pf.
DKL(P||R)−DKL(P||S)
= ∑_X,Λ P[X,Λ] [ln P[Λ|X] − ln R[Λ|X2] − ln P[Λ|X] + ln S[Λ|X2]]
= ∑_X2 P[X2] ∑_Λ ∑_X1 P[X1|X2] P[Λ|X] ln(S[Λ|X2]/R[Λ|X2])
= ∑_X1 ∑_X2 P[X1|X2] P[X2] ∑_Λ S[Λ|X2] ln((S[Λ|X2] P[X])/(R[Λ|X2] P[X]))
=DKL(S||R)
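Since step 2 is an exact identity rather than an inequality, it is easy to confirm numerically (our check; shapes, seed, and trial count are arbitrary):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

def factorizations(P):
    """Return (Q, S, R) for a strictly positive joint P[x1, x2, lam]."""
    PX = P.sum(axis=2)                                      # P[X1, X2]
    P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
    P_lam_x2 = P.sum(axis=0) / P.sum(axis=(0, 2))[:, None]  # P[lam | x2]
    Q = PX[:, :, None] * P_lam_x1[:, None, :]               # P[X] P[lam|x1]
    S = PX[:, :, None] * P_lam_x2[None, :, :]               # P[X] P[lam|x2]
    Q_lam_x2 = Q.sum(axis=0) / Q.sum(axis=(0, 2))[:, None]  # Q[lam | x2]
    R = PX[:, :, None] * Q_lam_x2[None, :, :]               # P[X] Q[lam|x2]
    return Q, S, R

rng = np.random.default_rng(3)
max_identity_error = 0.0
for _ in range(100):
    P = rng.random((2, 3, 2)) + 0.05
    P /= P.sum()
    _, S, R = factorizations(P)
    # D(P||R) should equal D(P||S) + D(S||R) exactly.
    max_identity_error = max(max_identity_error,
                             abs(kl(P, R) - kl(P, S) - kl(S, R)))
```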
Step 3: Combining steps 1 and 2,
DKL(P||Q) ≥ DKL(S||R) = DKL(P||R) − DKL(P||S)
⟹ DKL(P||R) ≤ DKL(P||Q) + DKL(P||S),
which completes the proof.
Notes:
In the second-to-last line of step 2, taking the expectation over P[X1|X2] is allowed because there are no free X1's in the expression. This then aggregates into an expectation over S[X,Λ], since S[Λ|X2] = S[Λ|X].
We are hopeful that this, though different from the invalidated result in the top-level post, will be an important step toward rescuing the stochastic natural latent ⇒ deterministic natural latent result.
A (small) positive update for me on their usefulness to my workflow!
Additional note which might be relevant later: we can also get proof step 1 in a somewhat more general way, which establishes that the function P[X,Λ]↦P[X]P[Λ|Xi] is a nonexpansive map under DKL. We’ll write that proof down later if we need it.
Hi, Jeremy and I have a couple of updates on this thread. I have put them in a shortform here.