(Update 7)
After some back and forth last night with an LLM[1], we now have a proof of “chainability” for the redundancy diagrams in particular. (And have some hope that this will be most of what we need to rescue the stochastic->deterministic nat lat proof.)
(Theorem) Chainability of Redunds
Let P be a distribution over X1, X2, and Λ.
Define:
Q[X,Λ]:=P[X]P[Λ|X1]
S[X,Λ] := P[X] P[Λ|X2] = P[X] ∑_X1 P[X1|X2] P[Λ|X]
R[X,Λ] := P[X] Q[Λ|X2] = P[X] ∑_X1 P[X1|X2] P[Λ|X1]
You can think of Q as ‘forcing’ P to factorize per one redundancy pattern (X2→X1→Λ), S as forcing the other pattern (X1→X2→Λ), and R as forcing one after the other: first X2→X1→Λ, then X1→X2→Λ.
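For concreteness, these definitions can be built numerically. The following numpy snippet (our own toy example, not from the original derivation; the binary alphabets and seed are arbitrary choices) constructs Q, S, and R from a random positive P, and confirms that the two expressions given above for S and for R agree:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random strictly positive joint distribution P[x1, x2, lam] (all binary).
P = rng.random((2, 2, 2)) + 0.1
P /= P.sum()

PX = P.sum(axis=2)                                      # P[X1, X2]
P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
P_lam_x2 = P.sum(axis=0) / P.sum(axis=(0, 2))[:, None]  # P[lam | x2]

Q = PX[:, :, None] * P_lam_x1[:, None, :]               # Q = P[X] P[lam|x1]
S = PX[:, :, None] * P_lam_x2[None, :, :]               # S = P[X] P[lam|x2]
Q_lam_x2 = Q.sum(axis=0) / Q.sum(axis=(0, 2))[:, None]  # Q[lam | x2]
R = PX[:, :, None] * Q_lam_x2[None, :, :]               # R = P[X] Q[lam|x2]

# The equivalent summed-out forms from the definitions:
P_x1_x2 = PX / PX.sum(axis=0)                           # P[x1 | x2]
P_lam_x = P / PX[:, :, None]                            # P[lam | x1, x2]
S_alt = PX[:, :, None] * np.einsum('ab,abl->bl', P_x1_x2, P_lam_x)[None, :, :]
R_alt = PX[:, :, None] * np.einsum('ab,al->bl', P_x1_x2, P_lam_x1)[None, :, :]
```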
The theorem states,
DKL(P||R)≤DKL(P||Q)+DKL(P||S),
Or in words: the error (in DKL from P) accrued by applying both factorizations to P is bounded by the sum of the errors accrued by applying each factorization to P separately.
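As a numerical sanity check (our addition; the array shapes and number of trials are arbitrary), the inequality can be tested on random positive distributions:

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

def factorizations(P):
    """Return (Q, S, R) for a strictly positive joint P[x1, x2, lam]."""
    PX = P.sum(axis=2)                                      # P[X1, X2]
    P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
    P_lam_x2 = P.sum(axis=0) / P.sum(axis=(0, 2))[:, None]  # P[lam | x2]
    Q = PX[:, :, None] * P_lam_x1[:, None, :]               # P[X] P[lam|x1]
    S = PX[:, :, None] * P_lam_x2[None, :, :]               # P[X] P[lam|x2]
    Q_lam_x2 = Q.sum(axis=0) / Q.sum(axis=(0, 2))[:, None]  # Q[lam | x2]
    R = PX[:, :, None] * Q_lam_x2[None, :, :]               # P[X] Q[lam|x2]
    return Q, S, R

rng = np.random.default_rng(0)
worst_gap = -np.inf  # largest D(P||R) - D(P||Q) - D(P||S) observed
for _ in range(200):
    P = rng.random((3, 3, 3)) + 0.05
    P /= P.sum()
    Q, S, R = factorizations(P)
    worst_gap = max(worst_gap, kl(P, R) - kl(P, Q) - kl(P, S))
```

If the theorem holds, `worst_gap` should never exceed floating-point noise.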
Proof
The proof proceeds in 3 steps.
Step 1: DKL(P||Q) ≥ DKL(S||R)
Pf.
Let a_X1 := P[X1|X2] P[Λ|X] ≥ 0
Let b_X1 := P[X1|X2] P[Λ|X1] ≥ 0
By the log-sum inequality:
∑_X1 a_X1 ln(a_X1/b_X1) ≥ (∑_X1 a_X1) ln((∑_X1 a_X1)/(∑_X1 b_X1))
Multiplying both sides by P[X2] and summing over X2 and Λ, the left-hand side becomes DKL(P||Q), while (since ∑_X1 a_X1 = S[Λ|X2] and ∑_X1 b_X1 = R[Λ|X2]) the right-hand side becomes DKL(S||R).
⟹ DKL(P||Q) ≥ DKL(S||R), as desired.
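This aggregation can be checked numerically: with the a's and b's above, weighting the log-sum inequality by P[X2] and summing over X2 and Λ reproduces DKL(P||Q) on the left and DKL(S||R) on the right. A sketch (our toy example; shapes and seed arbitrary):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
P = rng.random((2, 2, 2)) + 0.1     # strictly positive joint P[x1, x2, lam]
P /= P.sum()

PX = P.sum(axis=2)                                      # P[X1, X2]
Px2 = PX.sum(axis=0)                                    # P[X2]
P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
P_x1_x2 = PX / Px2                                      # P[x1 | x2]
P_lam_x = P / PX[:, :, None]                            # P[lam | x1, x2]

a = P_x1_x2[:, :, None] * P_lam_x                       # a_{x1} = P[x1|x2] P[lam|x]
b = P_x1_x2[:, :, None] * P_lam_x1[:, None, :]          # b_{x1} = P[x1|x2] P[lam|x1]

# Left side of the log-sum inequality, weighted by P[x2] and summed:
lhs = float((Px2[:, None] * (a * np.log(a / b)).sum(axis=0)).sum())
# Right side: sum_a = S[lam|x2] and sum_b = R[lam|x2].
sum_a, sum_b = a.sum(axis=0), b.sum(axis=0)
rhs = float((Px2[:, None] * sum_a * np.log(sum_a / sum_b)).sum())

# For comparison, D(P||Q) and D(S||R) computed directly:
Q = PX[:, :, None] * P_lam_x1[:, None, :]
S = PX[:, :, None] * sum_a[None, :, :]
R = PX[:, :, None] * sum_b[None, :, :]
```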
Step 2: DKL(P||R) = DKL(P||S) + DKL(S||R)
Pf.
DKL(P||R)−DKL(P||S)
= ∑_X,Λ P[X,Λ] [ln P[Λ|X] − ln R[Λ|X2] − ln P[Λ|X] + ln S[Λ|X2]]
= ∑_X2 P[X2] ∑_Λ ∑_X1 P[X1|X2] P[Λ|X] ln(S[Λ|X2]/R[Λ|X2])
= ∑_X1 ∑_X2 P[X1|X2] P[X2] ∑_Λ S[Λ|X2] ln((S[Λ|X2] P[X])/(R[Λ|X2] P[X]))
=DKL(S||R)
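Since step 2 is an exact identity rather than an inequality, it is easy to confirm numerically (our check; shapes, seed, and trial count are arbitrary):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

def factorizations(P):
    """Return (Q, S, R) for a strictly positive joint P[x1, x2, lam]."""
    PX = P.sum(axis=2)                                      # P[X1, X2]
    P_lam_x1 = P.sum(axis=1) / P.sum(axis=(1, 2))[:, None]  # P[lam | x1]
    P_lam_x2 = P.sum(axis=0) / P.sum(axis=(0, 2))[:, None]  # P[lam | x2]
    Q = PX[:, :, None] * P_lam_x1[:, None, :]               # P[X] P[lam|x1]
    S = PX[:, :, None] * P_lam_x2[None, :, :]               # P[X] P[lam|x2]
    Q_lam_x2 = Q.sum(axis=0) / Q.sum(axis=(0, 2))[:, None]  # Q[lam | x2]
    R = PX[:, :, None] * Q_lam_x2[None, :, :]               # P[X] Q[lam|x2]
    return Q, S, R

rng = np.random.default_rng(3)
max_identity_error = 0.0
for _ in range(100):
    P = rng.random((2, 3, 2)) + 0.05
    P /= P.sum()
    _, S, R = factorizations(P)
    # D(P||R) should equal D(P||S) + D(S||R) exactly.
    max_identity_error = max(max_identity_error,
                             abs(kl(P, R) - kl(P, S) - kl(S, R)))
```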
Step 3: Combining steps 1 and 2,
DKL(P||Q) ≥ DKL(S||R) = DKL(P||R) − DKL(P||S)
⟹ DKL(P||R) ≤ DKL(P||Q) + DKL(P||S),
which completes the proof.
Notes:
In the second-to-last line of step 2, taking the expectation over P[X1|X2] is allowed because there are no free X1's in the expression. This then aggregates into an expectation over S[X,Λ], since S[Λ|X2] = S[Λ|X].
We are hopeful that this, though different from the invalidated result in the top-level post, will be an important step toward rescuing the stochastic natural latent ⇒ deterministic natural latent result.
A (small) positive update for me on their usefulness to my workflow!
Additional note which might be relevant later: we can also get proof step 1 in a somewhat more general way, which establishes that the function P[X,Λ]↦P[X]P[Λ|Xi] is a nonexpansive map under DKL. We’ll write that proof down later if we need it.
Hi, Jeremy and I have a couple of updates on this thread. I have put them in a shortform here.