(Update 6)
Most general version of the chainability conjecture (for arbitrary graphs) has now been falsified numerically by David, but the version specific to the DAGs we need (i.e. the redundancy conditions, or one redundancy and the mediation condition) still looks good.
Most likely proof structure would use this lemma:
Lemma
Let f1,f2 be nonexpansive maps under distance metric D. (Nonexpansive maps are the non-strict version of contraction maps.)
By the nonexpansive map property, D(x,f1(x))≥D(f2(x),f2(f1(x))). And by the triangle inequality for the distance metric, D(x,f2(f1(x)))≤D(x,f2(x))+D(f2(x),f2(f1(x))). Put those two together, and we get
D(x,f2(f1(x)))≤D(x,f1(x))+D(x,f2(x))
(Note: this is a quick-and-dirty comment so I didn’t draw a nice picture, but this lemma is easiest to understand by drawing the picture with the four points and distances between them.)
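The lemma is also easy to sanity-check numerically. Here's a minimal sketch (my own code, not from the post), using two hypothetical nonexpansive maps on the reals under D(x,y) = |x−y|: an affine map with slope 1/2 and a clamp to [0,1], both 1-Lipschitz:

```python
import random

# Two (hypothetical, illustrative) nonexpansive maps on the reals under
# D(x, y) = |x - y|: an affine map with slope 1/2, and clamping to [0, 1].
f1 = lambda x: 0.5 * x + 0.3
f2 = lambda x: max(0.0, min(1.0, x))
D = lambda x, y: abs(x - y)

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-10, 10)
    # The lemma: D(x, f2(f1(x))) <= D(x, f1(x)) + D(x, f2(x))
    assert D(x, f2(f1(x))) <= D(x, f1(x)) + D(x, f2(x)) + 1e-12
print("lemma holds on all samples")
```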
I think that lemma basically captures my intuitive mental picture for how the chainability conjecture “should” work, for the classes of DAGs on which it works at all. Each DAG j would correspond to one of the functions fj, where fj takes in a distribution and returns the distribution factored over DAG j, i.e.
fj(X↦P[X]) := (X ↦ ∏i P[Xi | X_paj(i)])
where paj(i) denotes the parents of node i in DAG j.
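As a concrete illustration (my own sketch, not code from the post — the function name and the dict representation of P are my choices), fj can be implemented for small distributions over binary variables like this:

```python
import itertools

def factor_over_dag(P, parents):
    # P: dict mapping tuples x of {0,1} values to probabilities (a joint
    #    distribution over n binary variables).
    # parents: parents[i] is a tuple of the parent indices of node i in DAG j.
    # Returns f_j(P)[x] = prod_i P[X_i = x_i | X_pa_j(i) = x_pa_j(i)].
    n = len(parents)

    def marginal(idxs, vals):
        # P[X_idxs = vals], marginalizing out all other variables
        return sum(p for x, p in P.items()
                   if all(x[i] == v for i, v in zip(idxs, vals)))

    Q = {}
    for x in itertools.product((0, 1), repeat=n):
        prob = 1.0
        for i, pa in enumerate(parents):
            pa_vals = tuple(x[j] for j in pa)
            den = marginal(pa, pa_vals)
            num = marginal((i,) + tuple(pa), (x[i],) + pa_vals)
            prob *= num / den if den > 0 else 0.0
        Q[x] = prob
    return Q

P = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
print(factor_over_dag(P, ((), ())))  # empty DAG: product of marginals
```

Factoring over a complete DAG (every node's parents are all earlier nodes) reproduces P exactly, while factoring over the empty DAG gives the product of marginals.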
In order to apply the lemma to get our desired theorem, we then need to find a distance metric which:
Is a distance metric (in particular, it must satisfy the triangle inequality, unlike DKL)
Makes our DAG functions nonexpansive mappings
Matches DKL(P,fj(P)) AT THE SPECIFIC POINT P (not necessarily anywhere else)
The first two of those are pretty easy to satisfy for the redundancy condition DAGs: those two DAG operators are convex combinations, so good ol’ Euclidean distance on the distributions should work fine. Making it match DKL at P is trickier, still working that out.
(Update 7)
After some back and forth last night with an LLM[1], we now have a proof of “chainability” for the redundancy diagrams in particular. (And have some hope that this will be most of what we need to rescue the stochastic->deterministic nat lat proof.)
(Theorem) Chainability of Redunds
Let P be a distribution over X1, X2, and Λ.
Define:
Q[X,Λ] := P[X]P[Λ|X1]
S[X,Λ] := P[X]P[Λ|X2] = P[X]∑X1 P[X1|X2]P[Λ|X]
R[X,Λ] := P[X]Q[Λ|X2] = P[X]∑X1 P[X1|X2]P[Λ|X1]
You can think of Q as ‘forcing’ P to factorize per one redundancy pattern: X2→X1→Λ; S as forcing the other pattern: X1→X2→Λ; and R as forcing one after the other: first X2→X1→Λ, then X1→X2→Λ.
The theorem states,
DKL(P||R)≤DKL(P||Q)+DKL(P||S),
Or in words: the error (in DKL from P) accrued by applying both factorizations to P is bounded by the sum of the errors accrued by applying each factorization to P separately.
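The inequality is easy to spot-check numerically before diving into the proof. The following sketch (my own code; function and variable names are mine, not from the post) samples random distributions over three binary variables and checks the bound:

```python
import itertools, math, random

def dkl(A, B):
    # D_KL(A || B); both dicts over the same keys, B > 0 wherever A > 0
    return sum(a * math.log(a / B[k]) for k, a in A.items() if a > 0)

def redund_factors(P):
    # P: dict over (x1, x2, lam) triples.  Returns Q, S, R per the theorem:
    #   Q[X,L] = P[X] P[L|X1],  S[X,L] = P[X] P[L|X2],  R[X,L] = P[X] Q[L|X2]
    def lam_given(dist, i, k):  # dist[lam = k[2] | X_i = k[i]]
        num = sum(p for kk, p in dist.items() if kk[i] == k[i] and kk[2] == k[2])
        den = sum(p for kk, p in dist.items() if kk[i] == k[i])
        return num / den
    PX = {k: sum(p for kk, p in P.items() if kk[:2] == k[:2]) for k in P}
    Q = {k: PX[k] * lam_given(P, 0, k) for k in P}
    S = {k: PX[k] * lam_given(P, 1, k) for k in P}
    R = {k: PX[k] * lam_given(Q, 1, k) for k in P}
    return Q, S, R

random.seed(0)
for _ in range(200):
    w = [random.random() for _ in range(8)]
    P = {k: v / sum(w) for k, v in zip(itertools.product((0, 1), repeat=3), w)}
    Q, S, R = redund_factors(P)
    assert dkl(P, R) <= dkl(P, Q) + dkl(P, S) + 1e-12
print("chainability inequality holds on all samples")
```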
Proof
The proof proceeds in 3 steps.
Step 1: DKL(P||Q) ≥ DKL(S||R)
Pf.
Let aX1:=P[X1|X2]P[Λ|X]≥0
Let bX1:=P[X1|X2]P[Λ|X1]≥0
By the log-sum inequality:
∑X1 aX1 ln(aX1/bX1) ≥ (∑X1 aX1) ln((∑X1 aX1)/(∑X1 bX1))
Multiplying both sides by P[X2] and summing over X2 and Λ, the left-hand side becomes DKL(P||Q), and (since ∑X1 aX1 = S[Λ|X2] and ∑X1 bX1 = R[Λ|X2]) the right-hand side becomes DKL(S||R). So DKL(P||Q) ≥ DKL(S||R), as desired.
Step 2: DKL(P||R) = DKL(P||S) + DKL(S||R)
Pf.
DKL(P||R) − DKL(P||S)
= ∑X,Λ P[X,Λ] [ln P[Λ|X] − ln R[Λ|X2] − ln P[Λ|X] + ln S[Λ|X2]]
= ∑X2 P[X2] ∑Λ ∑X1 P[X1|X2]P[Λ|X] ln(S[Λ|X2]/R[Λ|X2])
= ∑X1 ∑X2 P[X1|X2]P[X2] ∑Λ S[Λ|X2] ln((S[Λ|X2]P[X])/(R[Λ|X2]P[X]))
= DKL(S||R)
Step 3: Combining steps 1 and 2,
DKL(P||Q)≥DKL(S||R)=DKL(P||R)−DKL(P||S)
⟹DKL(P||R)≤DKL(P||Q)+DKL(P||S)
which completes the proof.
Notes:
In the second to last line of step 2, the expectation over P[X1|X2] is allowed because there are no free X1’s in the expression. Then, this aggregates into an expectation over S[X,Λ] as S[Λ|X2]=S[Λ|X].
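Step 2 is an exact equality, not just a bound, which is also easy to confirm numerically. The sketch below (my own code, not from the post) checks DKL(P||R) = DKL(P||S) + DKL(S||R) on random distributions over three binary variables:

```python
import itertools, math, random

def dkl(A, B):
    # D_KL(A || B); both dicts over the same keys, B > 0 wherever A > 0
    return sum(a * math.log(a / B[k]) for k, a in A.items() if a > 0)

def cond(dist, i, k):  # dist[lam = k[2] | X_i = k[i]]
    num = sum(p for kk, p in dist.items() if kk[i] == k[i] and kk[2] == k[2])
    den = sum(p for kk, p in dist.items() if kk[i] == k[i])
    return num / den

random.seed(1)
for _ in range(200):
    w = [random.random() for _ in range(8)]
    P = {k: v / sum(w) for k, v in zip(itertools.product((0, 1), repeat=3), w)}
    PX = {k: sum(p for kk, p in P.items() if kk[:2] == k[:2]) for k in P}
    Q = {k: PX[k] * cond(P, 0, k) for k in P}  # P[X] P[lam|X1]
    S = {k: PX[k] * cond(P, 1, k) for k in P}  # P[X] P[lam|X2]
    R = {k: PX[k] * cond(Q, 1, k) for k in P}  # P[X] Q[lam|X2]
    # Step-2 identity: exact, up to floating-point error
    assert abs(dkl(P, R) - (dkl(P, S) + dkl(S, R))) < 1e-9
print("step-2 identity holds exactly on all samples")
```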
We are hopeful that this, though different from the invalidated result in the top-level post, will be an important step toward rescuing the stochastic natural latent ⇒ deterministic natural latent result.
[1] A (small) positive update for me on their usefulness to my workflow!
Additional note which might be relevant later: we can also get proof step 1 in a somewhat more general way, which establishes that the function P[X,Λ]↦P[X]P[Λ|Xi] is a nonexpansive map under DKL. We’ll write that proof down later if we need it.