Proof
Specifically, we’ll show that there exists an information throughput maximizing distribution which satisfies the undirected graph. We will not show that all optimal distributions satisfy the undirected graph, because that’s false in some trivial cases—e.g. if all the Y’s are completely independent of X, then all distributions are optimal. We will also not show that all optimal distributions factor over the undirected graph, which is importantly different because of the P[X]>0 caveat in the Hammersley-Clifford theorem.
First, we’ll prove the (already known) fact that an independent distribution P[X]=P[X1]P[X2] is optimal for a pair of independent channels (X1→Y1,X2→Y2); we’ll prove it in a way which will play well with the proof of our more general theorem. Using standard information identities plus the factorization structure Y1−X1−X2−Y2 (that’s a Markov chain, not subtraction), we get
MI(X;Y)=MI(X;Y1)+MI(X;Y2|Y1)
=MI(X;Y1)+(MI(X;Y2)−MI(Y2;Y1)+MI(Y2;Y1|X))
=MI(X1;Y1)+MI(X2;Y2)−MI(Y2;Y1)
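As a quick numerical sanity check (not part of the proof), here's a small Python sketch which verifies this identity on a randomly chosen correlated input distribution and a random pair of independent channels. The helper name mi, the alphabet sizes, and the random seed are all just illustrative choices, not anything from the setup above.

```python
# Illustrative check (in nats): for independent channels X1 -> Y1, X2 -> Y2,
#   MI(X;Y) = MI(X1;Y1) + MI(X2;Y2) - MI(Y1;Y2)
# for an arbitrary (possibly correlated) input distribution P[X1,X2].
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def mi(joint, a_axes, b_axes):
    """Mutual information between variable groups a_axes and b_axes of a joint
    probability array; all other axes are marginalized out."""
    keep = tuple(sorted(a_axes + b_axes))
    other = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    p = joint.sum(axis=other) if other else joint
    new_a = tuple(keep.index(ax) for ax in a_axes)   # axis positions after marginalizing
    new_b = tuple(keep.index(ax) for ax in b_axes)
    pa = p.sum(axis=new_b, keepdims=True)
    pb = p.sum(axis=new_a, keepdims=True)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / (pa * pb)[mask])).sum())

# Random correlated input P[x1,x2] and random channels P[y1|x1], P[y2|x2].
nx1, nx2, ny1, ny2 = 3, 4, 3, 4                      # made-up alphabet sizes
p_x = rng.random((nx1, nx2)); p_x /= p_x.sum()
ch1 = rng.random((nx1, ny1)); ch1 /= ch1.sum(axis=1, keepdims=True)
ch2 = rng.random((nx2, ny2)); ch2 /= ch2.sum(axis=1, keepdims=True)

# Joint P[x1,x2,y1,y2] = P[x1,x2] P[y1|x1] P[y2|x2] (axes: x1, x2, y1, y2).
joint = p_x[:, :, None, None] * ch1[:, None, :, None] * ch2[None, :, None, :]

lhs = mi(joint, (0, 1), (2, 3))                                        # MI(X;Y)
rhs = mi(joint, (0,), (2,)) + mi(joint, (1,), (3,)) - mi(joint, (2,), (3,))
print(lhs, rhs)   # the two values agree up to floating point error
```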
Now, suppose you hand me some supposedly-optimal distribution P[X]. From P, I construct a new distribution Q[X]:=P[X1]P[X2]. Note that MI(X1;Y1) and MI(X2;Y2) are both the same under Q as under P (each depends only on the corresponding marginal P[Xi] and its channel, both of which Q preserves), while MI(Y2;Y1) is zero under Q (under Q, X1 and X2 are independent, and Y1, Y2 depend only on X1, X2 respectively). So, because MI(X;Y)=MI(X1;Y1)+MI(X2;Y2)−MI(Y2;Y1), MI(X;Y) must be at least as large under Q as under P. In short: given any distribution, I can construct another distribution with at least as high information throughput, under which X1 and X2 are independent.
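The swap itself can also be spot-checked numerically. This sketch (again with made-up alphabet sizes, seed, and helper names) builds Q[X]=P[X1]P[X2] from a random P and checks that MI(X;Y) never comes out lower under Q than under P for random independent channels.

```python
# Illustrative check of the swap: replacing P[X1,X2] by Q[X1,X2] = P[X1]P[X2]
# never decreases MI(X;Y) when Y1 depends only on X1 and Y2 only on X2.
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

def mi_2d(pxy):
    """Mutual information (in nats) of a joint distribution laid out as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px * py)[mask])).sum())

def throughput(p_x, ch1, ch2):
    """MI(X;Y) for input p_x[x1,x2] and channels ch1[x1,y1], ch2[x2,y2]."""
    joint = p_x[:, :, None, None] * ch1[:, None, :, None] * ch2[None, :, None, :]
    nx1, nx2, ny1, ny2 = joint.shape
    return mi_2d(joint.reshape(nx1 * nx2, ny1 * ny2))   # rows = (x1,x2), cols = (y1,y2)

for _ in range(1000):
    p_x = rng.random((3, 4)); p_x /= p_x.sum()                     # random correlated P[X]
    ch1 = rng.random((3, 3)); ch1 /= ch1.sum(axis=1, keepdims=True)
    ch2 = rng.random((4, 4)); ch2 /= ch2.sum(axis=1, keepdims=True)
    q_x = np.outer(p_x.sum(axis=1), p_x.sum(axis=0))               # Q[X] = P[X1] P[X2]
    assert throughput(q_x, ch1, ch2) >= throughput(p_x, ch1, ch2) - 1e-12
```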
Now let’s tackle our more general theorem, reusing some of the machinery above.
I’ll split Y into Y1 and Y2, and split X into X1−2 (parents of Y1 but not Y2), X2−1 (parents of Y2 but not Y1), and X1∩2 (parents of both). Then
MI(X;Y)=MI(X1∩2;Y)+MI(X1−2,X2−1;Y|X1∩2)
In analogy to the case above, we consider a distribution P[X] and construct a new distribution Q[X]:=P[X1∩2]P[X1−2|X1∩2]P[X2−1|X1∩2]. Compared to P, Q has the same value of MI(X1∩2;Y), and MI(X1−2,X2−1;Y|X1∩2) cannot be any lower under Q than under P, by exactly the same argument as in the independent case; we just repeat that argument with everything conditioned on X1∩2 throughout. So, given any distribution, I can construct another distribution with at least as high information throughput, under which X1−2 and X2−1 are independent given X1∩2.
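The conditional step, that MI(X1−2,X2−1;Y|X1∩2) is at least as large under Q as under P, can be spot-checked numerically. Below is a rough sketch of that check, writing A, B, C for X1−2, X2−1, and X1∩2; the alphabet sizes, random seed, and helper names are made-up illustrative choices.

```python
# Illustrative check of the conditional step: with channels P[y1|a,c] and P[y2|b,c],
# replacing P[A,B,C] by Q[A,B,C] = P[C]P[A|C]P[B|C] never decreases MI(A,B;Y|C).
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

def mi_2d(pxy):
    """Mutual information (in nats) of a joint distribution laid out as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px * py)[mask])).sum())

def conditional_mi(p_abc, ch1, ch2):
    """MI(A,B ; Y1,Y2 | C) for input p_abc[a,b,c], channels ch1[a,c,y1], ch2[b,c,y2]."""
    na, nb, nc = p_abc.shape
    total = 0.0
    for c in range(nc):
        p_c = p_abc[:, :, c].sum()
        p_ab = p_abc[:, :, c] / p_c                          # P[a,b | C=c]
        # Joint P[a,b,y1,y2 | C=c] = P[a,b|c] P[y1|a,c] P[y2|b,c]
        joint = (p_ab[:, :, None, None]
                 * ch1[:, c, :][:, None, :, None]
                 * ch2[:, c, :][None, :, None, :])
        ny1, ny2 = joint.shape[2], joint.shape[3]
        total += p_c * mi_2d(joint.reshape(na * nb, ny1 * ny2))
    return total

for _ in range(500):
    na, nb, nc, ny1, ny2 = 2, 3, 2, 2, 3                     # made-up alphabet sizes
    p_abc = rng.random((na, nb, nc)); p_abc /= p_abc.sum()
    ch1 = rng.random((na, nc, ny1)); ch1 /= ch1.sum(axis=2, keepdims=True)
    ch2 = rng.random((nb, nc, ny2)); ch2 /= ch2.sum(axis=2, keepdims=True)

    # Q[a,b,c] = P[c]P[a|c]P[b|c]: same conditionals given C, but A and B
    # conditionally independent given C.
    p_c = p_abc.sum(axis=(0, 1))
    q_abc = (p_abc.sum(axis=1)[:, None, :] * p_abc.sum(axis=0)[None, :, :]) / p_c[None, None, :]

    assert conditional_mi(q_abc, ch1, ch2) >= conditional_mi(p_abc, ch1, ch2) - 1e-12
```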
Since this works for any Markov blanket X1∩2, there exists an information throughput maximizing distribution which satisfies the desired undirected graph.