Actually nevermind, I think I see a counterexample to the exact formulation in the OP. I haven’t fully formalized it, so there may be holes, but here it is:
Imagine a “toxic piece of shared information”: some type of information such that it inextricably ties together some shared bits of information $S$ with unique bits of information $U$, making it impossible to define a variable that contains only the shared bits from it. If all variables in a set $X$ are “toxic” in this way, then any redund $R$, even one that straight-up copies one of the $X_j$, would necessarily have $I(R; X_{-i} \mid X_i) \ge U$ for all $X_i$ except $X_j$.
Suppose we fix a specific $\epsilon_2$, i.e., state that only variables with fewer unique bits than that qualify as redunds.
Now consider a set of variables $A^n$, each variable $A^n_i$ of which is a tuple of $n$ independent clones of toxic-information-containing variables, $\{X^1_i, \dots, X^n_i\}$. Any variable which is a redund over only one of the clones is a valid redund over $A^n$, and so there are $n$ redunds like this, all mutually independent.
What would a maximal redund $\Omega_n$ look like? Well, to properly screen all other redunds off, it would need to include most of the information from each individual independent redund, and so its $I(\Omega_n; A^n_{-i} \mid A^n_i)$ would scale with $n$ (probably roughly equaling $nU$?). We can then set $n$ arbitrarily high, so that this quantity shoots well past any fixed $\epsilon_2$. And if we discard some $D$ bits from each redund to try and manage this, this would increasingly work against our ability to screen off redunds from $A^n_{-i}$, and we can again scale $n$ to make any $\Omega_n$ that tries to keep its $I(\Omega_n; A^n_{-i} \mid A^n_i)$ under control fail arbitrarily hard at the screen-off task for all-but-a-few redunds.
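(A sketch of why the scaling should be linear, under the assumption, in my notation rather than anything from the OP, that $\Omega_n$ is just the tuple $(R^1, \dots, R^n)$ of per-clone redunds, where $R^j$ is a redund built from clone $j$ alone: the clones are mutually independent, so conditional mutual information is additive across them,

$$I(\Omega_n; A^n_{-i} \mid A^n_i) = \sum_{j=1}^n I(R^j; X^j_{-i} \mid X^j_i) \ge nU$$

for any $i$ other than the coordinate the $R^j$s copy (say each $R^j = X^j_1$, in which case this holds for all $i \neq 1$), since toxicity makes each summand at least $U$ there.)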
Concrete example: I think a binary symmetric channel should work here? I.e., set $X = \{X_1, X_2\}$, where $X_2 = \mathrm{XOR}(X_1, Z)$, $X_1$ is a fair coin, and $Z$ is a biased coin with bias $p$. We get $H(X_1) = H(X_2) = 1$, $I(X_1; X_2) = 1 + p \log_2 p + (1-p) \log_2 (1-p) = K$, and $H(X_1 \mid X_2) = H(X_2 \mid X_1) = 1 - K$. Then $R = X_1$ has $I(X_1; X_2 \mid R) = 0$, so it contains all $K$ shared bits, and $I(R; X_1 \mid X_2) = 1 - K$, so it qualifies as a redund if we set $\epsilon_2 \ge 1 - K$. Then, per the construction above, we design a set whose variables are tuples of those basic variables' independent clones, and watch the maximal redund explode.
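Here's a minimal numeric sanity check of those quantities (my own script, with $p = 0.1$ as an arbitrary choice of bias; it just enumerates the four-state joint distribution):

```python
import numpy as np

p = 0.1  # bias of Z; arbitrary choice for illustration

# Enumerate the joint distribution of (X1, X2): X1 fair, X2 = X1 XOR Z.
joint = np.zeros((2, 2))
for x1 in (0, 1):
    for z in (0, 1):
        joint[x1, x1 ^ z] += 0.5 * (p if z == 1 else 1 - p)

def H(dist):
    """Shannon entropy in bits of an array of probabilities."""
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]
    return -np.sum(d * np.log2(d))

H_X1 = H(joint.sum(axis=1))                # = 1
H_X2 = H(joint.sum(axis=0))                # = 1
K = H_X1 + H_X2 - H(joint)                 # I(X1; X2)
K_formula = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)

print(K, K_formula)   # both ~0.531 for p = 0.1
print(H_X1 - K)       # H(X1 | X2) = 1 - K, ~0.469

# For R = X1: I(X1; X2 | R) = 0 (we're conditioning on X1 itself), and
# I(R; X1 | X2) = H(X1 | X2) = 1 - K, so R qualifies iff epsilon_2 >= 1 - K.
```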
The subtle caveat here is that I think we need to prove that, for any variable $\Omega$, decreasing the amount of unique information (i.e., driving $I(\Omega; X_1 \mid X_2)$ down) trades off against shared information (i.e., $I(X_1; X_2 \mid \Omega)$ ends up increasing), which leads to $I(X_1; R \mid \Omega)$ growing. Specifically, we need to prove that this happens at some nontrivial, non-asymptoting rate that leads $\Omega_n$'s $\epsilon_1$ in the cloned construction to scale with $n$, well past any small constant multiple of $\epsilon_2$.
The counter here would be a continuum of $\Omega$s which can drive $I(\Omega; X_1 \mid X_2)$ arbitrarily low while the corresponding increase in $I(X_1; X_2 \mid \Omega)$ flatlines. The full counter would prove that this is possible in the general case, not only in the BSC setting.
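In the BSC setting the tradeoff is easy to eyeball numerically. A sketch, using one particular (and by no means exhaustive) family of $\Omega$s of my own choosing, $\Omega = \mathrm{XOR}(X_1, W)$ with $W$ a coin of bias $q$: as $q$ goes from $0$ to $0.5$, the unique information $I(\Omega; X_1 \mid X_2)$ falls from $1 - K$ to $0$ while the missed shared information $I(X_1; X_2 \mid \Omega)$ climbs from $0$ to $K$.

```python
import numpy as np

def H(dist):
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]
    return -np.sum(d * np.log2(d))

def cond_mi(P, a, b, c):
    """I(A; B | C) for a 3-axis joint P with A, B, C on axes a, b, c."""
    return (H(P.sum(axis=b)) + H(P.sum(axis=a))
            - H(P) - H(P.sum(axis=(a, b))))

p = 0.1  # bias of Z, as in the example above

# Joint over (X1, X2, Omega): X2 = X1 XOR Z, Omega = X1 XOR W, W ~ coin(q).
def joint(q):
    P = np.zeros((2, 2, 2))
    for x1 in (0, 1):
        for z in (0, 1):
            for w in (0, 1):
                pz = p if z == 1 else 1 - p
                pw = q if w == 1 else 1 - q
                P[x1, x1 ^ z, x1 ^ w] += 0.5 * pz * pw
    return P

for q in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    P = joint(q)
    unique = cond_mi(P, 2, 0, 1)  # I(Omega; X1 | X2): Omega's unique info
    lost = cond_mi(P, 0, 1, 2)    # I(X1; X2 | Omega): shared info Omega misses
    print(f"q={q:.2f}  unique={unique:.3f}  lost={lost:.3f}")
```

For this one family the two quantities trade off smoothly with no flatlining, which is consistent with the conjecture but of course proves nothing about the general case; a proof would have to rule out every family of $\Omega$s.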
That it’s impossible seems intuitively obvious (consider what possibilities it’d unlock if that worked! the universe isn’t that kind), but it may be annoying to prove, and the result wouldn’t be very exciting, so it’s a low priority for me.[1] Still, throwing this out here in case you find this reasoning convincing.
(My guess is that “maximal redunds are possible for any sets of random variables” is too strong, and we need to explicitly limit it to sets of variables that are approximately structured as sparse hierarchical graphs or something. This is probably okay; see my previous thoughts about the universe’s physics selecting for that.)
Here’s o3’s crack at it if you want a starting point. I’ve found it’s sometimes helpful, but I’m currently fatigued from trying to paranoidally spot the place where it’s trying to slip crap past me, so I’ve not verified it. Looks right at a glance, if you ignore some weird parts.
Some more spitballing regarding how to fix this:
One thing here is that, in the relative sense, $\Omega_n$ isn’t any more “guilty” of caring about specific variables than any one $X^j_i$-copying $R$, given that it captures much more shared information about the $A^n_i$s than those $R$s do. So, to suit, maybe we should allow $\epsilon_3$ to scale with the amount of shared information in the max-redund? Which would be, uhh, $SH(\Omega) = H(\Omega) - H(\Omega \mid X) - \max_i I(\Omega; X_{-i} \mid X_i)$ or something.
Generalizing, maybe it should be “permitted” for redunds to contain more unique information the more shared information they have? Which would make the tuning parameter for “how pure of a redund $R$ is” not $\max_i I(R; X_{-i} \mid X_i)$ (aka your $\epsilon_2$), but $\frac{SH(R)}{H(R)}$, i.e., the fraction of the redund’s total entropy[1] that is shared information. That makes intuitive sense to me.
Or its $X$-relevant entropy? Subtract $H(R \mid X)$ from the denominator too, then.
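For concreteness, here's what those purity measures come out to for the BSC example's $R = X_1$ (my own script; $SH$ and both ratios are just the formulas above, transcribed):

```python
import numpy as np

def H(dist):
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]
    return -np.sum(d * np.log2(d))

def cond_mi(P, a, b, c):
    """I(A; B | C) for a 3-axis joint P with A, B, C on axes a, b, c."""
    return (H(P.sum(axis=b)) + H(P.sum(axis=a))
            - H(P) - H(P.sum(axis=(a, b))))

p = 0.1  # bias of Z, as before

# Joint over (X1, X2, R) with R = X1, i.e. R literally copies X1.
P = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for z in (0, 1):
        P[x1, x1 ^ z, x1] += 0.5 * (p if z == 1 else 1 - p)

H_R = H(P.sum(axis=(0, 1)))             # H(R) = 1
H_R_given_X = H(P) - H(P.sum(axis=2))   # H(R | X) = 0, since R is a copy

max_unique = max(cond_mi(P, 2, 1, 0),   # I(R; X2 | X1) = 0
                 cond_mi(P, 2, 0, 1))   # I(R; X1 | X2) = 1 - K

SH = H_R - H_R_given_X - max_unique     # = K, the shared bits
print(SH / H_R)                         # purity vs. total entropy
print(SH / (H_R - H_R_given_X))         # purity vs. X-relevant entropy
```

Both ratios come out to $K$ here, since $H(R \mid X) = 0$ for a literal copy; the two denominators would only diverge for an $R$ carrying entropy that isn't about $X$ at all.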