But note that synergistic information can be defined by referring purely to the system we’re examining, with no “external” target variable. If we have a set of variables X={x1,…,xn}, we can define the variable s such that I(X;s) is maximized under the constraint ∀Xi∈(P(X)∖{X}): I(Xi;s)=0. (Here P(X)∖{X} is the set of all subsets of X except X itself.)
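For concreteness, here is a minimal toy sketch of this definition on the standard XOR example (illustrative code of my own, not from the comment; the helper and variable names are assumptions): with two independent uniform bits, s = x1 XOR x2 satisfies the constraint I(xi;s)=0 for every proper subset, while I(X;s) is one full bit of purely synergistic information.

```python
# Toy sketch (illustrative, not from the comment): for X = {x1, x2} independent
# uniform bits, the latent s = x1 XOR x2 has zero mutual information with each
# proper subset, but one full bit with the joint.
from itertools import product
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(A; B) in bits from ((a, b), probability) pairs."""
    p_ab, p_a, p_b = Counter(), Counter(), Counter()
    for (a, b), prob in pairs:
        p_ab[(a, b)] += prob
        p_a[a] += prob
        p_b[b] += prob
    return sum(prob * log2(prob / (p_a[a] * p_b[b]))
               for (a, b), prob in p_ab.items() if prob)

# Joint distribution of (x1, x2, s) with s = x1 XOR x2 and uniform inputs.
joint = [((x1, x2, x1 ^ x2), 0.25) for x1, x2 in product((0, 1), repeat=2)]

print(mutual_information([((x1, s), pr) for (x1, x2, s), pr in joint]))        # I(x1; s) = 0.0
print(mutual_information([((x2, s), pr) for (x1, x2, s), pr in joint]))        # I(x2; s) = 0.0
print(mutual_information([(((x1, x2), s), pr) for (x1, x2, s), pr in joint]))  # I(X; s) = 1.0
```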
That’s a nice formulation of synergistic information. It’s independent of redundant info via the data-processing inequality 0=I(Xi;s)≥I(f(Xi);s), so it’s somewhat promising that the decomposition can add up to the total entropy.
You might be interested in this comment if distinguishing between synergistic and redundant information is not your main objective: you can simply define redunds over collections of subsets, such that e.g. “dogness” is a redund over every subset of atoms that allows you to conclude you’re looking at a dog. In particular, the redundancy lattice approach seems simpler when the latent depends on not just synergistic but also redundant and unique information.
One issue with PID worth mentioning is that the field hasn’t settled on a measure for quantifying multivariate redundant information. It’s the same problem we seem to have. But it’s probably not a major issue in the setting we’re working in (the well-abstracting universes).
A recent impossibility result seems to rule out a general multivariate PID that guarantees non-negativity of all components, though partial entropy decomposition (PED) may be more tractable.
If there’s a pair qi, qk such that Xi⊂Xk, then qi necessarily contains all the information in qk. Re-define qi, removing all information present in qk.
This seems similar to capturing unique information, where the constructive approach is probably harder in PID than in PED. E.g., in BROJA it involves an optimization problem over distributions with constraints on the marginals, but that only estimates the magnitude of the unique info, not an actual random variable that represents it.
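To make the kind of optimization concrete, here is a rough sketch for the binary AND gate (my own illustrative code, not the BROJA authors’ implementation; it assumes the standard BROJA definition UI(Y; X1∖X2) = min over q of I_q(X1; Y | X2), with q ranging over joint distributions that match p on the (X1,Y) and (X2,Y) marginals). As said above, this yields only a number, not a random variable.

```python
# Illustrative sketch (assumption: this mirrors the BROJA definition of unique
# information UI(Y; X1 \ X2) = min_q I_q(X1; Y | X2), where q ranges over joint
# distributions matching p's (X1, Y) and (X2, Y) marginals). Binary toy case only.
import numpy as np
from scipy.optimize import minimize

# Target distribution p(x1, x2, y): Y = X1 AND X2 with independent uniform inputs.
p = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1, x2, x1 & x2] = 0.25

def cond_mi(q_flat):
    """I_q(X1; Y | X2) in bits for a flattened 2x2x2 joint q(x1, x2, y)."""
    q = np.clip(q_flat, 0, 1).reshape(2, 2, 2) + 1e-12
    q_x2 = q.sum(axis=(0, 2))      # q(x2)
    q_x1x2 = q.sum(axis=2)         # q(x1, x2)
    q_x2y = q.sum(axis=0)          # q(x2, y)
    ratio = q * q_x2[None, :, None] / (q_x1x2[:, :, None] * q_x2y[None, :, :])
    return float(np.sum(q * np.log2(ratio)))

def marginal_gap(q_flat):
    """Equality constraints: q reproduces p's (X1, Y) and (X2, Y) marginals."""
    q = q_flat.reshape(2, 2, 2)
    return np.concatenate([(q.sum(axis=1) - p.sum(axis=1)).ravel(),
                           (q.sum(axis=0) - p.sum(axis=0)).ravel()])

res = minimize(cond_mi, p.ravel(), method="SLSQP",
               bounds=[(0, 1)] * 8,
               constraints=[{"type": "eq", "fun": marginal_gap}])
print("UI(Y; X1 \\ X2) ≈", round(res.fun, 4))  # ≈ 0 for the AND gate under BROJA
```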
You can simply define redunds over collections of subsets
As in: take a set of variables X, then search for some set of its (non-overlapping?) subsets such that there’s a nontrivial natural latent over it? Right, that’s what we’re doing here as well.
Potentially, a single X can then generate several such sets, corresponding to different levels of organization. This should work fine, as long as we demand that the latents defined over “coarser” sets of subsets contain some information not present in “finer” latents.
The natural next step is to then decompose the set of all latents we’ve discovered, factoring out information redundant across them. The purpose of this is to remove lower-level information from higher-level latents.
Which almost replicates my initial picture: the higher-level latents now essentially contain just the synergistic information. The difference is that it’s “information redundant across all ‘coarse’ variables in some coarsening of X and not present in any subset of the ‘finer’ variables defining those coarse variables”, rather than defined in a “self-contained” manner for every subset of X.
That definition does feel more correct to me!
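As a toy illustration of the “search for collections of subsets supporting a nontrivial latent” step above (my own sketch, with made-up helper names; it only checks the redundancy/determinism condition H(λ|XS) ≈ 0 plus nontriviality H(λ) > 0, not the full natural-latent conditions, which also involve a mediation requirement):

```python
# Toy sketch: check whether a candidate latent is a nontrivial redund over
# subsets of X = {X1, X2, X3}. Assumptions: exact finite distributions, and only
# the determinism condition H(lambda | X_S) ~ 0 is checked, plus H(lambda) > 0
# for nontriviality; the mediation condition of natural latents is not checked.
from itertools import combinations
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    total = sum(dist.values())
    return -sum(p / total * log2(p / total) for p in dist.values() if p)

def conditional_entropy(outcomes, lam_idx, cond_idxs):
    """H(lambda | X_S) from (outcome tuple, probability) pairs."""
    joint, cond = Counter(), Counter()
    for outcome, prob in outcomes:
        key = tuple(outcome[i] for i in cond_idxs)
        joint[(key, outcome[lam_idx])] += prob
        cond[key] += prob
    return entropy(joint) - entropy(cond)

# Toy system: a shared bit B is copied into X1, X2, X3, and lambda = B,
# so lambda should come out as a redund over every nonempty subset.
outcomes = [((b, b, b, b), 0.5) for b in (0, 1)]  # tuples are (X1, X2, X3, lambda)
lam_idx, n = 3, 3

lam_dist = Counter()
for o, pr in outcomes:
    lam_dist[o[lam_idx]] += pr
print(f"H(lambda) = {entropy(lam_dist):.2f} bits (nontrivial)")

for size in range(1, n + 1):
    for S in combinations(range(n), size):
        h = conditional_entropy(outcomes, lam_idx, S)
        print(f"H(lambda | X_{tuple(i + 1 for i in S)}) = {h:.2f}")
```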
As in, take a set of variables X, then search for some set of its (non-overlapping?) subsets such that there’s a nontrivial natural latent over it? Right, it’s what we’re doing here as well.
I think the subsets can actually be partially overlapping. For instance, you may have a λ that’s approximately deterministic w.r.t. {X1,X2} and {X2,X3} but not X2 alone. Weak redundancy (approximately deterministic w.r.t. X̄i for every i, i.e. w.r.t. each set of all variables except Xi) is also an example of a redund across overlapping subsets.
Thanks, that’s useful.
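A quick check of the overlapping-subsets example above (my own toy construction, not from the comment: X1 and X3 are independent uniform bits, X2 = X1 XOR X3, and λ = X1; names and helpers are illustrative): λ is deterministic given {X1,X2} and given {X2,X3}, but not given X2 alone.

```python
# Toy check (illustrative construction): X1 and X3 are independent uniform bits,
# X2 = X1 XOR X3, and lambda = X1. Then lambda is deterministic given {X1, X2}
# and given {X2, X3}, but not given X2 alone.
from collections import Counter
from math import log2

def cond_entropy(rows, target, given):
    """H(target | given) in bits from (dict row, probability) pairs."""
    h = lambda c: -sum(p * log2(p) for p in c.values() if p)
    joint, cond = Counter(), Counter()
    for row, prob in rows:
        g = tuple(row[v] for v in given)
        joint[(g, row[target])] += prob
        cond[g] += prob
    return h(joint) - h(cond)

rows = [({"X1": x1, "X2": x1 ^ x3, "X3": x3, "lam": x1}, 0.25)
        for x1 in (0, 1) for x3 in (0, 1)]

for given in (("X1", "X2"), ("X2", "X3"), ("X2",)):
    print(given, "->", cond_entropy(rows, "lam", given))
# ('X1', 'X2') -> 0.0, ('X2', 'X3') -> 0.0, ('X2',) -> 1.0
```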