Preserving mutual information terms ⟹ (Stochastic ⟹ Deterministic Natural Latent)
(See this post for background about the stochastic → deterministic natural latent conjecture)
We’ve shown that, given fixed H(Λ), both the redundancy and mediation errors of a latent Λ are minimized when ∑i I(Xi;Λ) is maximized, while H(Λ) is exactly the parameter that determines the tradeoff between redundancy and mediation errors (among Pareto-optimal latents). We’ll discuss how this could open up new angles of attack on the stochastic → deterministic natural latent conjecture.
Suppose that we have a stochastic natural latent Λ that satisfies:
I(X2;Λ|X1)≤ϵ
I(X1;Λ|X2)≤ϵ
TC(X|Λ)=I(X1;X2|Λ)≤ϵ
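To make these conditions concrete, here is a minimal Python sketch that computes the two redundancy errors and the mediation error from an explicit joint pmf. The helper functions and the toy distribution are our own illustration, not part of the original argument:

```python
# Check the stochastic natural latent conditions for a small discrete system.
# The joint pmf p maps (x1, x2, lam) tuples to probabilities.
from collections import defaultdict
from math import log2

def marginal(p, axes):
    """Marginalize the joint pmf onto the given tuple positions."""
    out = defaultdict(float)
    for outcome, prob in p.items():
        out[tuple(outcome[i] for i in axes)] += prob
    return out

def mi(p, a_axes, b_axes):
    """I(A;B) for the variable groups at positions a_axes and b_axes."""
    pa, pb = marginal(p, a_axes), marginal(p, b_axes)
    pab = marginal(p, a_axes + b_axes)
    return sum(prob * log2(prob / (pa[k[:len(a_axes)]] * pb[k[len(a_axes):]]))
               for k, prob in pab.items() if prob > 0)

def cond_mi(p, a_axes, b_axes, c_axes):
    """I(A;B|C), via the chain rule I(A;B|C) = I(A;B,C) - I(A;C)."""
    return mi(p, a_axes, b_axes + c_axes) - mi(p, a_axes, c_axes)

# Toy joint over (X1, X2, Λ): all three are copies of one fair coin,
# so all three errors should come out to exactly zero.
p = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

print("I(X2;Λ|X1) =", cond_mi(p, [1], [2], [0]))  # redundancy error
print("I(X1;Λ|X2) =", cond_mi(p, [0], [2], [1]))  # redundancy error
print("TC(X|Λ)    =", cond_mi(p, [0], [1], [2]))  # mediation error
```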
From our result, we know that to construct a deterministic natural latent Λ′, all we have to do is determine the entropy H(Λ′) and then select the latent that maximizes ∑i I(Xi;Λ′). The latter ensures that the latent is Pareto-optimal w.r.t. the mediation and determinism conditions, while the former selects a particular point on the Pareto frontier.
Now suppose that our stochastic natural latent has a particular amount of mutual information with the joint observables, I(X1,X2;Λ). If the stochastic natural latent were a deterministic function of the observables, then we would have:
H(Λ)=I(X1,X2;Λ) (as that would imply H(Λ|X1,X2)=H(Λ)−I(X1,X2;Λ)=0)
So one heuristic for constructing a deterministic natural latent is to set H(Λ′)=I(X1,X2;Λ) and maximize ∑i I(Xi;Λ′) given the entropy constraint (so that Λ′ hopefully captures all the mutual information between Λ and X). We will show that if Λ′ preserves the mutual information with each observable (i.e. I(Xi;Λ′)=I(Xi;Λ) for i=1,2), then the mediation condition is conserved and the stochastic redundancy conditions imply the deterministic redundancy conditions.
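Written as an optimization problem (a LaTeX restatement of the heuristic above, not an additional assumption):

```latex
\Lambda' \;\in\; \operatorname*{arg\,max}_{\Lambda''\,:\;H(\Lambda'')\,=\,I(X_1,X_2;\Lambda)}\ \sum_i I(X_i;\Lambda'')
```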
Preserving mutual information terms ⟹ Mediation is conserved
Note that the mediation error is TC(X|Λ) = ∑i H(Xi|Λ) − H(X|Λ) = ∑i (H(Xi) − I(Xi;Λ)) − H(X) + I(X;Λ)
Since the H(Xi) and H(X) terms do not depend on the latent, the mediation error is completely unchanged if we replace Λ with a deterministic latent Λ′ that satisfies I(X;Λ′)=H(Λ′)=I(X;Λ) and I(Xi;Λ′)=I(Xi;Λ) for each i.
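To make the invariance explicit, here is the algebra written out (using only the preservation conditions just stated):

```latex
\begin{aligned}
TC(X\mid\Lambda') &= \sum_i \bigl(H(X_i) - I(X_i;\Lambda')\bigr) - H(X) + I(X;\Lambda') \\
                  &= \sum_i \bigl(H(X_i) - I(X_i;\Lambda)\bigr) - H(X) + I(X;\Lambda) \\
                  &= TC(X\mid\Lambda)
\end{aligned}
```

where the second line substitutes I(Xi;Λ′)=I(Xi;Λ) and I(X;Λ′)=I(X;Λ).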
Preserving mutual information terms ⟹ Redundancy is conserved
Note that using partial information decomposition[1], we can decompose the stochastic redundancy errors as follows:
I(X1;Λ|X2)=Syn(X1,X2;Λ)+Uniq(X1;Λ)≤ϵ ⟹ Uniq(X1;Λ)≤ϵ
I(X2;Λ|X1)=Syn(X1,X2;Λ)+Uniq(X2;Λ)≤ϵ ⟹ Uniq(X2;Λ)≤ϵ
where Syn(X1,X2;Λ) denotes the synergistic information of X1 and X2 about Λ, while Uniq(X1;Λ) denotes the unique information of X1 about Λ. Intuitively, I(X1;Λ|X2) represents the information that X1 has about Λ when we already have access to X2: this includes the unique information that we can only derive from X1 but not X2, as well as the synergistic information that we can only derive when we have both X1 and X2.
We also have:
I(X1;Λ)=Red(X1,X2;Λ)+Uniq(X1;Λ)
I(X2;Λ)=Red(X1,X2;Λ)+Uniq(X2;Λ)
Intuitively, this is because I(X1;Λ) contains both the unique information about Λ that you can only derive from X1 but not X2, and also the redundant information that you can derive from either X1 or X2. Note that since 0≤Uniq(X1;Λ)≤ϵ and 0≤Uniq(X2;Λ)≤ϵ, we have
Red(X1,X2;Λ)≤I(X1;Λ)≤Red(X1,X2;Λ)+ϵ
Red(X1,X2;Λ)≤I(X2;Λ)≤Red(X1,X2;Λ)+ϵ
Similarly, we have:
I(X1,X2;Λ)=Red(X1,X2;Λ)+Uniq(X1;Λ)+Uniq(X2;Λ)+Syn(X1,X2;Λ)
where, summing the two redundancy error bounds above and using Syn(X1,X2;Λ)≥0,
Uniq(X1;Λ)+Uniq(X2;Λ)+Syn(X1,X2;Λ)≤2ϵ ⟹ Red(X1,X2;Λ)≤I(X1,X2;Λ)≤Red(X1,X2;Λ)+2ϵ
As a result, both I(X1,X2;Λ)−I(X1;Λ) and I(X1,X2;Λ)−I(X2;Λ) are bounded by 2ϵ. This means that if we can find a deterministic latent Λ′ that conserves all the relevant mutual information terms I(X1,X2;Λ), I(X1;Λ), and I(X2;Λ), then we can bound the deterministic redundancy errors:
H(Λ′|X1)=H(Λ′)−I(X1;Λ′)=I(X1,X2;Λ)−I(X1;Λ)≤2ϵ
H(Λ′|X2)=H(Λ′)−I(X2;Λ′)=I(X1,X2;Λ)−I(X2;Λ)≤2ϵ
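To ground the decompositions in this section, here is a sketch that computes the PID atoms under the Williams–Beer I_min redundancy measure, which is one concrete choice satisfying the non-negativity requirement in the footnote. The function names and the XOR toy distribution are our own illustration:

```python
# Williams–Beer I_min PID for two sources (X1, X2) and target Λ.
# Joint pmf format: {(x1, x2, lam): probability}.
from collections import defaultdict
from math import log2

def marginal(p, axes):
    out = defaultdict(float)
    for outcome, prob in p.items():
        out[tuple(outcome[i] for i in axes)] += prob
    return out

def mi(p, a_axes, b_axes):
    pa, pb = marginal(p, a_axes), marginal(p, b_axes)
    pab = marginal(p, a_axes + b_axes)
    return sum(prob * log2(prob / (pa[k[:len(a_axes)]] * pb[k[len(a_axes):]]))
               for k, prob in pab.items() if prob > 0)

def specific_info(p, src, lam_val):
    """I(Λ=λ; X_src) = Σ_a p(a|λ) log[ p(λ|a) / p(λ) ]."""
    p_lam = marginal(p, [2])[(lam_val,)]
    p_src = marginal(p, [src])
    p_joint = marginal(p, [src, 2])
    total = 0.0
    for (a,), pa in p_src.items():
        p_al = p_joint.get((a, lam_val), 0.0)
        if p_al > 0:
            total += (p_al / p_lam) * log2((p_al / pa) / p_lam)
    return total

def pid_atoms(p):
    """Return (Red, Uniq(X1), Uniq(X2), Syn) under the I_min redundancy measure."""
    red = sum(pl * min(specific_info(p, 0, l), specific_info(p, 1, l))
              for (l,), pl in marginal(p, [2]).items())
    uniq1 = mi(p, [0], [2]) - red
    uniq2 = mi(p, [1], [2]) - red
    syn = mi(p, [0, 1], [2]) - red - uniq1 - uniq2
    return red, uniq1, uniq2, syn

# Toy joint: Λ = X1 XOR X2, which is pure synergy (Red = Uniq = 0, Syn = 1).
p = {(x1, x2, x1 ^ x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
red, u1, u2, syn = pid_atoms(p)
print(f"Red={red:.3f}  Uniq1={u1:.3f}  Uniq2={u2:.3f}  Syn={syn:.3f}")
# Sanity check of the decomposition used above: I(X1;Λ|X2) = Syn + Uniq(X1).
print(mi(p, [0], [1, 2]) - mi(p, [0], [1]), "==", syn + u1)
```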
Conclusion
We’ve shown that a sufficient condition for mediation and redundancy to transfer from the stochastic to the deterministic case is that the deterministic latent preserves the mutual information of the stochastic latent with both the joint observable X and the individual observables X1 and X2. Given this, the remaining task is to prove that such a deterministic latent always exists, or that it can preserve the mutual information terms up to some small error. In particular, if existence is guaranteed, then a tractable way to find the deterministic latent Λ′ given a stochastic latent Λ is to set H(Λ′)=I(X;Λ) and maximize ∑i I(Xi;Λ′), as in the sketch below.
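As an illustration only (brute force over tiny alphabets, with hypothetical helper names; not a tractable algorithm for realistically-sized systems), the recipe could look like this:

```python
# Sketch of the recipe: fix H(Λ') ≈ I(X;Λ), then among deterministic labelings
# Λ' = f(X1, X2) with roughly that entropy, maximize Σ_i I(Xi;Λ').
from itertools import product
from collections import defaultdict
from math import log2

def marginal(p, axes):
    out = defaultdict(float)
    for outcome, prob in p.items():
        out[tuple(outcome[i] for i in axes)] += prob
    return out

def mi(p, a_axes, b_axes):
    pa, pb = marginal(p, a_axes), marginal(p, b_axes)
    pab = marginal(p, a_axes + b_axes)
    return sum(prob * log2(prob / (pa[k[:len(a_axes)]] * pb[k[len(a_axes):]]))
               for k, prob in pab.items() if prob > 0)

def entropy(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def best_deterministic_latent(p_x, target_h, n_labels, tol=0.2):
    """Brute-force f:(x1,x2)->label with H(f(X)) near target_h, maximizing Σ_i I(Xi;Λ')."""
    xs = sorted(p_x)  # the (x1, x2) outcomes
    best, best_score = None, -1.0
    for labels in product(range(n_labels), repeat=len(xs)):
        f = dict(zip(xs, labels))
        # Joint over (X1, X2, Λ'), with Λ' a deterministic function of X.
        q = {(x1, x2, f[(x1, x2)]): pr for (x1, x2), pr in p_x.items()}
        if abs(entropy(marginal(q, [2])) - target_h) > tol:
            continue  # entropy constraint: picks the point on the Pareto frontier
        score = mi(q, [0], [2]) + mi(q, [1], [2])
        if score > best_score:
            best, best_score = f, score
    return best, best_score

# Toy p(X1, X2): two noisy copies of a hidden fair bit.
p_x = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
target_h = 1.0  # stand-in for I(X;Λ) of some given stochastic latent
f, score = best_deterministic_latent(p_x, target_h, n_labels=2)
print("labeling:", f, " Σ_i I(Xi;Λ') =", round(score, 3))
```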
[1] Note that PID depends on a choice of redundancy measure, but our proof holds for any choice that guarantees non-negativity of the PID atoms.