Some more spitballing regarding how to fix this:
One thing here is that, in the relative sense, $\Omega_n$ isn't any more "guilty" of caring about specific variables than any one $X_{ij}$-copying $R$, given that it captures much more shared information about Anis than those $R$s do. So, to suit, maybe we should allow $\epsilon_3$ to scale with the amount of shared information in the max-redund? Which would be, uhh, $SH(\Omega) = H(\Omega) - H(\Omega|X) - \max_i I(\Omega; X_{-i} \mid X_i)$ or something.
Generalizing, maybe it should be "permitted" for redunds to contain more unique information the more shared information they have? Which would make the tuning parameter for "how pure of a redund $R$ is" not $\max_i I(R; X_{-i} \mid X_i)$ (aka your $\epsilon_2$), but $SH(R)/H(R)$, i.e., the fraction of the redund's total entropy[1] that is shared information. That makes intuitive sense to me.
Or its $X$-relevant entropy? Then subtract $H(R|X)$ from the denominator too, giving $SH(R)/\bigl(H(R) - H(R|X)\bigr)$.
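To make the proposed purity ratio concrete, here's a toy numerical sketch (my own illustration, not anything from the post): compute $SH(R)/H(R)$, with $SH(R) = H(R) - H(R|X) - \max_i I(R; X_{-i} \mid X_i)$, for a joint pmf over two observed bits $X_1, X_2$ and a candidate redund $R$. A perfectly shared bit should score 1, while an $R$ that just copies $X_1$ (pure unique information) should score 0.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a pmf (any-shape array summing to 1)."""
    q = p[p > 0]
    return float(-(q * np.log2(q)).sum())

def marg(p, keep):
    """Marginalize joint pmf p down to the axes listed in `keep`."""
    drop = tuple(a for a in range(p.ndim) if a not in keep)
    return p.sum(axis=drop)

def purity(p):
    """SH(R)/H(R) for a joint pmf over (X1, X2, R) on axes (0, 1, 2)."""
    h_R = H(marg(p, [2]))
    h_R_given_X = H(p) - H(marg(p, [0, 1]))  # H(R|X) = H(X,R) - H(X)
    # I(R; X_other | X_given) via the identity
    # I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)
    def cmi(other, given):
        return (H(marg(p, [2, given])) + H(marg(p, [other, given]))
                - H(p) - H(marg(p, [given])))
    sh = h_R - h_R_given_X - max(cmi(1, 0), cmi(0, 1))
    return sh / h_R

# "Pure" redund: X1 = X2 = R = one shared fair coin.
pure = np.zeros((2, 2, 2))
pure[0, 0, 0] = pure[1, 1, 1] = 0.5

# "Impure" redund: X1, X2 independent fair coins, R just copies X1.
copy = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        copy[x1, x2, x1] = 0.25

print(purity(pure))  # 1.0
print(purity(copy))  # 0.0
```

The same helpers also give the $X$-relevant variant by swapping the denominator for `h_R - h_R_given_X`; in these two examples it doesn't change the verdict, since $R$ is a deterministic function of $X$ in both.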