Some details mildly off, but I think you’ve got the big picture basically right.
Alternatively, if I didn’t tell you the label, you could estimate it from either X1 or X2 equally well. This is what the other two diagrams show.
Minor clarification here: the other two diagrams say not only that I can estimate the label equally well from either X1 or X2, but that I can estimate the label (approximately) equally well from X1, X2, or the pair (X1,X2).
I think that the full label Λ will be an approximate stochastic natural latent.
I’d have to run the numbers to check that 200 flips is enough to give a high-confidence estimate of Λ (in which case 400 flips from the pair of variables will also put high confidence on the same value with high probability), but I think yes.
But if we consider only the first bit[1] of the label (which roughly tells us whether the bias is above or below 50% heads) then this bit will be a deterministic natural latent because with reasonably high certainty, you can guess the first bit of Λ from X1 or X2.
Not quite; I added some emphasis. The first bit will (approximately) satisfy the two redundancy conditions, i.e. X1→X2→1bit(Λ) and X2→X1→1bit(Λ), and indeed will be an approximately deterministic function of X. But it won’t (approximately) satisfy the mediation condition X1←1bit(Λ)→X2; the two sets of flips will not be (approximately) independent given only the first bit. (At least not to nearly as good an approximation as the original label.)
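This mediation failure is easy to check numerically. Here's a sketch under assumed toy parameters (a coarse 6-point grid of biases and only 20 flips per variable rather than 100, so exact enumeration stays cheap): conditioning on the full label makes X1 and X2 independent, while conditioning on only the first bit leaves substantial mutual information between them.

```python
import numpy as np
from math import comb

# Assumed toy discretization of the biased-coin setup: the bias theta is
# uniform on a grid, X1 and X2 are head-counts from n independent flips each.
# Lambda = theta; 1bit(Lambda) = [theta > 0.5].
thetas = np.array([0.1, 0.3, 0.45, 0.55, 0.7, 0.9])
prior = np.full(len(thetas), 1 / len(thetas))
n = 20  # small flip count so exact enumeration stays cheap

def binom_pmf(theta, n):
    ks = np.arange(n + 1)
    return np.array([comb(n, k) * theta**k * (1 - theta)**(n - k) for k in ks])

pmfs = np.stack([binom_pmf(t, n) for t in thetas])  # shape (|theta|, n+1)

def cond_mi(groups):
    """I(X1; X2 | G) in bits, where G partitions the theta values."""
    total = 0.0
    for idx in groups:
        w = prior[idx] / prior[idx].sum()            # p(theta | group)
        joint = np.einsum('t,ti,tj->ij', w, pmfs[idx], pmfs[idx])
        m1, m2 = joint.sum(axis=1), joint.sum(axis=0)
        total += prior[idx].sum() * (joint * np.log2(joint / np.outer(m1, m2))).sum()
    return total

# Conditioning on the full label: each theta is its own group -> MI ~ 0.
mi_full = cond_mi([np.array([i]) for i in range(len(thetas))])
# Conditioning on the first bit only: theta<0.5 vs theta>0.5.
mi_bit = cond_mi([np.where(thetas < 0.5)[0], np.where(thetas > 0.5)[0]])
print(f"I(X1;X2 | full label) = {mi_full:.4f} bits")
print(f"I(X1;X2 | first bit)  = {mi_bit:.4f} bits")
```

The conditional MI given only the first bit is large because, within the "below 50%" group, seeing many heads in X1 shifts the posterior over theta and hence the prediction for X2.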
That said, the rest of your qualitative reasoning is correct. As we throw out more low-order bits, the mediation condition becomes less well approximated, the redundancy conditions become better approximated, and the entropy of the coarse-grained latent given X falls.
So to build a proof along these lines, one would need to show that a bit-cutoff can be chosen such that bit_cutoff(Λ) still mediates (to an approximation roughly ϵ-ish), while making the entropy of bit_cutoff(Λ) low given X.
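The tradeoff can be sketched numerically too. Under assumed toy parameters again (the label is a bias uniform on a 16-point grid, 20 flips per variable; none of these numbers come from the original setup), keeping more bits shrinks the mediation gap I(X1;X2 | bit_cutoff(Λ)) while growing the determinism gap H(bit_cutoff(Λ) | X1):

```python
import numpy as np
from math import comb

# Assumed toy discretization: the label Lambda is a bias theta uniform on a
# 16-point grid, X1 and X2 are head-counts from n flips each, and
# bit_cutoff_k keeps the top k bits, i.e. bins theta into 2**k groups.
thetas = np.linspace(0.05, 0.95, 16)
prior = np.full(16, 1 / 16)
n = 20
pmfs = np.array([[comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)]
                 for t in thetas])

def tradeoff(bits):
    groups = np.arange(16) // (16 // 2**bits)  # bin index for each theta
    # Mediation gap: I(X1; X2 | bit_cutoff(Lambda)), in bits.
    mi = 0.0
    for g in range(2**bits):
        idx = np.where(groups == g)[0]
        w = prior[idx] / prior[idx].sum()
        joint = np.einsum('t,ti,tj->ij', w, pmfs[idx], pmfs[idx])
        ratio = joint / np.outer(joint.sum(1), joint.sum(0))
        mi += prior[idx].sum() * (joint * np.log2(ratio)).sum()
    # Determinism gap: H(bit_cutoff(Lambda) | X1), in bits.
    p_bx = np.zeros((2**bits, n + 1))          # p(b, x1)
    for t in range(16):
        p_bx[groups[t]] += prior[t] * pmfs[t]
    h = -(p_bx * np.log2(p_bx / p_bx.sum(0))).sum()
    return mi, h

for bits in (1, 2, 3, 4):
    mi, h = tradeoff(bits)
    print(f"{bits} bit(s): mediation gap = {mi:.3f}, H(latent | X1) = {h:.3f}")
```

With all 4 bits kept the latent is the full label, so the mediation gap is exactly zero but the latent is far from determined by 20 flips; with 1 bit kept the situation reverses.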
I do think this is a good angle of attack on the problem, and it’s one of the main angles I’d try.
If a latent satisfies the three natural latent conditions within ϵ1, we can always find a (potentially much bigger) ϵ2 such that this latent also satisfies the deterministic latent condition, right? This is why you need to specify that the problem is showing that a deterministic natural latent exists with ‘almost the same’ ϵ. Does this sound right?
Yes. Indeed, if we allow large enough ϵ (possibly scaling with system size/entropy) then there’s always a deterministic natural latent regardless; the whole thing becomes trivial.
I’d have to run the numbers to check that 200 flips is enough to give a high-confidence estimate of Λ
It isn’t enough. See plot. Also, 200 not being enough flips is part of what makes this interesting. With a million flips, this would pretty much just be the exact case. The fact that it’s only 200 flips gives you a tradeoff in how many label_bits to include.
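For anyone who wants to reproduce the check without the plot, here's a sketch under assumed parameters (an 8-bit uniformly discretized bias as the full label, and the total head-count out of N flips as the observation, which is a sufficient statistic for the bias): with N=200 the posterior entropy of the label stays at several bits, nowhere near a high-confidence estimate.

```python
import numpy as np

# Assumed toy version of the setup: the full label Lambda is the coin's bias,
# discretized to 8 bits (256 grid points, uniform prior); the observation is
# the total head-count k out of N flips (sufficient statistic for the bias).
grid = np.linspace(0.5 / 256, 1 - 0.5 / 256, 256)  # bin centers, avoids log(0)

def label_entropy_given_flips(N):
    """H(Lambda | head-count) in bits, for N total flips."""
    ks = np.arange(N + 1)
    # log C(N,k) built iteratively: C(N,k+1) = C(N,k) * (N-k)/(k+1)
    logC = np.concatenate([[0.0], np.cumsum(np.log((N - ks[:-1]) / (ks[:-1] + 1)))])
    logp = (logC[:, None] + ks[:, None] * np.log(grid)
            + (N - ks)[:, None] * np.log(1 - grid))
    joint = np.exp(logp) / len(grid)               # p(k, theta), uniform prior
    post = joint / joint.sum(axis=1, keepdims=True)
    logpost = np.log2(post, where=post > 0, out=np.zeros_like(post))
    return float(-(joint * logpost).sum())

for N in (200, 10_000):
    print(f"N={N}: H(Lambda | X) = {label_entropy_given_flips(N):.2f} of 8 bits")
```

This also shows the other half of the point: cranking N up drives H(Λ | X) toward zero, which is why a million flips would reduce the problem to (approximately) the exact case.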
Thanks for the clarifications, that all makes sense. I will keep thinking about this!