Epistemic status: Quick dump of something that might be useful to someone. o3 and Opus 4 independently agree on the numerical calculations for the bolded result below, but I didn’t check the calculations myself in any detail.
When we say “roughly”, e.g. 2ϵ or 3ϵ would be fine; it may be a judgement call on our part if the bound is much larger than that.
Let X∼Ber(p). With probability r, set Z:=X, and otherwise draw Z∼Ber(p). Let Y∼Ber(1/2). Let A=X⊕Y and B=Y⊕Z. We will investigate latents for (A,B).
Set Λ:=Y, then note that the stochastic error is ϵ:=I(A;Y|B), because Y induces perfect conditional independence and symmetry of A and B. Now compute the deterministic errors of Λ:=Y, Λ:=0, Λ:=A, which are equal to H(Y|A), I(A;B), H(A|B) respectively.
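(For Λ:=0 the maximum defining the deterministic error reduces to I(A;B), since H(Λ|A)=H(Λ|B)=0 and I(A;B|Λ)=I(A;B); for Λ:=A it reduces to H(A|B), since H(Λ|A)=0 and I(A;B|Λ)=0.)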
Then it turns out that with p:=0.9, r:=0.44, all of these latents have deterministic error greater than 5ϵ, if you believe this Claude Opus 4 artifact (full chat here, corroboration by o3 here). Conditional on there not being some other kind of latent that gets better deterministic error, and the calculations being correct, I would expect that a bit more fiddling around could produce much better bounds, say 10ϵ or more, since I think I’ve explored very little of the search space.
E.g., one could create more As and Bs by adding either more Ys, or more Xs and Zs. Or one could pick the probabilities p, r out of some discrete set of possibilities instead of having them be fixed.
Set Λ:=Y, then note that the stochastic error is ϵ:=I(A;Y|B), because Y induces perfect conditional independence and symmetry of A and B.
I don’t think Y induces perfect conditional independence? Conditional on Y, we have:
(Probability r) A = B, else
(Probability 1 - r) A and B are independent
… which means that learning the value of A tells me something about the value of B, conditional on Y (specifically, B is more likely to have the same value A had).
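(Quantitatively: conditional on Y, A determines X and B determines Z, so I(A;B|Y) = I(X;Z), which is strictly positive whenever r > 0 and 0 < p < 1.)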
Am I missing something here?
(Also, for purposes of me tracking how useful LLMs are for research: assuming I’m not missing something and this was a mistake, was the mistake originally made by you or an LLM?)
Yeah, my comment went through a few different versions and that statement doesn’t apply to the final setting. I should’ve checked it better before hitting submit, sorry. I only used LLMs for writing code for numerical calculations, so the error is mine. [1]
I think that I didn’t actually use this claim in the numerical calculations, so I’d hope that the rest of the comment continues to hold. I had hoped to verify that before replying, but given that it’s been two weeks already, I don’t know when I’ll manage to get to it.
[1] I did try having an LLM write a message explaining the claim, but didn’t end up using it.
To check my understanding: for random variables A,B, the stochastic error of a latent Λ is the maximum among I(A;B|Λ), I(A;Λ|B), I(B;Λ|A). The deterministic error is the maximum among I(A;B|Λ), H(Λ|A), H(Λ|B). If so, the claim in my original comment holds; I also wrote code (manually) to verify. Here’s the fixed claim:
Let X∼Ber(p). With probability r, set Z:=X, and otherwise draw Z∼Ber(p). Let Y∼Ber(1/2). Let A=X⊕Y and B=Y⊕Z. We will investigate latents for (A,B). Let ϵ be the stochastic error of latent Λ:=Y. Now compute the deterministic errors of each of the latents X, Y, Z, A, B, A⊕B, X⊕Y⊕Z. Then for p:=0.9,r:=0.44, all of these latents have deterministic error greater than 5ϵ.
It should be easy to modify the code to consider other latents. I haven’t thought much about proving that there aren’t any other latents better than these, though.
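For anyone who wants to reproduce this kind of check, here is a minimal Python sketch (a reconstruction from the definitions above, not the author’s actual code): it enumerates the joint distribution over (X,Y,Z), pushes it forward to (A,B,Λ) for each candidate latent, and takes the maxima defining the stochastic and deterministic errors, for p=0.9, r=0.44 and the seven latents listed.

```python
from itertools import product
from math import log2

def joint_xyz(p, r):
    """Joint pmf over (x, y, z): X ~ Ber(p); with probability r set Z := X,
    otherwise draw Z ~ Ber(p) independently; Y ~ Ber(1/2) independent of both."""
    ber = lambda q, v: q if v == 1 else 1.0 - q
    pmf = {}
    for x, y, z in product((0, 1), repeat=3):
        pz = r * (1.0 if z == x else 0.0) + (1.0 - r) * ber(p, z)
        pmf[(x, y, z)] = ber(p, x) * 0.5 * pz
    return pmf

def joint_abl(pmf_xyz, latent):
    """Push (X, Y, Z) forward to (A, B, Lambda), where A = X xor Y, B = Y xor Z."""
    out = {}
    for (x, y, z), pr in pmf_xyz.items():
        key = (x ^ y, y ^ z, latent(x, y, z))
        out[key] = out.get(key, 0.0) + pr
    return out

def marginal(pmf, idx):
    """Marginalize a pmf keyed by tuples onto the coordinates in idx."""
    out = {}
    for key, pr in pmf.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + pr
    return out

def entropy(pmf):
    return -sum(pr * log2(pr) for pr in pmf.values() if pr > 0)

def cond_entropy(pmf, left, given):   # H(left | given)
    return entropy(marginal(pmf, left + given)) - entropy(marginal(pmf, given))

def cond_mi(pmf, u, v, given):        # I(u ; v | given)
    return cond_entropy(pmf, u, given) - cond_entropy(pmf, u, v + given)

A, B, L = [0], [1], [2]               # coordinate positions of A, B, Lambda in the keys

def stochastic_error(pmf_abl):
    return max(cond_mi(pmf_abl, A, B, L),
               cond_mi(pmf_abl, A, L, B),
               cond_mi(pmf_abl, B, L, A))

def deterministic_error(pmf_abl):
    return max(cond_mi(pmf_abl, A, B, L),
               cond_entropy(pmf_abl, L, A),
               cond_entropy(pmf_abl, L, B))

if __name__ == "__main__":
    p, r = 0.9, 0.44
    pmf = joint_xyz(p, r)
    eps = stochastic_error(joint_abl(pmf, lambda x, y, z: y))
    print(f"eps (stochastic error of Y) = {eps:.4f}")
    latents = {
        "X": lambda x, y, z: x,
        "Y": lambda x, y, z: y,
        "Z": lambda x, y, z: z,
        "A": lambda x, y, z: x ^ y,
        "B": lambda x, y, z: y ^ z,
        "A xor B": lambda x, y, z: x ^ z,        # A xor B = X xor Z
        "X xor Y xor Z": lambda x, y, z: x ^ y ^ z,
    }
    for name, f in latents.items():
        d = deterministic_error(joint_abl(pmf, f))
        print(f"{name}: deterministic error = {d:.4f}, ratio to eps = {d / eps:.2f}")
```

If the claim above is right, each printed ratio should exceed 5; swapping in other deterministic functions of (x,y,z) as latents only requires adding entries to the dictionary.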
In this particular example you can achieve deterministic error ≈2.5ϵ with the latent A∧B, but it seems easy to find other examples with ratio > 5 (including over the latents A∧B, A∨B) in the space of distributions over (X,Y,Z) with a random-restart hill-climb. Anyway, my takeaway is that if you think you can derandomize latents in general, you should probably try to derandomize the latent Λ:=Y for variables A:=X⊕Y, B:=Z⊕Y, for distributions over boolean variables X,Y,Z.
(edited to fix typo in definition of B)
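A sketch of what such a random-restart hill-climb might look like, reusing joint_abl, stochastic_error, and deterministic_error from the sketch above (the candidate latent set, step size, and iteration counts are arbitrary illustrative choices, not the commenter’s actual settings):

```python
import random
from itertools import product

# Candidate deterministic latents to beat: the seven from the fixed claim above,
# plus A AND B and A OR B.
CANDIDATES = {
    "X": lambda x, y, z: x,
    "Y": lambda x, y, z: y,
    "Z": lambda x, y, z: z,
    "A": lambda x, y, z: x ^ y,
    "B": lambda x, y, z: y ^ z,
    "A xor B": lambda x, y, z: x ^ z,
    "X xor Y xor Z": lambda x, y, z: x ^ y ^ z,
    "A and B": lambda x, y, z: (x ^ y) & (y ^ z),
    "A or B": lambda x, y, z: (x ^ y) | (y ^ z),
}

def ratio(pmf_xyz):
    """Min deterministic error over CANDIDATES divided by the stochastic error of Y."""
    eps = stochastic_error(joint_abl(pmf_xyz, lambda x, y, z: y))
    if eps < 1e-12:
        return 0.0
    best = min(deterministic_error(joint_abl(pmf_xyz, f)) for f in CANDIDATES.values())
    return best / eps

def hill_climb(restarts=20, steps=3000, scale=0.1):
    keys = list(product((0, 1), repeat=3))
    best_ratio, best_pmf = 0.0, None
    for _ in range(restarts):
        w = [random.random() for _ in keys]
        cur = dict(zip(keys, [v / sum(w) for v in w]))
        cur_ratio = ratio(cur)
        for _ in range(steps):
            # Multiplicative perturbation of the current pmf, then renormalize.
            w = [max(cur[k] * (1.0 + scale * random.gauss(0, 1)), 1e-9) for k in keys]
            cand = dict(zip(keys, [v / sum(w) for v in w]))
            cand_ratio = ratio(cand)
            if cand_ratio > cur_ratio:
                cur, cur_ratio = cand, cand_ratio
        if cur_ratio > best_ratio:
            best_ratio, best_pmf = cur_ratio, cur
    return best_ratio, best_pmf
```

Each restart draws a random pmf over the eight outcomes of (X,Y,Z), repeatedly perturbs it, and keeps a perturbation only if the ratio improves; the best distribution over all restarts is returned.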
My impression is that prior discussion focused on discretizing Λ. Λ is already boolean here, so if the hypothesis is true then it’s for a different reason.