You might be right. It makes sense to me that P(X|A)=.5, P(X|B)=.5, independent(A,B) ⇒ P(X|A,B) = .75. But I can’t derive it.
Of course you can’t derive it—you gave a counterexample! More precisely, let A,B,X be heads on three independent coinflips.
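For concreteness, here is that counterexample checked by brute enumeration (a quick illustrative sketch): all eight outcomes of three independent fair flips are equally likely, and conditioning on A, B, or both leaves P(X) at 1/2, not 3/4.

```python
from itertools import product
from fractions import Fraction

# All 8 equally likely outcomes of three independent fair coin flips.
# o = (a, b, x): 1 means heads on that coin.
outcomes = list(product([0, 1], repeat=3))

def prob(pred):
    return Fraction(sum(1 for o in outcomes if pred(o)), len(outcomes))

p_x_given_a = prob(lambda o: o[0] and o[2]) / prob(lambda o: o[0])
p_x_given_b = prob(lambda o: o[1] and o[2]) / prob(lambda o: o[1])
p_x_given_ab = prob(lambda o: o[0] and o[1] and o[2]) / prob(lambda o: o[0] and o[1])

print(p_x_given_a, p_x_given_b, p_x_given_ab)  # 1/2 1/2 1/2 -- not 3/4
```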
Jim describes a fallacy that it looks like people might actually commit[1], but I don’t think it’s what’s going on here. I think the issue is the meaning of “independent evidence.”
ETA: the following paragraph is wrong and unnecessarily complicated. Maybe it would be better to skip down to my later comment.
One kind of evidence is measurements. In that case the event of interest causes the measurement, plus there’s some noise. I think what we usually mean by “independent measurements” is that the noise in one measurement is independent of the noise in the other. How you combine the measurements depends on your noise model (as does even saying that the noises are independent). If your noise model is that there’s a large chance of a correct read and a small chance p of an incorrect read, then agreeing reads allow you to multiply the two p’s [ETA: this is wrong] (if p is not small, what happens depends on the details of an incorrect read). That is roughly what you did, except that you confused the measurement of a probability with the probability of noise. You might be able to struggle through interpreting the measured p=.5 as noise, but it would require a detailed noise model.
The lawn has the opposite causal structure from the kinds of measurements/evidence above (so I agree with Jim’s complaint that the two problems are unrelated). Causal structure has to do with which side of the conditioning bar | things are on, so maybe when you unwind this discussion of “evidence,” it turns into Jim’s fallacy, but I doubt it.
[1] ETA: Do people actually commit Jim’s fallacy? As I explain above, I don’t think that’s Phil’s original mistake, but he does make it in the quoted text. If Phil correctly abstracted his train of thought, then Jim is correct. But I think Phil probably learned this in the context of “evidence” and overgeneralized when trying to understand the problem in his example, and wouldn’t have abstracted it this way without Jim’s input.
Of course you can’t derive it—you gave a counterexample!
It’s not a real counterexample. The data sets aren’t actually independent. The real trick is that I try to convince you that the two data sets being used are independent because they don’t intersect. Common definitions of independence would say they are independent because you can’t compute any correlation between them. But they both have the same underlying probability distribution generated from the same source. I’m confused about what “independence” should mean in this case.
When you make the statement

P(X|A)=.5, P(X|B)=.5, independent(A,B) ⇒ P(X|A,B) = .75

it seems to be written in a standard formal language, and you have to interpret it with A and B as random variables and “independent” meaning independence of random variables. Then the formal statement is false by the example of three independent coin flips. When I first posted my comment, I had a panicked moment and deleted it until I realized that the three-coins example is a counterexample; whether it is the same as your example is not important.
Yes, the two data sets are not independent if you’re not sure how the coin is weighted. Two flips of a coin weighted in an unknown way are not independent random variables. But that’s good, because if they were independent in that way, their evidence wouldn’t add. But since they are independent conditional on the thing we want to measure, their evidence does add.
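To see both halves of that, suppose (a made-up two-point prior, purely for illustration) the coin is equally likely to be 0.9-heads or 0.1-heads. Unconditionally the two flips are correlated; conditional on the weight they are independent:

```python
from fractions import Fraction

# Toy prior (my invention, just for illustration): the coin is either
# 0.9-heads or 0.1-heads, each with probability 1/2.
biases = [Fraction(9, 10), Fraction(1, 10)]
prior = Fraction(1, 2)

# Marginalize over the unknown bias.
p_h1 = sum(prior * b for b in biases)        # P(first flip heads)
p_h1h2 = sum(prior * b * b for b in biases)  # P(both flips heads)

print(p_h1)       # 1/2
print(p_h1 ** 2)  # 1/4 -- what unconditional independence would require
print(p_h1h2)     # 41/100 -- so the flips are unconditionally dependent

# Conditional on the bias they are independent by construction:
# P(both heads | bias) = bias * bias.
```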
I repeated your original formulation of the consequence of independent evidence, but it is not correct. If you have two independent pieces of evidence that would each send you to 90% certainty, you do not conclude 99% certainty. It depends on your prior! If your prior were 50%, you conclude about 98.8% (posterior odds of 81:1), not 99%. If your prior were 90%, neither piece of evidence told you anything, so together you learn nothing! (Moreover, if A and B are empty observations, that is a counterexample to pretty much any formulation.)
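In odds form, with the numbers above: a 50% prior is 1:1 odds, and evidence that alone moves 50% to 90% has likelihood ratio 9, so two independent such pieces give 81:1.

```python
# Odds-form update with the numbers from the paragraph above.
prior_odds = 0.5 / 0.5  # 50% prior -> 1:1
lr = 0.9 / 0.1          # one piece alone: 1:1 * 9 = 9:1, i.e. 90%
posterior_odds = prior_odds * lr * lr          # 81:1
print(posterior_odds / (1 + posterior_odds))   # 0.9878... -- not 0.99

# With a 90% prior, evidence that "sends you to 90%" has lr = 1:
# 9:1 * 1 * 1 = 9:1, still 90%. You learn nothing.
```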
When is evidence independent? Evidence is the log of the likelihood ratio, which is what shows up in a Bayesian update. The likelihood ratio for X involves only probabilities conditional on X (and on not-X), not the prior. Thus independence of A and B conditional on the hypothesis is exactly what we need for the probabilities to multiply, and hence for the evidence of A and B together to be the sum of the evidence of the individual events. In particular, two flips of a single weighted coin give independent evidence about the weight.
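A sketch of that additivity, with hypothetical numbers (X says the coin is 0.8-heads, not-X says it is fair; A and B are two observed heads):

```python
import math

# Hypothetical hypotheses: X = "coin is 0.8-heads", not-X = "coin is fair".
p_h_x, p_h_notx = 0.8, 0.5

def evidence(p_given_x, p_given_notx):
    """Evidence = log of the likelihood ratio."""
    return math.log(p_given_x / p_given_notx)

# Conditional on the weight, the flips are independent, so the joint
# likelihoods factor: P(A,B|X) = P(A|X) * P(B|X), and same under not-X.
ev_a = evidence(p_h_x, p_h_notx)
ev_b = evidence(p_h_x, p_h_notx)
ev_ab = evidence(p_h_x ** 2, p_h_notx ** 2)

print(ev_a + ev_b, ev_ab)  # equal (up to float rounding) -- evidence adds
```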
The problem I was originally trying to approach is a problem I’m having at work, where I have multiple sources of evidence to conclude property(X), but I don’t have enough data to compute the correlations between these sources of evidence, because for most data points only one source applies. When two or three of these sources agree, how do I compute the probability? But on reflection, it’s a different problem.
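For what it’s worth, if you are willing to assume the sources are independent conditional on property(X) and on its negation (a naive-Bayes-style assumption, which the lack of overlapping data points can’t confirm), the combination is just a sum of evidence. A minimal sketch with made-up numbers:

```python
import math

def combine(prior, likelihood_ratios):
    """Posterior for X given evidence sources assumed independent
    conditional on X and on not-X. Each likelihood ratio is
    P(observation | X) / P(observation | not-X)."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Made-up example: 20% prior, three agreeing sources with LRs 9, 4, and 2.
print(combine(0.2, [9, 4, 2]))  # ~0.947
```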