The mistake is that you can only decompose a conjunction that way when it’s on the left side of the |, not when it’s on the right.
That said, I think a collection of a dozen or so incorrect bits of statistical reasoning like this would make great exercises, and I would encourage the creation of a top-level post based on that premise.
I was wrong, and jimrandomh was right. I said:
P(X|A,B) = 1-(1-P(X|A))(1-P(X|B)) if P(A,B) = P(A)P(B)
But P(X|A,B) = 1-P(~X|A,B)
therefore P(~X|A,B) = (1-P(X|A))(1-P(X|B)) = P(~X|A)P(~X|B)
This is the same invalid factorization as claiming P(X|A,B) = P(X|A)P(X|B), just applied to ~X.
And this is wrong for most distributions.
That’s not the mistake.
You might be right. It makes sense to me that P(X|A)=.5, P(X|B)=.5, independent(A,B) ⇒ P(X|A,B) = .75. But I can’t derive it.

Suppose there’s a 50% chance of rain when the weatherman predicted rain; and a 50% chance your neighbor will turn on his lawn sprinkler on Saturday. (Your neighbor doesn’t watch the weather report.) It’s Saturday, and the weatherman predicted rain.
P(neighbor’s lawn wet | Saturday, predicted rain) = 1 - (1-.5)(1-.5) = .75
Of course you can’t derive it—you gave a counterexample!
More precisely, let A,B,X be heads on three independent coinflips.
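The three-coin counterexample can be checked by exhaustive enumeration. A minimal Python sketch (the helper name `cond_p_x` is mine):

```python
from itertools import product
from fractions import Fraction

# Three independent fair coin flips (a, b, x), each outcome equiprobable.
outcomes = list(product([0, 1], repeat=3))

def cond_p_x(pred):
    """P(X = heads | pred(A, B)) over the uniform distribution."""
    sel = [(a, b, x) for (a, b, x) in outcomes if pred(a, b)]
    return Fraction(sum(x for (_, _, x) in sel), len(sel))

p_x_given_a = cond_p_x(lambda a, b: a == 1)
p_x_given_b = cond_p_x(lambda a, b: b == 1)
p_x_given_ab = cond_p_x(lambda a, b: a == 1 and b == 1)

assert p_x_given_a == Fraction(1, 2)
assert p_x_given_b == Fraction(1, 2)
# X is independent of both flips, so conditioning changes nothing:
assert p_x_given_ab == Fraction(1, 2)
# ...but the claimed formula would predict 3/4:
assert 1 - (1 - p_x_given_a) * (1 - p_x_given_b) == Fraction(3, 4)
```

So the premises P(X|A)=.5, P(X|B)=.5, and independence of A and B are all satisfied, yet P(X|A,B)=.5, not .75.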
Jim gives a fallacy that it looks like people might actually commit[1], but I don’t think it’s what’s going on here. I think the issue is the meaning of “independent evidence.”
ETA: the following paragraph is wrong and unnecessarily complicated. Maybe it would be better to skip down to my later comment.
One kind of evidence is measurements. In that case the event of interest causes the measurement, plus there’s some noise. I think what we usually mean by “independent measurements” is that the noise for the one measurement is independent of the noise for the other measurement. How you combine the measurements depends on your noise model (as does even saying that the noises are independent). If your noise model is that there’s a large chance of a correct read and a small chance p of an incorrect read, then agreeing reads allow you to multiply the two p’s [ETA: this is wrong] (if p is not small, what happens depends on the details of an incorrect read), which is roughly what you did, except that you confused the measurement of a probability with the probability of noise. You might be able to struggle through interpreting the measured p=.5 as a noise, but it would require a detailed noise model.
The lawn has opposite causal structure from the kinds of measurements/evidence above (so I agree with Jim’s complaint that the two problems are unrelated). Causal structure has to do with sides of |, so maybe when you unwind this discussion of “evidence,” it turns into Jim’s fallacy, but I doubt it.
[1] ETA: Do people actually commit Jim’s fallacy? As I explain above, I don’t think that’s Phil’s original mistake, but he does make it in the quoted text. If Phil correctly abstracted his train of thought, then Jim is correct. But I think Phil probably learned this in the context of “evidence” and overgeneralized when trying to understand the problem in his example; and wouldn’t have abstracted it this way without Jim’s input.
It’s not a real counterexample. The data sets aren’t actually independent. The real trick is that I try to convince you that the 2 datasets being used are independent because they don’t intersect. Common definitions of independence would say they are independent because you can’t compute any correlation between them. But they both have the same underlying probability distribution generated from the same source. I’m confused about what “independence” should mean in this case.
When you make the statement “P(X|A)=.5, P(X|B)=.5, independent(A,B) ⇒ P(X|A,B) = .75”, it seems to be written in a standard formal language, and you have to interpret it with A and B as random variables, “independent” meaning independence of random variables. Then the formal statement is false by the example of three independent coin flips. When I first posted my comment, I had a panicked moment and deleted it until I realized that the three-coins example is a counterexample; whether it is the same as your example is not important.
Yes, the two data sets are not independent, if you’re not sure how coins are weighted. Two flips of a coin weighted in an unknown way are not independent random variables. But that’s good, because if they were independent in that way, their evidence wouldn’t add. But since they are independent conditional on the thing we want to measure, their evidence does add.
I repeated your original formulation of the consequence of independent evidence, but it is not correct. If you have two independent pieces of evidence that would each send you to 90% certainty, you do not conclude 99% certainty. It depends on your prior! If your prior were 50%, you conclude 98%. If your prior were 90%, neither piece of evidence told you anything, so together you learn nothing! (Moreover, if A and B are empty observations, that is a counterexample to pretty much any formulation.)
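The prior dependence is easy to make concrete in odds form. A Python sketch (function names are mine; "each piece would send you to 90%" means its likelihood ratio is odds(0.9)/odds(prior)):

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

def combine(prior, posteriors):
    """Combine pieces of evidence that are independent conditional on
    the hypothesis. Each entry of `posteriors` is what that piece alone
    would move the prior to; its likelihood ratio is
    odds(posterior)/odds(prior), and the ratios multiply."""
    o = odds(prior)
    for post in posteriors:
        o *= odds(post) / odds(prior)
    return prob(o)

print(round(combine(0.5, [0.9, 0.9]), 3))  # 0.988 -- ~98%, not 99%
print(round(combine(0.9, [0.9, 0.9]), 3))  # 0.9 -- each piece has LR 1
```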
When is evidence independent?
Evidence is the log of the likelihood ratio, which is what shows up in a bayesian update. The likelihood ratio for X involves only probabilities conditional on X. Thus independence of A and B conditional on X is exactly what we need for the probabilities to multiply; for the evidence of A and B to be the sum of the evidence of the individual events. In particular, the evidence from two flips of a single weighted coin give independent evidence about the weight.
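For instance (a minimal Python sketch with made-up numbers): if the competing hypotheses are "weighted to heads-probability 0.8" versus "fair", the flips are independent conditional on the bias, so the joint likelihood ratio factors and the log likelihood ratios add.

```python
import math

# Illustrative hypotheses about the coin's heads probability:
p_heads_biased = 0.8   # under H: the coin is weighted
p_heads_fair = 0.5     # under ~H: the coin is fair

def log_lr(flip):
    """Evidence (log likelihood ratio for H) from one flip, 'H' or 'T'."""
    if flip == 'H':
        return math.log(p_heads_biased / p_heads_fair)
    return math.log((1 - p_heads_biased) / (1 - p_heads_fair))

# Joint likelihood ratio for two heads, computed directly; it equals
# the sum of the per-flip evidence because the flips are independent
# given the bias.
joint = math.log((p_heads_biased ** 2) / (p_heads_fair ** 2))
assert abs(joint - (log_lr('H') + log_lr('H'))) < 1e-12
```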
The problem I was originally trying to approach is a problem I’m having at work, where I have multiple sources of evidence to conclude property(X); but I don’t have enough data to compute the correlations between these sources of evidence, because for most datapoints, only one source applies. When 2 or 3 of these sources agree, how do I compute the probability? But on reflection, it’s a different problem.
These problems only look similar because you are hiding the assumption that (neighbor’s lawn wet) = (rain or sprinkler turned on). From this, P(neighbor’s lawn wet | Saturday, predicted rain) = P(rain or sprinkler turned on|.5 chance of rain and .5 chance of sprinkler) = 1-P(~rain|...)P(~sprinkler|...) = .75. But there is no valid similar statement for the coin; the analogous disjunction would be (coin heads) = (2008 coin heads or penny heads), in which case treating the clauses of the right hand side as independent means flipping two coins and checking if at least one of them is heads.
I disagree—that makes no difference. Just change “it will rain” to “my neighbor’s lawn will be wet”, and “my neighbor’s sprinkler will turn on” to “my neighbor’s lawn will be wet”.
1) P(lawn wet| Saturday, predicted rain) = P(lawn watered or rain| Saturday, predicted rain)
2) = 1 - P(not lawn watered and not rain| Saturday, predicted rain)
3) = 1 - P(not lawn watered | Saturday, predicted rain) * P(not rain | Saturday, predicted rain)
4) = 1 - P(not lawn watered | Saturday) * P(not rain | predicted rain)
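The four steps can be checked by enumerating the joint distribution. A Python sketch, which hard-codes the conditional-independence assumption that step 3 requires (probabilities taken from the example):

```python
from itertools import product
from fractions import Fraction

# Joint model conditional on "Saturday, predicted rain": watering and
# rain are independent, each with probability 1/2, and the lawn is wet
# iff it is watered or it rains.
half = Fraction(1, 2)
states = [(watered, rain, half * half)
          for watered, rain in product([False, True], repeat=2)]

def p(event):
    """Probability of an event over the enumerated joint distribution."""
    return sum(pr for (w, r, pr) in states if event(w, r))

# Step 1: wet is definitionally (watered or rain).
p_wet = p(lambda w, r: w or r)
# Step 2: complement rule.
assert p_wet == 1 - p(lambda w, r: not w and not r)
# Step 3: the factorization, valid here because watering and rain are
# independent in this joint model.
assert p(lambda w, r: not w and not r) == \
    p(lambda w, r: not w) * p(lambda w, r: not r)
assert p_wet == Fraction(3, 4)
```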
What is the analogy to step 1 with the coins?
Rephrase the problem without the inferential step from “it will rain” and “sprinkler will turn on” to “lawn will be wet”. Just use “lawn will be wet” in all those places. We can do this because we are using symbolic logic.
That is throwing away the information that allows you to make the inference. The equation works because each part of the condition “Saturday, predicted rain” is involved in a different causal path to the result we are interested in. With the coins, neither being a penny specifically, nor being minted in 2008, causes the coin flips to result in heads, so the analogous equation does not work.
That’s right—but the problem is not that “you can only decompose a conjunction that way when it’s on the left side of the |, not when it’s on the right.” The reasoning is syntactically correct. The two cases are syntactically identical when you do the syntactic substitution I suggested above.
Or… hmm, I may be confused.
The two cases are syntactically identical because you have not explicitly explained how you meet the conditions that allow the decomposition you used, conditions which are not in fact met in the case of the coins. Your “syntactic substitution” removes the information that could be used to show you meet the conditions.
(Just to be sure, you did intend the original “proof” as a “find the error” exercise, right?)
Yes—but also to help me figure out how you ask whether 2 data sets are independent when they don’t intersect.