We have no information about the relation between A and B, so I am guessing that a reasonable prior for that relation is that A and B are independent.
On the contrary, on two points.
First, “A and B are independent” is not a reasonable prior, because it assigns probability 0 to them being dependent in some way— or, to put it another way, if that were your prior and you observed 100 cases and A and B agreed each time (sometimes true, sometimes false), you’d still assume they were independent.
What you should have said, I think, is that a reasonable prior would have “A and B independent” as one of the most probable options for their relation, as it is one of the simplest. But it should also give some substantial weight to simple dependencies like “A and B identical” and “A and B opposite”.
Second, the sense in which we have no prior information about relations between A and B is not a sense that justifies ignoring A. We had no prior information before we observed them agreeing twice, which raises the probability of “A and B identical” while somewhat lowering that of “A and B independent”.
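The update described in the second point can be made concrete with a toy calculation. The prior weights below, and the 50/50 assumption for the independent case, are illustrative choices, not anything from the discussion:

```python
# Toy Bayesian update over three hypotheses about the relation of A and B.
# Prior weights are illustrative: "independent" gets the most mass as the
# simplest option, with substantial weight on the two simple dependencies.
priors = {"independent": 0.5, "identical": 0.25, "opposite": 0.25}

# Probability of observing one agreement (A and B taking the same value)
# under each hypothesis, assuming each event is 50/50 in the independent case.
p_agree = {"independent": 0.5, "identical": 1.0, "opposite": 0.0}

def posterior_after_agreements(n):
    """Posterior over the hypotheses after n observed agreements."""
    unnorm = {h: priors[h] * p_agree[h] ** n for h in priors}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}

print(posterior_after_agreements(2))    # "identical" is now twice as probable as "independent"
print(posterior_after_agreements(100))  # "independent" has all but vanished
```

Two agreements already shift the odds from 2:1 against “identical” to 2:1 in its favor, which is exactly the kind of update a point prior of pure independence can never make.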
It’s true that the prior should not be “A and B are independent”. But shouldn’t symmetries in how they may be dependent give essentially the same result as assuming independence? Much as any symmetric prior over how a coin is biased gives the same prediction for the probability of heads -- 1⁄2.
I don’t think independence is a good way to analyze things when the probabilities are near zero or one. Independence is just P[A] P[B] = P[AB]. If P[A] or P[B] are near zero or one, this is automatically “nearly true”.
Put another way, two observations of (A, B) give essentially no information about dependence by themselves. Dependence is encoded in the ratios between the four possibilities (A&B, A&~B, ~A&B, ~A&~B), and two data points barely constrain those ratios.
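A quick numeric check of the near-zero-or-one point, using the Fréchet bounds on what P[AB] can possibly be given the marginals (the 0.99 values are an arbitrary illustration):

```python
# When the marginals are near 1, every joint distribution consistent with
# them puts P[AB] close to P[A]*P[B], so independence is "nearly true"
# regardless of the actual dependence structure.
pa, pb = 0.99, 0.99

# Frechet bounds: P[AB] must lie between max(0, pa+pb-1) and min(pa, pb).
lo, hi = max(0.0, pa + pb - 1.0), min(pa, pb)
product = pa * pb

# Largest possible deviation from exact independence:
worst_gap = max(abs(lo - product), abs(hi - product))
print(f"P[AB] in [{lo:.4f}, {hi:.4f}], P[A]P[B] = {product:.4f}, max gap = {worst_gap:.4f}")
```

Even under maximal dependence the gap between P[AB] and P[A]P[B] is under 0.01 here, which is the sense in which independence tells you almost nothing at extreme probabilities.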
First, “A and B are independent” is not a reasonable prior, because it assigns probability 0 to them being dependent in some way
This raises a question of the meaningfulness of second-order Bayesian reasoning. Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value? A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
On the second point, seeing A and B together twice, or twenty times, tells me nothing about their independence. Almost everyone has two eyes and two legs, and therefore almost everyone has both two eyes and two legs, but it does not follow from those observations alone that possession of two eyes either is, or is not, independent of having two legs. For example, it is well-known (in some possible world) that the rare grey-green greasy Limpopo bore worm invariably attacks either the eyes, or the legs, but never both in the same patient, and thus observing someone walking on healthy legs conveys a tiny positive amount of probability that they have no eyes; while (in another possible world) the venom of the giant rattlesnake of Sumatra rapidly causes both the eyes and the legs of anyone it bites to fall off, with the opposite effect on the relationship between the two misfortunes. I can predict that someone has both two eyes and two legs from the fact that they are a human being. The extra information about their legs that I gain from examining their eyes could go either way.
But that is just an intuitive ramble. What is needed here is a calculation, akin to the Laplace rule of succession, for observations in a 2x2 contingency table. Starting from an ignorance prior that the probabilities of A&B, A&~B, B&~A, and ~A&~B are each 1⁄4, and observing a, b, c, and d examples of each, what is the appropriate posterior? Then fill in the values 2, 0, 0, and 0.
ETA: On reading the comments, I realise that the above is almost all wrong.
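For reference, the rule-of-succession analogue asked for above can be sketched, assuming the standard multinomial–Dirichlet setup: a uniform Dirichlet(1,1,1,1) prior over the four cells gives each cell a posterior mean of (count + 1)/(total + 4):

```python
from fractions import Fraction

# Laplace-rule analogue for a 2x2 contingency table: treat the four cells
# (A&B, A&~B, ~A&B, ~A&~B) as a multinomial with a uniform Dirichlet(1,1,1,1)
# prior. The posterior mean of each cell probability is (count+1)/(total+4).
def posterior_cell_probs(counts):
    total = sum(counts)
    return [Fraction(c + 1, total + 4) for c in counts]

# Fill in the observations from the text: 2, 0, 0, 0.
probs = posterior_cell_probs([2, 0, 0, 0])
print(probs)  # A&B gets 3/6 = 1/2; each of the other cells gets 1/6
```

So two joint observations move A&B from 1⁄4 to 1⁄2 of the posterior mass, a substantial but far from decisive shift, consistent with wnoise’s reply mentioned below.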
This raises a question of the meaningfulness of second-order Bayesian reasoning. Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value? A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
In order to have a probability distribution rather than just a probability, you need to ask a question that isn’t boolean, ie one with more than two possible answers. If you ask “Will this coin come up heads on the next flip?”, you get a probability, because there are only two possible answers. If you ask “How many times will this coin come up heads out of the next hundred flips?”, then you get back a probability for each number from 0 to 100 - that is, a probability distribution. And if you ask “what kind of coin do I have in my pocket?”, then you get a function that takes any possible description (from “copper” to “slightly worn 1980 American quarter”) and returns a probability of matching that description.
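The count question can be made concrete. Assuming a fair coin, the answer is a binomial distribution over the 101 possible counts:

```python
from math import comb

# "How many heads out of the next 100 flips?" has 101 possible answers,
# so the response is a probability for each one: a distribution.
n, p = 100, 0.5
dist = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

print(sum(dist.values()))  # ~1.0 (up to float rounding): the 101 answers exhaust the possibilities
print(dist[50])            # the single most probable count, though still unlikely in absolute terms
```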
Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value?
Depends on how you’re doing this; if you have a continuous prior for the probability of C, with an expected value of 0.469, then no— and future evidence will continue to modify your probability distribution. If your prior for the probability of C consists of a delta mass at 0.469, then yes, your model perhaps should be criticized, as one might criticize Rosenkrantz for continuing to assume his coin is fair after 30 consecutive heads.
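A minimal sketch of the contrast, using the fair-coin case from the Rosenkrantz example: a uniform Beta(1,1) prior over the bias updates toward heads, while a delta prior cannot move at all:

```python
from fractions import Fraction

# Continuous prior: uniform Beta(1,1) over the coin's bias. After h heads
# and t tails the posterior mean is (h+1)/(h+t+2) (Laplace's rule of
# succession). Observe 30 consecutive heads:
h, t = 30, 0
continuous_posterior_mean = Fraction(h + 1, h + t + 2)
print(continuous_posterior_mean)  # 31/32: the evidence has moved the estimate

# Delta prior concentrated at 1/2: no amount of evidence can move it, since
# every other value of the bias was assigned probability zero from the start.
delta_posterior_mean = Fraction(1, 2)
print(delta_posterior_mean)
```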
A Bayesian reasoner actually would have a hierarchy of uncertainty about every aspect of ver model, but the simplicity weighting would give them all low probabilities unless they started correctly predicting some strong pattern.
A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
Independence has a specific meaning in probability theory, and it’s a very delicate state of affairs. Many statisticians (and others) get themselves in trouble by assuming independence (because it’s easier to calculate) for variables that are actually correlated.
And depending on your reference class (things with human DNA? animals? macroscopic objects?), having 2 eyes is extremely well correlated with having 2 legs.
On the second point, seeing A and B together twice, or twenty times, tells me nothing about their independence.
Even without any math, it already tells you that they are not mutually exclusive. See wnoise’s reply to the grandparent post for the Laplace rule equivalent.