Using the information that she is my grandmother, I speculate on the reason why she did not call on Thursday. Perhaps it is because she does not intend to come on Friday: P(Friday) is lowered. Perhaps it is because she does intend to come but judges the regularity of the event to make calling in advance unnecessary unless she had decided not to come: P(Friday) is raised. Grandmothers tend to be old and consequently may be forgetful: perhaps she intends to come but has forgotten to call: P(Friday) is raised. Grandmothers tend to be old, and consequently may be frail: perhaps she has been taken unwell; perhaps she is even now lying on the floor of her home, having taken a fall, and no-one is there to help: P(Friday) is lowered, and perhaps I should phone her.
My answer to the problem is therefore: I phone her to see how she is and ask if she is coming tomorrow.
I know—this is not an answer within the terms of the question. However, it is my answer.
The more abstract version you later posted is a different problem. We have two observations of A and B occurring together, and that is all. Unlike the case of Grandma’s visits, we have no information about any causal connection between A and B. (The sequence of revealing A before B does not affect anything.) What is then the best estimate of P(B|~A)?
We have no information about the relation between A and B, so I am guessing that a reasonable prior for that relation is that A and B are independent. Therefore A can be ignored and the Laplace rule of succession applied to the two observations of B, giving 3⁄4.
ETA: I originally had a far more verbose analysis of the second problem based on modelling it as an urn problem, which I then deleted. But the urn problem may be useful for the intuition anyway. You have an urn full of balls, each of which is either rough or smooth (A or ~A), and either black or white (B or ~B). You pick two balls which turn out to be both rough and black. You pick a third and feel that it is smooth before you look at it. How likely is it to be black?
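The 3⁄4 above is just Laplace's rule of succession, (s+1)/(n+2) after s successes in n trials, applied to the two observations of B; a minimal sketch:

```python
def rule_of_succession(successes, trials):
    """Laplace's rule of succession: probability that the next trial
    succeeds, after observing `successes` out of `trials`, starting
    from a uniform prior on the unknown underlying rate."""
    return (successes + 1) / (trials + 2)

# Two observations of B, both positive, ignoring A entirely:
print(rule_of_succession(2, 2))  # 0.75, i.e. 3/4

# With no observations at all, the rule returns the ignorance value:
print(rule_of_succession(0, 0))  # 0.5
```

The urn version is the same computation: if colour is independent of texture, feeling that the third ball is smooth tells you nothing, and two black draws give (2+1)/(2+2) = 3⁄4 for black.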
We have no information about the relation between A and B, so I am guessing that a reasonable prior for that relation is that A and B are independent.
On the contrary, on two points.
First, “A and B are independent” is not a reasonable prior, because it assigns probability 0 to them being dependent in some way— or, to put it another way, if that were your prior and you observed 100 cases and A and B agreed each time (sometimes true, sometimes false), you’d still assume they were independent.
What you should have said, I think, is that a reasonable prior would have “A and B independent” as one of the most probable options for their relation, as it is one of the simplest. But it should also give some substantial weight to simple dependencies like “A and B identical” and “A and B opposite”.
Second, the sense in which we have no prior information about relations between A and B is not a sense that justifies ignoring A. We had no prior information before we observed them agreeing twice, which raises the probability of “A and B identical” while somewhat lowering that of “A and B independent”.
It’s true that the prior should not be “A and B are independent”. But shouldn’t a prior that is symmetric over the ways they may be dependent give essentially the same result as assuming independence? Just as any symmetric prior over how a coin is biased gives the same prediction for the probability of heads: 1⁄2.
I don’t think independence is a good way to analyze things when the probabilities are near zero or one. Independence is just P[A] P[B] = P[AB]. If P[A] or P[B] are near zero or one, this is automatically “nearly true”.
Put another way, two observations of (A, B) by themselves give essentially no information about dependence. Dependence is encoded in the ratios between the four possibilities.
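A quick numerical check of the near-0/1 point, using the Fréchet bounds max(P[A]+P[B]-1, 0) <= P[AB] <= min(P[A], P[B]) on the joint probability:

```python
# When P[A] and P[B] are both close to 1, every admissible joint
# distribution is "nearly independent": the Frechet bounds leave
# almost no room for P[AB] to differ from P[A] * P[B].
pa, pb = 0.99, 0.99
lo = max(pa + pb - 1, 0.0)   # most negative association possible
hi = min(pa, pb)             # most positive association possible
indep = pa * pb              # the value independence would require

print(lo, hi, indep)
# Largest possible deviation from independence, about 0.0099:
print(max(indep - lo, hi - indep))
```

So "A and B are independent" is automatically nearly true here, whatever the actual mechanism relating them.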
First, “A and B are independent” is not a reasonable prior, because it assigns probability 0 to them being dependent in some way
This raises a question of the meaningfulness of second-order Bayesian reasoning. Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value? A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
On the second point, seeing A and B together twice, or twenty times, tells me nothing about their independence. Almost everyone has two eyes and two legs, and therefore almost everyone has both two eyes and two legs, but it does not follow from those observations alone that possession of two eyes either is, or is not, independent of having two legs. For example, it is well-known (in some possible world) that the rare grey-green greasy Limpopo bore worm invariably attacks either the eyes, or the legs, but never both in the same patient, and thus observing someone walking on healthy legs conveys a tiny positive amount of probability that they have no eyes; while (in another possible world) the venom of the giant rattlesnake of Sumatra rapidly causes both the eyes and the legs of anyone it bites to fall off, with the opposite effect on the relationship between the two misfortunes. I can predict that someone has both two eyes and two legs from the fact that they are a human being. The extra information about their legs that I gain from examining their eyes could go either way.
But that is just an intuitive ramble. What is needed here is a calculation, akin to the Laplace rule of succession, for observations in a 2x2 contingency table. Starting from an ignorance prior that the probabilities of A&B, A&~B, B&~A, and ~A&~B are each 1⁄4, and observing a, b, c, and d examples of each, what is the appropriate posterior? Then fill in the values 2, 0, 0, and 0.
ETA: On reading the comments, I realise that the above is almost all wrong.
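For what it's worth, the 2x2 rule-of-succession calculation asked for above works out as follows, assuming a flat Dirichlet(1,1,1,1) prior over the four cells (the natural two-dimensional analogue of Laplace's uniform prior):

```python
def contingency_posterior(a, b, c, d):
    """Rule of succession on a 2x2 table: start from a flat
    Dirichlet(1,1,1,1) prior over the four cells
    (A&B, A&~B, ~A&B, ~A&~B) and observe counts (a, b, c, d).
    Returns the posterior predictive cell probabilities and
    P(B | ~A) for the next observation."""
    n = a + b + c + d
    cells = [(a + 1) / (n + 4), (b + 1) / (n + 4),
             (c + 1) / (n + 4), (d + 1) / (n + 4)]
    # Conditioning on ~A restricts to the last two cells:
    p_b_given_not_a = (c + 1) / (c + d + 2)
    return cells, p_b_given_not_a

cells, p = contingency_posterior(2, 0, 0, 0)
print(cells)  # cell probabilities 3/6, 1/6, 1/6, 1/6 (weights 3:1:1:1)
print(p)      # 0.5
```

With the counts 2, 0, 0, 0 this gives P(B|~A) = 1⁄2, rather than the 3⁄4 obtained by assuming independence and applying the rule to B alone.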
This raises a question of the meaningfulness of second-order Bayesian reasoning. Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value? A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
In order to have a probability distribution rather than just a probability, you need to ask a question that isn’t boolean, i.e. one with more than two possible answers. If you ask “Will this coin come up heads on the next flip?”, you get a probability, because there are only two possible answers. If you ask “How many times will this coin come up heads out of the next hundred flips?”, then you get back a probability for each number from 0 to 100: that is, a probability distribution. And if you ask “what kind of coin do I have in my pocket?”, then you get a function that takes any possible description (from “copper” to “slightly worn 1980 American quarter”) and returns a probability of matching that description.
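The hundred-flips question can be made concrete; assuming the coin's bias p is known, the answer is the binomial distribution:

```python
from math import comb

def heads_distribution(n, p=0.5):
    """Answer the non-boolean question: one probability for each
    possible number of heads in n flips of a coin with bias p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

dist = heads_distribution(100)
print(dist[50])   # the single most probable count, about 0.0796
print(sum(dist))  # the 101 probabilities sum to 1
```

If p itself is uncertain, you would instead average this over a prior for p, which is exactly the second-order structure under discussion.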
Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value?
Depends on how you’re doing this; if you have a continuous prior for the probability of C, with an expected value of 0.469, then no, and future evidence will continue to modify your probability distribution. If your prior for the probability of C consists of a delta mass at 0.469, then yes, your model perhaps should be criticized, as one might criticize Rosenkrantz for continuing to assume his coin is fair after 30 consecutive heads.
A Bayesian reasoner actually would have a hierarchy of uncertainty about every aspect of ver model, but the simplicity weighting would give them all low probabilities unless they started correctly predicting some strong pattern.
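A minimal sketch of the contrast, using a hypothetical Beta prior chosen so that its mean is 0.469:

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution over the probability of C."""
    return a / (a + b)

# A continuous prior centred on 0.469 (the shape here is made up):
a, b = 4.69, 5.31
print(beta_mean(a, b))       # 0.469 before any data
print(beta_mean(a + 30, b))  # about 0.87 after 30 consecutive successes

# A delta mass at 0.469 is the limiting Beta(k*a, k*b) as k grows
# without bound: no finite amount of evidence ever moves it.
```

The continuous prior keeps updating; the delta mass is the Rosenkrantz failure mode, where the model has assigned probability zero to everything the evidence is pointing at.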
A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
Independence has a specific meaning in probability theory, and it’s a very delicate state of affairs. Many statisticians (and others) get themselves in trouble by assuming independence (because it’s easier to calculate) for variables that are actually correlated.
And depending on your reference class (things with human DNA? animals? macroscopic objects?), having 2 eyes is extremely well correlated with having 2 legs.
On the second point, seeing A and B together twice, or twenty times, tells me nothing about their independence.
Even without any math, it already tells you that they are not mutually exclusive. See wnoise’s reply to the grandparent post for the Laplace rule equivalent.
Therefore A can be ignored and the Laplace rule of succession applied to the two observations of B, giving 3⁄4.

Directly using the Laplace rule of succession on the sample space A ⊗ B gives weights proportional to:

A&B: 3, A&~B: 1, ~A&B: 1, ~A&~B: 1.

Conditioning on ~A, P(B|~A) = 1⁄2. Assuming independence does make a significant difference with this little data.
I really like your urn formulation.