Born as the seventh month dies …

Epistemic status: Mathematical reasoning by an amateur, but I feel confident that it’s mostly correct.

Update: There is a Wikipedia page on this problem, with lots of details and several similar problems that have different answers. This paper referenced there seems to be a good summary.

The problem

I was reading The Equation of Knowledge, and it opens with this cute little problem:

Suppose a dad has two kids. At least one of them is a boy born on Tuesday. What are the odds that his sibling is also a boy (i.e., that both kids are boys)?

Generalizing the problem:

P(2 boys | 1Bn), where 1Bn := at least one boy with an independent characteristic (named N here, e.g. being born on Tuesday) that has probability 1/n

The original problem can now be seen as an instance of P(2 boys | 1B7).

Simple Bayesian solution

I started solving this with a simple application of Bayes' theorem:

P(1Bn | 2 boys) = P(First child being a boy having N | 2 boys) + P(Second child being a boy having N | 2 boys) - P(Both children being boys having N | 2 boys) = 1/n + 1/n - 1/(n^2)

P(1Bn) = P(First child being a boy having N) + P(Second child being a boy having N) - P(Both children being boys having N) = (1/2)(1/n) + (1/2)(1/n) - ((1/2)(1/n))^2 = 1/n - 1/(4n^2)

BayesFactor(2 boys | 1Bn) = P(1Bn | 2 boys)/P(1Bn) = (2/n - 1/(n^2))/(1/n - 1/(4n^2)) = (8n - 4)/(4n - 1)

P(2 boys | 1Bn) = BayesFactor(2 boys | 1Bn) * P(2 boys) = ((8n - 4)/(4n - 1)) * (1/4) = (2n - 1)/(4n - 1)
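As a sanity check on the algebra, here is a minimal Python sketch that follows the same chain (likelihood, evidence, Bayes factor, posterior) with exact fractions. It assumes P(boy) = 1/2 and that N is independent of sex; the function names are my own, chosen for illustration.

```python
from fractions import Fraction


def posterior_two_boys(n: int) -> Fraction:
    """P(2 boys | 1Bn) via Bayes, assuming P(boy) = 1/2 and P(N) = 1/n, all independent."""
    q = Fraction(1, n)                            # P(a given child has characteristic N)
    p_boy = Fraction(1, 2)                        # P(a given child is a boy)
    likelihood = 2 * q - q ** 2                   # P(1Bn | 2 boys), inclusion-exclusion
    evidence = 2 * p_boy * q - (p_boy * q) ** 2   # P(1Bn) = 1/n - 1/(4n^2)
    bayes_factor = likelihood / evidence          # (8n - 4)/(4n - 1)
    prior = p_boy ** 2                            # P(2 boys) = 1/4
    return bayes_factor * prior


def closed_form(n: int) -> Fraction:
    """The simplified result (2n - 1)/(4n - 1)."""
    return Fraction(2 * n - 1, 4 * n - 1)


for n in (1, 2, 7, 100):
    # The step-by-step Bayes computation should agree with the closed form exactly.
    assert posterior_two_boys(n) == closed_form(n)
    print(n, posterior_two_boys(n))
```

Using exact fractions instead of floats means the comparison with the closed form is an equality test, not an approximate one.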

We have:

P(2 boys | 1B1 == At least one boy) = 1/3

P(2 boys | 1B7 == At least one boy born on Tuesday) = 13/27

lim{n -> +Inf}[P(2 boys | 1Bn == At least one boy born exactly x seconds after the Big Bang)] = lim{n -> +Inf}[(2n - 1)/(4n - 1)] = 1/2
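These values can also be checked by brute force instead of Bayes. This sketch assumes each child is one of 2n equally likely (sex, trait) combinations and encodes "has N" as trait value 0 (a hypothetical encoding for illustration); it then simply counts families.

```python
from fractions import Fraction
from itertools import product


def enumerate_posterior(n: int) -> Fraction:
    """Brute-force P(2 boys | at least one boy has N) over all equally likely families.

    Each child is one of 2 * n equally likely (sex, trait) types; trait value 0
    stands in for N (e.g. "born on Tuesday" when n = 7).
    """
    child_types = list(product(("boy", "girl"), range(n)))
    matching = 0    # families satisfying 1Bn
    both_boys = 0   # of those, families with two boys
    for first, second in product(child_types, repeat=2):
        if ("boy", 0) in (first, second):
            matching += 1
            if first[0] == "boy" and second[0] == "boy":
                both_boys += 1
    return Fraction(both_boys, matching)


print(enumerate_posterior(1))            # 1/3 (at least one boy)
print(enumerate_posterior(7))            # 13/27 (at least one boy born on Tuesday)
print(float(enumerate_posterior(100)))   # creeping toward 1/2
```

The counting also explains the closed form: of the (2n)^2 equally likely families, (2n)^2 - (2n - 1)^2 = 4n - 1 contain a boy with N, and n^2 - (n - 1)^2 = 2n - 1 of those have two boys.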

So … I am somewhat confused. It’s intuitively obvious that having two boys creates more opportunity for specific independent phenomena to happen. But at first blush, my intuition was firmly suggesting that I throw away the additional information as useless, and only careful thinking led me to the (hopefully) correct answer. I also can’t quite think of any practical examples of this epistemic error. Your thoughts are appreciated.

Generalizing more

Repeating the same analysis, but generalizing the probability of “being a boy” to 1/k:

BayesFactor(2 boys | P(boy)=1/k, 1Bn) = (2n(k^2) - (k^2))/(2nk - 1)

P(2 boys | P(boy)=1/k, 1Bn) = BayesFactor(2 boys | P(boy)=1/k, 1Bn) * (1/k^2) = (2n - 1)/(2nk - 1)

lim{n -> +Inf}[P(2 boys | P(boy)=1/k, 1Bn)] = 1/k
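A similar sketch for the generalized version, again with exact fractions. The function name is hypothetical, and the internal assert only checks the Bayes factor against the closed form (2n(k^2) - (k^2))/(2nk - 1).

```python
from fractions import Fraction


def posterior_general(n: int, k: int) -> Fraction:
    """P(2 boys | P(boy) = 1/k, 1Bn), computed via the Bayes factor."""
    p = Fraction(1, k)   # P(boy)
    q = Fraction(1, n)   # P(N)
    bayes_factor = (2 * q - q ** 2) / (2 * p * q - (p * q) ** 2)
    # Sanity check against the simplified Bayes factor.
    assert bayes_factor == Fraction(2 * n * k ** 2 - k ** 2, 2 * n * k - 1)
    return bayes_factor * p ** 2   # prior P(2 boys) = 1/k^2


for n in (1, 10, 1000, 10 ** 6):
    print(n, float(posterior_general(n, k=3)))  # approaches 1/k as n grows
```

With k = 2 this reduces to (2n - 1)/(4n - 1), the result of the previous section.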

Generalizing to random variables

Suppose we have two independent, identically distributed variables X1 and X2, and another two i.i.d. variables Z1 and Z2. All four variables are mutually independent. Repeating the exact same calculations, we’ll have:

Px := P(X1=x) = P(X2=x)
Pz := P(Z1=z) = P(Z2=z)

BayesFactor(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = ... = (2Pz - Pz^2)/(2PxPz - (PxPz)^2)
P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = BayesFactor(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) * P(X1=X2=x) = ... = (2Px - PxPz)/(2 - PxPz)
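To check the random-variable version without relying on the elided algebra, this sketch enumerates every joint outcome of X1, X2, Z1, Z2 for small discrete distributions and compares the result with the closed form. The distributions X and Z below are arbitrary, made-up examples; the boy problem is the special case where X is sex and Z is the weekday.

```python
from fractions import Fraction
from itertools import product


def brute_force(px_dist: dict, pz_dist: dict, x, z) -> Fraction:
    """P(X1 = X2 = x | (X1 = x and Z1 = z) or (X2 = x and Z2 = z)) by exact enumeration.

    px_dist and pz_dist map each value to its probability; X1, X2 ~ px_dist and
    Z1, Z2 ~ pz_dist, all mutually independent.
    """
    evidence = Fraction(0)
    joint = Fraction(0)
    for x1, x2, z1, z2 in product(px_dist, px_dist, pz_dist, pz_dist):
        weight = px_dist[x1] * px_dist[x2] * pz_dist[z1] * pz_dist[z2]
        if (x1 == x and z1 == z) or (x2 == x and z2 == z):
            evidence += weight
            if x1 == x2 == x:
                joint += weight
    return joint / evidence


def closed_form(px: Fraction, pz: Fraction) -> Fraction:
    """(2Px - PxPz)/(2 - PxPz)."""
    return (2 * px - px * pz) / (2 - px * pz)


# Hypothetical example distributions, chosen only to exercise the formula.
X = {"a": Fraction(1, 5), "b": Fraction(3, 10), "c": Fraction(1, 2)}
Z = {0: Fraction(1, 4), 1: Fraction(3, 4)}
print(brute_force(X, Z, "a", 0), closed_form(X["a"], Z[0]))  # both 7/39
```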

lim{Pz -> 0+}[P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) )] = 2Px/2 = Px

If we set Pz = 1 (basically nuking the Z variables), we’ll have:

P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = P(X1=X2=x | X1=x or X2=x) = Px/(2 - Px)

So the independent information provided by the Z variables can, at most, increase this probability by a factor of 2 - Px >= 1 (the ratio of the Pz -> 0+ limit, Px, to the Pz = 1 value, Px/(2 - Px)).
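A short sketch of this last claim, assuming Px = 1/2 (the boy case) and scanning Pz over a grid: the posterior decreases as Pz grows, and the ratio between its Pz -> 0+ limit and its Pz = 1 value is 2 - Px.

```python
from fractions import Fraction


def posterior(px: Fraction, pz: Fraction) -> Fraction:
    """P(X1 = X2 = x | (X1 = x, Z1 = z) or (X2 = x, Z2 = z)) = (2Px - PxPz)/(2 - PxPz)."""
    return (2 * px - px * pz) / (2 - px * pz)


px = Fraction(1, 2)                                # e.g. P(boy)
grid = [Fraction(i, 100) for i in range(1, 101)]   # Pz from 0.01 up to 1
values = [posterior(px, pz) for pz in grid]

# The posterior shrinks as Pz grows, so rarer Z-events carry more information.
assert all(a > b for a, b in zip(values, values[1:]))

best_case = px                             # limit as Pz -> 0+
worst_case = posterior(px, Fraction(1))    # Pz = 1, Z ignored: Px/(2 - Px)
print(best_case / worst_case, 2 - px)      # prints: 3/2 3/2
```

With Px = 1/2 and Pz = 1/n this is exactly (2n - 1)/(4n - 1) from the first section, so the grid above is the same curve viewed from the Z side.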