Born as the seventh month dies …

Epistemic sta­tus: Math­e­mat­i­cal rea­son­ing by an am­a­teur, but I feel con­fi­dent that it’s mostly cor­rect.

Up­date: There is a Wikipe­dia page on this with lots of de­tails and other similar prob­lems with differ­ent an­swers. This pa­per refer­enced there seems a good sum­mary.

The problem

I was reading The Equation of Knowledge, and it starts with this cute little problem:

Suppose a dad has two kids. At least one of them is a boy born on Tuesday. What are the odds of his sibling being a boy?

Gen­er­al­iz­ing the prob­lem:

P(2 boys | 1Bn), where 1Bn := at least one of the kids is a boy with an independent characteristic (named N here) that has probability 1/n.

The original problem can now be seen as an instance of P(2 boys | 1B7).

Sim­ple Bayesian solution

I started solv­ing this with a sim­ple ap­pli­ca­tion of Bayes:

P(1Bn | 2 boys) = P(First child being a boy having N | 2 boys) + P(Second child being a boy having N | 2 boys) − P(Both children being boys having N | 2 boys) = 1/n + 1/n − 1/(n^2)

P(1Bn) = P(First child being a boy having N) + P(Second child being a boy having N) − P(Both children being boys having N) = (1/2)(1/n) + (1/2)(1/n) − ((1/2)(1/n))^2 = 1/n − 1/(4(n^2))

BayesFac­tor(2 boys | 1Bn) = P(1Bn | 2 boys)/​P(1Bn) = (8n − 4)/​(4n − 1)

P(2 boys | 1Bn) = BayesFactor(2 boys | 1Bn) * P(2 boys) = ((8n − 4)/(4n − 1)) * (1/4) = (2n − 1)/(4n − 1)
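The four steps above can be sketched in a few lines of Python using exact fractions. This is only an illustrative check, assuming (as the derivation does) that P(boy) = 1/2 and that N is independent of sex; the function name `bayes_steps` is made up for this sketch.

```python
from fractions import Fraction

def bayes_steps(n: int):
    """Return (likelihood, evidence, Bayes factor, posterior) for 1Bn.

    Assumes P(boy) = 1/2 and that N has probability 1/n independently of sex.
    """
    p_trait = Fraction(1, n)                      # P(a given boy has N)
    p_boy_with_trait = Fraction(1, 2) * p_trait   # P(a given child is a boy with N)

    # Inclusion-exclusion over "first child qualifies" / "second child qualifies".
    likelihood = 2 * p_trait - p_trait ** 2                   # P(1Bn | 2 boys)
    evidence = 2 * p_boy_with_trait - p_boy_with_trait ** 2   # P(1Bn)

    bayes_factor = likelihood / evidence
    posterior = bayes_factor * Fraction(1, 4)                 # prior P(2 boys) = 1/4
    return likelihood, evidence, bayes_factor, posterior

if __name__ == "__main__":
    for n in (1, 7):
        print(n, *bayes_steps(n))
```

For any n, the evidence term should come out as 1/n − 1/(4n^2) and the Bayes factor as (8n − 4)/(4n − 1).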

We have:

P(2 boys | 1B1 == At least one boy) = 1/​3

P(2 boys | 1B7 == At least one boy born on Tuesday) = 13/27

lim{n → +Inf}[P(2 boys | 1Bn == At least one boy born exactly x seconds after the big bang)] = lim{n → +Inf}[(2n − 1)/(4n − 1)] = 1/2
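As an independent sanity check that does not go through Bayes at all, one can simply enumerate every equally likely two-child family and count. The sketch below assumes both sexes are equally likely and that the characteristic takes one of n equally likely values, with value 0 standing in for "has N"; `p_two_boys` is a name invented for this example.

```python
from fractions import Fraction
from itertools import product

def p_two_boys(n: int) -> Fraction:
    """P(2 boys | at least one boy has N), by brute-force counting.

    Assumes sexes are equally likely and the trait is uniform over n values,
    independent of sex. Trait value 0 plays the role of "has N".
    """
    # A child is a (sex, trait) pair; all 2n pairs are equally likely.
    children = list(product("BG", range(n)))
    matching = 0    # families with at least one boy having N
    both_boys = 0   # ... of which both children are boys
    for first, second in product(children, repeat=2):
        if ("B", 0) in (first, second):
            matching += 1
            if first[0] == "B" and second[0] == "B":
                both_boys += 1
    return Fraction(both_boys, matching)

if __name__ == "__main__":
    for n in (1, 7, 100):
        # Second column: enumeration; third column: (2n - 1)/(4n - 1).
        print(n, p_two_boys(n), Fraction(2 * n - 1, 4 * n - 1))
```

The counts for n = 7 are 13 two-boy families out of 27 matching families, which agrees with the closed form.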

So … I am somewhat confused. It's intuitively obvious that having two boys creates more opportunity for specific independent phenomena to happen. But, at first blush, my intuition was firmly suggesting that I throw away the additional information as useless, and only careful thinking led me to the (hopefully) correct answer. I also can't quite think of any practical examples for this epistemic error. Your thoughts appreciated.

Gen­er­al­iz­ing more

Repeating the same analysis, but generalizing the probability of "being a boy" to 1/k, we get:

BayesFactor(2 boys | P(boy)=1/k, 1Bn) = (2n(k^2) − (k^2))/(2nk − 1)

P(2 boys | P(boy)=1/k, 1Bn) = BayesFactor(2 boys | P(boy)=1/k, 1Bn) * (1/k)^2 = (2n − 1)/(2nk − 1)

lim{n → +Inf}[P(2 Boys | P(boy)=1/​k, 1Bn)] = 1/​k
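The same counting check works for the generalized version. The sketch below models P(boy) = 1/k by letting each child be one of k equally likely "types", with type 0 meaning boy; that modelling choice and the name `p_two_boys_general` are assumptions of this example. The closed form it compares against, (2n − 1)/(2nk − 1), is the Bayes factor above multiplied by the prior (1/k)^2.

```python
from fractions import Fraction
from itertools import product

def p_two_boys_general(k: int, n: int) -> Fraction:
    """P(2 boys | at least one boy has N) when P(boy) = 1/k and P(N) = 1/n.

    Type 0 stands for "boy" and trait 0 for "has N"; all k*n (type, trait)
    pairs are equally likely and the two children are independent.
    """
    children = list(product(range(k), range(n)))
    matching = 0   # families with at least one boy having N
    both = 0       # ... of which both children are boys
    for a, b in product(children, repeat=2):
        if (0, 0) in (a, b):
            matching += 1
            if a[0] == 0 and b[0] == 0:
                both += 1
    return Fraction(both, matching)

if __name__ == "__main__":
    for k, n in ((2, 7), (3, 7), (3, 50)):
        print(k, n, p_two_boys_general(k, n), Fraction(2 * n - 1, 2 * n * k - 1))
```

As n grows with k fixed, the printed values should drift toward 1/k.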

Gen­er­al­iz­ing to ran­dom variables

Suppose we have two independent, identically distributed variables X1 and X2, and another two i.i.d. variables Z1 and Z2. All four variables are mutually independent. Repeating the exact same calculations, we get:

Px := P(X1=x) = P(X2=x)
Pz := P(Z1=z) = P(Z2=z)

BayesFactor(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = … = (2Pz − Pz^2)/(2PxPz − (PxPz)^2)
P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = BayesFactor(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) * P(X1=X2=x) = … = (2Px − PxPz)/(2 − PxPz)

lim{Pz → 0+}[P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) )] = 2Px/​2 = Px

If we set Pz = 1 (ba­si­cally nuk­ing the Z vari­ables), we’ll have:

P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = P(X1=X2=x | X1=x or X2=x) =  Px/​(2 - Px)

So the independent information provided by the Z variables can raise the probability by a factor of at most 2 − Px >= 1; the maximum is approached as Pz → 0+.
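A quick way to double-check the random-variable version is to sum the joint distribution directly and compare it with the closed form. The sketch below reduces each variable to a yes/no event ("X_i = x", "Z_i = z"), which is all the formulas depend on; the function names `posterior` and `closed_form` are invented for this illustration, and Px and Pz are assumed to be strictly positive.

```python
from fractions import Fraction
from itertools import product

def posterior(px: Fraction, pz: Fraction) -> Fraction:
    """P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z)), by summing the joint.

    Each variable is reduced to a boolean: True means it hit the target value.
    Assumes all four variables are mutually independent and px, pz > 0.
    """
    weight_x = {True: px, False: 1 - px}
    weight_z = {True: pz, False: 1 - pz}
    evidence = Fraction(0)   # P((X1=x, Z1=z) or (X2=x, Z2=z))
    joint = Fraction(0)      # P(evidence and X1=X2=x)
    for x1, z1, x2, z2 in product((True, False), repeat=4):
        w = weight_x[x1] * weight_z[z1] * weight_x[x2] * weight_z[z2]
        if (x1 and z1) or (x2 and z2):
            evidence += w
            if x1 and x2:
                joint += w
    return joint / evidence

def closed_form(px: Fraction, pz: Fraction) -> Fraction:
    """The closed form from the text: (2Px - PxPz)/(2 - PxPz)."""
    return (2 * px - px * pz) / (2 - px * pz)

if __name__ == "__main__":
    px = Fraction(1, 2)
    for pz in (Fraction(1), Fraction(1, 7), Fraction(1, 10**6)):
        gain = posterior(px, pz) / posterior(px, Fraction(1))
        print(pz, posterior(px, pz), closed_form(px, pz), float(gain))
```

With Px = 1/2 and Pz = 1/7 this reproduces the Tuesday-boy answer, and the last column shows the gain over the Pz = 1 case creeping up toward 2 − Px as Pz shrinks.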