Birth order effect found in Nobel Laureates in Physics

[Epistemic status: Three different data sets pointing to something similar is at least interesting; make up your own mind as to how interesting!]

Follow-up to: Fight Me, Psychologists, Birth Order Effects are Real and Very Strong, 2012 Survey Results, Historical mathematicians exhibit a birth order effect too

In Eli Tyre’s analysis of birth order in historical mathematicians, he mentioned analysing other STEM subjects for similar effects. In the comments I kinda-sorta preregistered a study into this. Following his comments, I dropped the age requirement I had mentioned, as it no longer seemed necessary.

I found that Nobel Laureates in Physics are more likely to be firstborn than would be expected by chance. This effect (10 percentage points) is smaller than the effect found in the rationalist community or in historical mathematicians (22 and 16.7 percentage points respectively), but it is significant (p=0.044).

More brothers than sisters were found in the study (125:92, i.e. 58% brothers). After accounting for the expected ratio (~52% brothers), this was not significant (p=0.11).

I was unable to find sufficient data on Fields Medal, Abel Prize and Turing Award winners.

My data and analysis are documented here. With Eli’s kind permission I used his spreadsheet as a template. I have kept Eli’s data on the same table; rows 4-153 are his.

Methodology

My methods closely matched Eli’s except for the data sets I looked at; see his post for more information.

Initially I attempted to replicate Eli’s results in other mathematicians by analysing Fields Medal and Abel Prize winners. Unfortunately I was unable to gather sufficient additional data. This is partly due to overlap between these prize winners and the list of mathematicians Eli was working from.

It also seems to be the case that less biographical information is available for people born after ~1950. This might be partly because these people and their siblings are more likely to still be alive, so data protection rules prevent e.g. geni from listing their full details (siblings’ details are often set to “private”), but there could be other reasons. For Fields Medals awarded before 1986 I found data on 12 of 30 recipients; after that, only 3 of 30.

I had a brief look at Turing Award winners, as this seemed a relevant field to compare with the results from the rationalist community that inspired these studies, but came across the same problem.

Finally, I looked at Nobel laureates in Physics. A massive help in data collection here was that since the 1970s, Nobel laureates have been asked to supply an autobiography, which is published on the Nobel website. Even before then there are biographies of each laureate, although these seldom mention birth order.

Between the Nobel site, Wikipedia and geni, I was able to find useful data on 100 of the 207 Physics laureates. The other 107 either had no siblings or I couldn’t find sufficient data on them; either way, they weren’t included in the analysis.

As a comment on data sources, I found geni to be somewhat unreliable. It sometimes contradicted the autobiographies or even itself. At other times, the list of siblings was incomplete or missing entirely.

Results

Categorising by family size shows that for all family sizes with ≥10 data points there are more firstborns than would be expected by chance.

Due to the small sample size I have grouped all families of 6+ siblings into a single bucket, and even then n=14. Within this bucket, the expected number of laureates at each birth position falls for higher positions, because fewer families in the sample have at least that many children.
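To make the bucketing concrete, here is a minimal Python sketch of how expected counts per birth position can be built up. It assumes, as the null hypothesis does, that a laureate from a family of n children is equally likely to hold any position from 1 to n. The function name `expected_by_position` and the family sizes used are hypothetical illustrations, not values from the spreadsheet.

```python
def expected_by_position(family_sizes):
    """Expected number of laureates at each birth position (index 0 = eldest),
    assuming each position in a family of n children has probability 1/n."""
    expected = [0.0] * max(family_sizes)
    for n in family_sizes:
        # A family of n children contributes 1/n to each of positions 1..n.
        for position in range(n):
            expected[position] += 1 / n
    return expected


# Hypothetical "6+" bucket; these family sizes are made up for illustration.
bucket = [6, 6, 7, 8]
for position, e in enumerate(expected_by_position(bucket), start=1):
    print(f"position {position}: expected {e:.2f}")
```

Positions 7 and 8 only receive contributions from the 7- and 8-child families, which is why the expected counts tail off at the high end of a mixed bucket.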

Analysing the data as a whole gives a 10 percentage point effect (0.2 to 19.8 percentage points, 95% confidence interval). This is less than both the SSC / Less Wrong surveys and Eli’s historical mathematicians analysis (22 and 17 percentage points respectively). I don’t have an overall confidence interval for the SSC data, but given the large data set and the very low p quoted for the two-sibling example, it is unlikely that its 95% confidence interval overlaps with this new data, suggesting that the effect is truly a different size and that the difference is not due to chance.
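The overall comparison can be sketched in Python as below. This is a hedged reconstruction, not necessarily the exact calculation in the spreadsheet: it assumes that under the null each laureate from a family of n children is the eldest with probability 1/n, treats laureates as independent, and uses a normal approximation for the 95% interval and the two-tailed p-value. The function name `birth_order_effect` and the toy data are hypothetical.

```python
import math


def birth_order_effect(family_sizes, is_firstborn):
    """Excess share of firstborns in percentage points, with a normal-approximation
    95% CI and two-tailed p-value. Assumes P(eldest) = 1/n under the null."""
    if len(family_sizes) != len(is_firstborn):
        raise ValueError("need one family size per laureate")
    n_laureates = len(family_sizes)
    probs = [1 / n for n in family_sizes]
    expected = sum(probs)
    observed = sum(is_firstborn)
    # Null variance of the firstborn count: sum of Bernoulli variances p(1 - p).
    variance = sum(p * (1 - p) for p in probs)
    se = math.sqrt(variance) / n_laureates
    effect = (observed - expected) / n_laureates
    z = effect / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    ci = (effect - 1.96 * se, effect + 1.96 * se)
    return 100 * effect, (100 * ci[0], 100 * ci[1]), p_value


# Toy data (not the laureate data): 200 two-child families, 115 eldest.
sizes = [2] * 200
firstborn = [True] * 115 + [False] * 85
effect_pp, ci_pp, p = birth_order_effect(sizes, firstborn)
print(f"{effect_pp:.1f} pp (95% CI {ci_pp[0]:.1f} to {ci_pp[1]:.1f}), p = {p:.3f}")
```

With real data, `sizes` and `firstborn` would come from the family-size and birth-position columns, leaving out laureates without siblings.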

Discussion

Autobiographies as source material

Using autobiographies as the source for a significant number of the data points should have helped with the reliability of the data. It is conceivable that an autobiographer would be more likely to mention siblings and birth order if they were the eldest, but this doesn’t seem likely.

Gender imbalance

Eli discussed under-reporting of female siblings as a potential source of bias. However, he found that the brothers:sisters ratio in his data was not unreasonable.

Running the same analysis on the physics Nobel laureate data, I get a ratio of 125:92 brothers:sisters. This makes the siblings 58% male, with p=0.03 (binomial distribution, two-tailed, against an expected 50%). This effect is actually more significant than the birth order effect.

I looked at the SSC data and Eli’s data and found that both had 52% brothers. I did a little research and found that 51-52% brothers is actually roughly the expected ratio. I feel like this is something I should have already known but didn’t.

Another effect which might increase the proportion of brothers among Nobel laureates’ siblings is that men can have a disposition to have boys or a disposition to have girls. As almost all of the laureates are male, it would be reasonable to think that more of their dads were predisposed to having boys. However, as this isn’t seen in the SSC or historical mathematicians data (both also male-dominated), this doesn’t really get us much further.

Using 52% as the expected ratio (instead of 50%) means that the 58% result from Nobel laureates no longer reaches significance (p=0.11) and should instead be labelled as “hey, look at this interesting subgroup analysis” or possibly “slightly odd but not implausible”.
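Both p-values can be checked with an exact binomial test. The Python sketch below doubles the smaller tail, which is one common two-tailed convention; I haven't confirmed it is the exact variant behind the figures quoted above, and other conventions (such as summing all outcomes no more likely than the observed one) give slightly different values. Only the 125:92 counts come from this post; the function name `binomial_two_tailed` is hypothetical.

```python
from math import comb


def binomial_two_tailed(k, n, p):
    """Exact two-tailed binomial p-value for k successes out of n,
    computed by doubling the smaller tail (capped at 1)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


brothers, sisters = 125, 92
n = brothers + sisters
for expected_share in (0.50, 0.52):
    p_value = binomial_two_tailed(brothers, n, expected_share)
    print(f"expected {expected_share:.0%} brothers: p = {p_value:.3f}")
```

Shifting the null from 50% to 52% moves the expected number of brothers from about 108.5 to about 112.8, which is enough to push the result past the conventional 0.05 threshold.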

As I mentioned previously, most of the data for Nobels awarded since the 1970s is based on autobiographies. Looking only at data since then, the brother:sister ratio is 51:35 (59%). It seems unlikely that Nobel laureates forgot about some of their sisters, making it less likely that the gender imbalance is due to incorrect data.

One potential source of error in the gender balance may be the siblings whose gender I was unable to determine. There were 50 of these. Most (41) came from families where I had no data except the number of siblings and the position of the laureate within the family (e.g. “I was the fourth of five children.”). It is possible that some of the missing sisters are in this category.

However, this would imply that someone with more brothers is more likely to list the genders of their siblings than someone with more sisters. Perhaps, as most of the laureates were male, they had more in common with their brothers and spent more time with them, making them statistically more likely to mention their brothers’ gender. This seems plausible but unlikely to cause a big effect even if it were true.
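One way to gauge how much the 50 unknown-gender siblings could matter is to ask how many of them would have to be sisters to bring the overall share of brothers down to the expected 52%. The Python sketch below does that arithmetic, assuming all 50 can simply be added to the 125:92 pool; it looks only at the point estimate, not at significance, and the function name `min_sisters_needed` is hypothetical.

```python
def min_sisters_needed(brothers, sisters, unknown, expected_share):
    """Smallest number of unknown-gender siblings that would have to be sisters
    for the overall brother share to fall to expected_share or below.
    Returns None if even all of them being sisters would not be enough."""
    total = brothers + sisters + unknown
    for s in range(unknown + 1):
        # The remaining (unknown - s) siblings are assumed to be brothers.
        if brothers + unknown - s <= expected_share * total:
            return s
    return None


print(min_sisters_needed(125, 92, 50, 0.52))  # prints 37
```

So roughly three quarters of the unknown-gender siblings would need to be sisters before the overall share matched 52%.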

For the moment, I am working with the assumption that the sample is accurate and that the gender imbalance is just an outlier. Any other thoughts on causes of bias are welcome. These would have to explain how this effect was seen in data from both geni and the laureates’ autobiographies.

Conclusion

Nobel laureates in physics exhibit a birth order effect such that they are 10 percentage points more likely to be the eldest child than would be expected (p=0.044). This effect is smaller than in data from both SSC readers and historical mathematicians (22 and 17 percentage points respectively).

There was a gender imbalance between brothers and sisters (58% brothers) but, taking into account the expected ratio of 52%, this was not significant (p=0.11). This effect is not seen in SSC readers or historical mathematicians (52% in both).

I would recommend that anyone who wishes to collate additional historical data consider Nobel laureates in other prize categories, due to the availability of accurate data from the autobiographies. My analysis took perhaps 12 hours, but a lot of that was spent on wild goose chases looking for data on Fields Medal and Abel Prize recipients. I saved a lot of time by reusing Eli’s spreadsheet (thanks for the permission). I would estimate that getting data on the entire history of another Nobel Prize category and analysing it would take ~6-8 hours, so it shouldn’t be too daunting for someone to take on.