Historical mathematicians exhibit a birth order effect too

[Epistemic status: pilot study. I’m hoping that others will help to verify or falsify my conclusion here. I’ve never done an analysis of this sort before, and would appreciate correction of any errors.

A previous version of this post had some minor errors in the analysis, which have since been corrected. Most notably, the deviation from the expected rate of first-borns was originally given as 14.98 percentage points. It is actually 16.65 percentage points.]

A big thank you to Dan Keys for working through the statistics with me.

Follow-up to: Fight Me, Psychologists, Birth Order Effects are Real and Very Strong; 2012 Survey Results

Since the late 1800s, pop psychology has postulated that a person’s birth order (whether one is the first, last, middle, etc. of one’s siblings) affects his/her lifetime personality traits. However, rigorous large-scale analyses have reliably found no significant effect on stable personality, with some evidence for a small effect on intelligence. (The Wikipedia page lists some relevant papers on birth order effects on personality (1, 2, 3) and on intelligence (1, 2, 3).)

So, we were all pretty surprised when, around 2012, survey data suggested a very strong birth order effect amongst those in the broader rationality community.

The Less Wrong community is demographically dominated by first-borns: a startlingly large percentage of us have only younger siblings. In the 2018 Slate Star Codex Survey data that Scott cites in the post linked above, there is roughly a twenty-two percentage point difference between the actual and expected rates of first-borns. (More specifically, the expected rate of first-borns is 39% and the actual rate in the survey data is 62%.) The 2012 Less Wrong survey also found a 22 percentage point difference. This effect is highly significant, including after taking other demographic factors into account.

A few weeks ago, Scott Garrabrant (one of the researchers at MIRI) offhandedly wondered aloud whether great mathematicians (who plausibly share some important features with LessWrongers) also exhibit this same trend toward being first-born.

The short answer: yes, they do, as near as I can tell, but not as strongly as LessWrongers.

My data and analysis are documented here.

Methodology

Following Sarah Constantin’s fact post methodology, I started with a list of the 150 greatest mathematicians from here. This is perhaps not the most accurate or scientific ranking of historical math talent, but in practice there’s enough broad agreement about who the big names are that quibbles over who should be included are mostly irrelevant for our purposes. If a person could plausibly be included on a list of the 150 greatest mathematicians in history, he/she was probably a pretty good mathematician.

I then went through the list and tried to find out how many older and younger siblings each mathematician had. For the most part this amounted to googling “[mathematician’s name] siblings” and then trawling through the results to find one that gave me the information I wanted. Where possible, I noted not just the birth order and number of siblings, but also the sex of the siblings and whether they died during infancy. (For the mathematicians whose data I couldn’t find, I marked the row as “Couldn’t find” or “Unknown”.)

Most biographical sources don’t list the number of siblings in the family of origin. The sources that I ended up relying on the most were:

This was a very quick, cursory search, so my data is probably not super reliable. At least twice, I found two sources that disagreed, and I don’t know how much more conflicting information I would have encountered if I had dug deeper into each person’s biography, instead of moving on to the next mathematician as soon as I found a sentence that answered my query.

If you happen to personally know biographical details of elite mathematicians and can correct any errors in these data, I’d be pleased to make those corrections.

Results

The simplest analysis is to categorize the data by family size (all the mathematicians who had no siblings, one sibling, two siblings, etc.), count how many first-borns there were in each bucket, and compare that to the number we would expect by chance. (If birth order made no difference, each child in a family of n children would have a 1/n chance of being the first-born.)
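The bucket comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming each mathematician is recorded as a pair (number of siblings, whether first-born); the `records` list is made-up placeholder data, not the actual spreadsheet, and `bucket_summary` and `overall_difference` are hypothetical helper names. The only modeling assumption is the one stated above: with no effect, every birth position in a family is equally likely.

```python
from collections import defaultdict

# Hypothetical placeholder records: (number_of_siblings, is_first_born).
# These values are invented for illustration and are NOT the real dataset.
records = [
    (0, True), (1, True), (1, False), (2, True), (2, True),
    (2, False), (3, False), (3, True), (4, False),
]


def bucket_summary(records):
    """Per family-size bucket: (people, actual first-borns, expected first-borns)."""
    buckets = defaultdict(list)
    for n_siblings, first_born in records:
        buckets[n_siblings].append(first_born)
    summary = {}
    for n_siblings, flags in sorted(buckets.items()):
        # With n_siblings siblings there are n_siblings + 1 birth positions,
        # so by chance each person is first-born with probability 1 / (n_siblings + 1).
        expected = len(flags) / (n_siblings + 1)
        summary[n_siblings] = (len(flags), sum(flags), expected)
    return summary


def overall_difference(records):
    """Actual minus expected first-born rate across all buckets, in percentage points."""
    actual_rate = sum(first_born for _, first_born in records) / len(records)
    expected_rate = sum(1 / (n + 1) for n, _ in records) / len(records)
    return 100 * (actual_rate - expected_rate)


for n_siblings, (people, actual, expected) in bucket_summary(records).items():
    print(f"{n_siblings} siblings: {people} people, {actual} first-born, {expected:.2f} expected")
print(f"overall difference: {overall_difference(records):.1f} percentage points")
```

On this toy input the overall difference prints as 14.4 percentage points; that number is an artifact of the invented records and says nothing about the real mathematicians.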

For nearly every bucket, the frequency of first-born children exceeded what random chance would predict. Across all categories, the actual frequency of first-borns exceeded the expected frequency by 16.65 percentage points.

After removing the individuals for whom I couldn’t find data, we had a sample size of 82. A paired t-test comparing the number of first-borns with the expected number of first-borns (one data point for each of the 82 mathematicians) was statistically significant: t(81) = 3.14, p = 0.00239.
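A paired t-test of this kind can be sketched with Python’s standard library. This is an illustrative version under stated assumptions, not necessarily the exact computation behind the reported numbers: each person contributes one difference, their actual first-born indicator (1 or 0) minus their expected probability 1/(siblings + 1), and the t statistic is the mean difference over its standard error. The `records` list is again hypothetical placeholder data.

```python
import math
import statistics

# Hypothetical placeholder records: (number_of_siblings, is_first_born).
# Invented for illustration; NOT the real dataset.
records = [
    (0, True), (1, True), (1, False), (2, True), (2, True),
    (2, False), (3, False), (3, True), (4, False),
]


def paired_t(records):
    """Paired t statistic for actual first-born indicator vs expected probability."""
    # One difference per person: 1 or 0 (actual) minus 1 / (siblings + 1) (expected).
    diffs = [float(first_born) - 1 / (n_siblings + 1) for n_siblings, first_born in records]
    sample_size = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(sample_size)
    return mean_diff / std_err, sample_size - 1


t_stat, df = paired_t(records)
print(f"t({df}) = {t_stat:.2f}")
```

Turning t into a p-value needs the t distribution, which the standard library doesn’t provide; if SciPy is available, `scipy.stats.ttest_rel(actual, expected)` on the two per-person lists should give the same t statistic along with a two-sided p-value.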

I can show you some bar graphs, like the ones Scott uses in his post, but because these data come from a much smaller sample and the effect isn’t as large, they don’t look as neat. (Also, I don’t know how to include those nice dotted lines marking the expected frequency.)

Nevertheless, you can see a systematic trend: being the first of n siblings is overrepresented among the mathematicians in the sample I used.

The effect in these data (17 percentage points) is smaller than the effect in either the Less Wrong or the Slate Star Codex survey (22 percentage points). The 95% confidence interval for the mathematician data runs from 6 to 27 percentage points. Since 22 falls inside this range, we can’t rule out that the difference in effect sizes is due to noise, but it seems most plausible that there is a real difference in the size of the underlying effect between the populations.
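As a rough consistency check, the interval can be recovered from the two numbers already reported: because t is the mean difference divided by its standard error, the standard error is about 16.65 / 3.14 ≈ 5.3 percentage points. The sketch below assumes a two-sided 95% critical value of about 1.99 for t with 81 degrees of freedom, read from a standard t table; that constant is an input assumption, not something the code computes.

```python
# Reported in the post: mean difference of 16.65 percentage points and t(81) = 3.14.
MEAN_DIFF_PP = 16.65
T_STAT = 3.14
# Assumed two-sided 95% critical value of Student's t with 81 degrees of freedom,
# taken from a t table (approximately 1.99).
T_CRIT_81 = 1.99


def confidence_interval(mean_diff, t_stat, t_crit):
    """95% CI for the mean difference, backing the standard error out of t = mean / SE."""
    std_err = mean_diff / t_stat
    half_width = t_crit * std_err
    return mean_diff - half_width, mean_diff + half_width


low, high = confidence_interval(MEAN_DIFF_PP, T_STAT, T_CRIT_81)
print(f"95% CI: {low:.1f} to {high:.1f} percentage points")
# Is the surveys' 22-point effect inside the mathematicians' interval?
print("22 inside interval:", low <= 22 <= high)
```

This prints an interval of 6.1 to 27.2 percentage points, which matches the 6 to 27 range above, and confirms that 22 lies inside it.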

A discussion of bias in this data

As I said, my data is not very reliable: it seems plausible that some of my sources were faulty, and I was going quickly, so I may have made some errors during data collection. Furthermore, I was only able to find data for 82 of the 150 mathematicians.

But in expectation, random errors like these will cancel out, unless there’s some systematic bias in the sources I was using. I can think of at least two possible sources of bias, but neither one seems like it could produce the observed trend.

Higher reporting rate for first-born children

First, maybe first-borns are recorded more readily? If the first-born child was the heir to a family’s property, then they may have been more likely to be mentioned in legal and other documents, so there may be much better historical records of first-born children.

But our subjects are all famous mathematicians, regardless of their inheritance status. So if there were a historical reporting bias that favored first-borns, it would actually push against our observed effect. First-born members of our sample would be listed either as only children or as having an unknown number of siblings. Younger-sibling mathematicians, on the other hand, would still be recorded as younger siblings, because their older brother would appear in the historical record on the basis of his heirship.

Underreporting of females

Another way in which the available record of sibling data may be biased, though it does not directly affect the validity of this analysis, is that women might have gone unrecorded more often than men. The size of this effect tells us something about the extent to which the available record of sibling data is biased.

It was relatively easy to do a quick check for a reporting bias in favor of male siblings: I just summed all the brothers that I found, and all the sisters.

All together, I recorded 110.5 brothers and half brothers and 100.5 sisters and half sisters. (The point five comes from Jean-Baptiste Joseph Fourier’s entry. I found that he had 3 half siblings by his father’s first marriage, but I didn’t know their sexes. So I split the difference by saying he had 1.5 half brothers and 1.5 half sisters, in expectation. I was comfortable doing this because I mostly care about whether siblings are younger or older, and only secondarily about whether they are male or female.)

So there are slightly more males listed, at least in the sources I could find. But a difference of 10 out of 211 siblings with recorded sex isn’t very large. I’m sure there’s a statistical test I could run to show this, but I don’t think that slight bias is sufficient to account for our observed birth order effect.
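One such test is a simple two-sided check of whether 110.5 brothers out of 211 recorded siblings is surprising if each recorded sibling were equally likely to be male or female. This sketch uses the normal approximation to the binomial (convenient here because Fourier’s half siblings make the counts non-integers) and assumes siblings are independent with p_male = 0.5. Both are simplifications: siblings cluster within families, and the natural sex ratio at birth slightly favors boys, which would make a male excess even less surprising.

```python
import math
from statistics import NormalDist


def sex_ratio_z_test(brothers, sisters, p_male=0.5):
    """Two-sided normal-approximation test for an excess of recorded brothers.

    Assumes each recorded sibling is independently male with probability p_male.
    """
    total = brothers + sisters
    expected = total * p_male
    std_dev = math.sqrt(total * p_male * (1 - p_male))
    z = (brothers - expected) / std_dev
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts reported in the post (the .5s come from Fourier's entry).
z, p = sex_ratio_z_test(110.5, 100.5)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

This prints z = 0.69 and p = 0.49, so an imbalance of this size would be unremarkable even if brothers and sisters were recorded at equal rates.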

I’m hoping that others can think of reasons why we might see a trend in these data even if the birth order effect weren’t real.

Conclusion

This is a pretty intriguing result, and I’m surprised no one (that I know of) has noticed it before now.

I think this post should be thought of as a pilot study. I put in about 20 hours investigating the hypothesis, but only in a quick and cursory way. I would be excited for others who are better informed and better equipped than I am to do a more in-depth analysis of these topics.

Do mathematicians of lesser renown display this birth order effect? What about prominent (or average) individuals from other STEM fields? Non-STEM fields? I’d be interested to see an analysis of the most successful business executives, for instance.

Furthermore, more investigation could uncover details about how the presence or absence of older siblings gives rise to this effect.

Some explanations for this phenomenon rest on social interaction with older siblings in one’s first few years. Others depend on biological consequences of spending one’s fetal period in a womb that was previously occupied by older siblings. In principle we should be able to tease out which of these mechanisms generates the effect by looking at much more data that tracks older siblings who died in infancy, and older half siblings. (Siblings who died in infancy can’t mediate the social effect, while half siblings can mediate a biological effect depending on which parent is shared, and a social effect depending on whether they were living in the household at the time of birth.) If someone found a larger dataset that tracked these factors, we might be able to falsify one or the other of these stories.
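The rules in the parenthetical above can be written down as a classification scheme. This is a hypothetical sketch: the `OlderSibling` fields and the helper names are invented for illustration, and the code encodes only the two rules stated in the text. Applying both rules to every mathematician yields two competing definitions of “first-born” (no older siblings who shared the womb versus no older siblings in the household), and comparing the first-born excess under each definition, in a large enough dataset, is one way the two stories might be tested against each other.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OlderSibling:
    # Hypothetical fields; the actual spreadsheet may not record all of these.
    shares_mother: bool       # full sibling or maternal half sibling
    died_in_infancy: bool
    lived_in_household: bool  # present in the home during the subject's early years


def can_mediate_biological(sib: OlderSibling) -> bool:
    # A prenatal (womb) mechanism needs the sibling to have been carried by the same mother.
    return sib.shares_mother


def can_mediate_social(sib: OlderSibling) -> bool:
    # A social mechanism needs the sibling to have been alive and living with the subject.
    return sib.lived_in_household and not sib.died_in_infancy


def older_sibling_counts(siblings: List[OlderSibling]) -> Tuple[int, int]:
    """Return (older siblings counted by the biological story, by the social story)."""
    biological = sum(can_mediate_biological(s) for s in siblings)
    social = sum(can_mediate_social(s) for s in siblings)
    return biological, social


# A subject whose only older sibling died in infancy:
# biologically not first-born, socially first-born.
print(older_sibling_counts([OlderSibling(True, True, False)]))
# A subject with a paternal half brother raised in the same house:
# biologically first-born, socially not.
print(older_sibling_counts([OlderSibling(False, False, True)]))
```

The two example calls print (1, 0) and (0, 1): exactly the cases where the biological and social stories disagree about whether someone counts as first-born.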

And again, please inform me of any errors.