A couple of weeks ago three European economists published this paper studying the female income penalty after childbirth. The surprising headline result: there is no penalty.
Setting and Methodology
The paper uses Danish data that tracks IVF treatments as well as a bunch of demographic factors and economic outcomes over 25 years. Lundborg et al identify the causal effect of childbirth on female income using the success or failure of the first attempt at IVF as an instrument for fertility.
What does that mean? We can’t just compare women with children to those without them because having children is a choice that’s correlated with all of the outcomes we care about. So sorting out two groups of women based on observed fertility will also sort them based on income and education and marital status etc.
Successfully implanting embryos on the first try in IVF is probably not very correlated with these outcomes. Overall success is, because rich women may have the resources and time to try multiple times, for example, but success on the first try is pretty random. And success on the first try is highly correlated with fertility.
So, if we sort two groups of women based on success on the first try in IVF, we’ll get two groups that differ a lot in fertility, but aren’t selected for on any other traits. Therefore, we can attribute any differences between the groups to their difference in fertility and not any other selection forces.
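To make the logic concrete, here is a minimal simulated sketch of the instrumental-variables idea described above. Everything in it is an assumption for illustration: the data are synthetic, the variable names (`ambition`, `z`, `d`, `y`) and the effect sizes are made up, and the simple Wald ratio stands in for whatever estimator the paper actually uses. The point is only that a naive comparison of mothers and non-mothers is biased by an unobserved trait, while dividing the earnings gap between first-try groups by their fertility gap recovers the true effect.

```python
import numpy as np


def simulate(n=200_000, true_effect=-1.0, seed=0):
    """Synthetic data (all numbers hypothetical, not from the paper)."""
    rng = np.random.default_rng(seed)
    ambition = rng.normal(size=n)      # unobserved trait affecting fertility AND earnings
    z = rng.random(n) < 0.3            # first-try IVF success, assumed random
    # Probability of having a child: raised a lot by z, lowered by high ambition.
    p_child = 0.3 + 0.5 * z - 0.1 * (ambition > 0)
    d = (rng.random(n) < p_child).astype(float)
    # Earnings depend on the unobserved trait and on having a child.
    y = 10 + 2 * ambition + true_effect * d + rng.normal(size=n)
    return z, d, y


def naive_difference(d, y):
    """Mean earnings of women with children minus those without."""
    return y[d == 1].mean() - y[d == 0].mean()


def wald_iv(z, d, y):
    """Wald estimator: earnings gap across instrument groups / fertility gap."""
    first_stage = d[z].mean() - d[~z].mean()
    reduced_form = y[z].mean() - y[~z].mean()
    return reduced_form / first_stage


if __name__ == "__main__":
    z, d, y = simulate()
    print("naive comparison:", naive_difference(d, y))
    print("IV (Wald) estimate:", wald_iv(z, d, y))
```

In this toy setup the naive comparison overstates the penalty because less "ambitious" women are both more likely to have children and lower earners, while the IV estimate lands close to the simulated true effect of -1.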
Results
How do these two groups of women differ?
First of all, women who are successful on the first try with IVF are persistently more likely to have children. This random event causing a large and persistent fertility difference is essential for identifying the causal effect of childbirth.
This graph plots the regression coefficients on a series of binary variables that track whether a woman had a successful first-time IVF treatment X years ago. When the IVF treatment is in the future (i.e. X is negative), whether or not the woman will have a successful first-time IVF treatment has no bearing on fertility since fertility is always zero; these are all first-time mothers.
When the IVF treatment was one year in the past (X = 1), women with a successful first-time treatment are about 80 percentage points more likely to have a child that year than women with an unsuccessful first-time treatment. This first-year coefficient isn’t 1 because some women who fail their first attempt go through multiple IVF attempts in year zero and still have a child in year one. The coefficient falls over time as more women who failed their first IVF attempt eventually succeed and have children in later years, but it plateaus around 30 percentage points.
Despite having more children, this group of women do not have persistently lower earnings.
This is the same type of graph as before: it plots the regression coefficients on binary variables that track whether a woman had a successful first-time treatment X years ago, but this time the outcome variable isn’t having a child, it’s earnings.
One year after the first IVF treatment attempt the successful women earn much less than their unsuccessful counterparts. They are taking time off for pregnancy and receiving lower maternity leave wages (this is in Denmark so everyone gets those). But 10 years after the first IVF attempt the earnings of successful and unsuccessful women are the same, even though the successful women are still ~30 percentage points more likely to have a child. 24 years out from the first IVF attempt the successful women are earning more on average than the unsuccessful ones.
Given the average age of women attempting IVF in Denmark of about 32 and a retirement age of 65, these women have 33 years of working life after their IVF attempt. We can’t see their earnings that far out, but if we assume that the differences plateau after 20 years, the lifetime earnings of the first time successful women are about 2% higher (though the confidence interval includes zero).
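The back-of-the-envelope extrapolation above can be sketched in a few lines. This is only an illustration under stated assumptions: the yearly gaps below are made-up placeholders rather than the paper's estimates, the function name `lifetime_gap` is hypothetical, and the calculation ignores discounting and earnings growth. It simply holds the last observed gap constant until retirement and averages over the whole working life.

```python
def lifetime_gap(observed_gaps, working_years):
    """Average yearly earnings gap over a working life.

    observed_gaps: yearly gaps (as a fraction of baseline earnings) for the
        years we can observe, in order.
    working_years: total years from the IVF attempt to retirement.
    Years beyond the observed window reuse the last observed gap (the
    "plateau" assumption). No discounting is applied.
    """
    if not observed_gaps:
        raise ValueError("need at least one observed gap")
    gaps = list(observed_gaps[:working_years])
    plateau = gaps[-1]
    gaps += [plateau] * (working_years - len(gaps))
    return sum(gaps) / working_years


# 65 (retirement age) - 32 (average age at IVF) = 33 working years.
working_years = 65 - 32
# Hypothetical gaps: an early penalty that fades and turns slightly positive.
example_gaps = [-0.20, -0.10, 0.00, 0.05]
example_result = lifetime_gap(example_gaps, working_years)
```

With real estimates in place of `example_gaps`, the same calculation shows how a short early penalty can be outweighed by a small positive gap that persists for decades.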
Comparison to Previous Results
This is a huge change from previous results. You’ve probably seen graphs like this floating around twitter.
This is based off of an “event study” specification from this influential 2019 paper that also uses Danish data. Why are the results from the instrumental variables design so different from these previous event studies and which results are more reliable? The 2024 IVF paper replicates these negative and persistent event study effects.
The authors argue for two reasons why their instrumental variables design is a more reliable measure than the event study.
First, the assumptions required for the event study to identify causal effects are stretched when trying to get at long-term effects. Event studies compare women of the same age, education status, and profession, where one group has their first kid at a later age than the other, e.g. one at 28 and another at 30. This relies on an assumption that, conditional on all of these characteristics, age at first birth is random. There are very close parallel trends in earnings before first birth, which lends some evidence towards this. Two or three years’ difference can conceivably be semi-randomly assigned, e.g. by matchmaking, but it is a bit hard to believe that women who have kids 10 or 15 years apart differ in only this respect, even with the 5 years of parallel pre-trends.
The groups of women defined by first-try success in IVF are more believably randomly assigned to having or not having a child, even decades after the treatment. These two groups also have the same coincident pre-trends in earnings that justify the event study.
Second, Lundborg’s 2024 paper finds evidence that women time their births to just before a wage growth plateau. The evidence it gives again comes from IVF failures. Women who were planning to have a birth, but never succeed, have much flatter wage growth after their planned birth year, even though they didn’t actually have any kids. So the divergence between mothers and women without children shows up even in this placebo case when neither group actually had kids. Therefore, the event study is overstating the earnings impact of childbirth.
This paper is also a bit inconsistent with an extremely similar paper by the same author from 7 years ago. That paper has the same methodology, the same setting, and the same data source, but has fewer years of data. It only tracks earnings to ten years out from the first IVF attempt. The author concludes that there are “negative, large, and long-lasting” effects of childbirth on earnings. Quite different than the results in this more recent version. This reversal of results with longer data isn’t mentioned in the 2024 paper. The old version shows negative earnings effects persisting after 10 years while the new one shows the earnings effect at zero after 10 years. Even though both papers cover this period, they don’t match because the later paper has more cohorts ten years out, i.e. the old paper only has 10 years of earnings data for women first trying IVF 1996-1999 but the new paper has 10+ years of earnings data for every cohort tracked in the IVF data 1996-2005.
What This Means For Global Fertility Trends
The authors don’t have any replication materials available as far as I can tell; the data probably has privacy protections too. One social science paper with no replication materials is not something you’d want to update on too much. The data and methods seem straightforward and solid. The main results hold up in a specification with no control variables, which is good since there are a lot of degrees of freedom when researchers can pick and choose which controls to include. Still, there could be massive fraud under the hood of this paper and it wouldn’t be that unusual, so definitely take these results with a grain of salt.
If the results really are solid, there are also external validity concerns. We’d have solid results showing that childbirth does not have a lifetime earnings penalty for rich, middle-aged, Danish, otherwise infertile women, who chose to enroll in IVF. Denmark provides IVF for free for anyone a doctor says is infertile. Denmark has some of the most generous parental leave policies in the world and a highly gender-equal labor market. Older and otherwise infertile women are more established in their careers, have already completed education, and plan their births. All of these differences and more threaten the generalizability of these results.
If the paper does generalize, even just to rich western women, it would be an important change to existing models of fertility decline. On the one hand it’s good news. It’s further evidence that the opportunity cost of childbirth is not an insurmountable barrier to combining high fertility and high incomes. On the other hand, fertility in Denmark is still very low and falling. If fertility is falling even though mothers don’t have to sacrifice returns from their career, then economics is not the main motivator of that trend. Instead, it’s a deeper cultural trend which is much more difficult to amend with policy.
This is interesting and important research and I hope to see replications and generalizations in the future!
Something that has always seemed a bit weird to me is that it seems like economists normally assume (or seem to assume from a distance) that laborers “live to make money (at work)” rather than that they “work to have enough money (to live)”.
Microeconomically, especially for parents I think this is not true.
You’d naively expect, for most things, that if the price goes down, the supply goes down.
But for the labor of someone with a family, if the price given for their labor goes down in isolation, then they work MORE (hunt for overtime, get a second job, whatever) because they need to make enough to hit their earning goals in order to pay for the thing they need to protect: their family. (Things that really cause them to work more: a kid needs braces. Thing that causes them to work less: a financial windfall.)
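A toy version of that "earning goal" idea, purely as a sketch: the function name `hours_needed`, the cap on weekly hours, and all the numbers are made up for illustration and come from no paper. If someone works just enough to hit a fixed weekly income target, the hours they supply are the target divided by the wage, so a lower wage means more hours and a bigger target (another kid, braces) also means more hours.

```python
def hours_needed(target_income, hourly_wage, max_hours=80.0):
    """Weekly hours a pure 'target earner' would work.

    Assumes the worker supplies exactly enough hours to reach
    target_income, capped at max_hours (a hypothetical physical limit).
    """
    if hourly_wage <= 0:
        raise ValueError("hourly_wage must be positive")
    return min(target_income / hourly_wage, max_hours)


# Wage falls from 25 to 20 with the same target: hours rise from 40 to 50.
before_cut = hours_needed(1000, 25)
after_cut = hours_needed(1000, 20)
# Target rises (e.g. a new expense) at the old wage: hours rise too.
bigger_target = hours_needed(1200, 25)
```

A standard upward-sloping supply curve would predict the opposite response to the wage cut, which is the contrast the comment is pointing at.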
Looking at that line, the thing it looks like to me is “the opportunity cost is REAL” but then also, later, the amount of money that had to be earned went up too (because of “another mouth to feed and clothe and provide status goods for and so on”). Maybe?
The mechanistic hypothesis here (that parents work to be able to hit spending targets which must rise as family size goes up) implies a bunch of additional details: (1) the husband’s earnings should be tracked as well and the thing that will most cleanly go up is the sum of their earnings, (2) if a couple randomly has and keeps twins then the sum of the earnings should go up more.
Something I don’t know how to handle is that (here I reach back into fuzzy memories and might be trivially wrong from trivially misremembering) prior to ~1980 having kids caused marriages to be more stable (maybe “staying together for the kids”?), and afterwards it caused marriages to be more likely to end in divorce (maybe “more kids, more financial stress, more divorce”?) and if either of those effects apply (or both, depending on the stress reactions and family values of the couple?) then it would entangle with the data on their combined earnings?
Scanning the paper for whether or how they tracked this led me to this bit (emphasis not in original), which gave me a small groan and then a cynical chuckle and various secondary thoughts...
(NOTE: this ~falsifies the prediction I made a mere 3 paragraphs ago, but I’m leaving that in, rather than editing it out to hide my small local surprise.)
If I’m looking for a hypothetical framing that isn’t “uncomplimentary towards fathers” then maybe that could be spun as the idea that men are simply ALWAYS “doing their utmost at their careers” (like economists might predict, with a normal labor supply curve) and they don’t have any of that mama bear energy where they have “goals they will satisfice if easy or kill themselves or others to achieve if hard” the way women might when the objective goal is the wellbeing of their kids?
Second order thoughts: I wonder if economists and anthropologists could collaborate here, to get a theory of “family economics” modulo varying cultural expectations?
I’ve heard of lots of anthropological stuff about how men and women in Africa believe that farming certain crops is “for men” or “for women” and then they execute these cultural expectations without any apparent microeconomic sensitivity (although the net upshot is sort of a reasonable portfolio that insures families against droughts).
Also, I’ve heard that on a “calorie in, calorie out” basis in hunter-gatherer cultures, it is the grandmothers who are the huge breadwinners (catch lots of rabbits with traps, and generally forage super efficiently) whereas the men hunt big game (which they and the grandmas know is actually inefficient, if an anthropologist asks this awkward question) so that, when the men (rarely) succeed in a hunt they can throw a big BBQ for the whole band and maybe get some nookie in the party’s aftermath.
It seems like it would be an interesting thing to read a paper about: “how and where the weirdly adaptive foraging and family economic cultures” even COME FROM.
My working model is that it is mostly just “monkey see, monkey do” on local role models, with re-calibration cycle times of roughly 0.5-2 generations. I remember writing a comment about mimetic economic learning in the past… and the search engine says it was for Unconscious Economics :-)
I think they mention in Economics 101 that there are two major exceptions to this: labor and land.
It’s usually said the other way round (if the price goes up, the supply goes up), and then it’s obvious that the supply of land is more or less constant, and the supply of labor of poor people is “as much as they can” and if you pay them too much they become rich and now they can choose to work less and have more free time.
This is maybe a dumb question, but I would have imagined that successful implantation would be related to good health outcomes (based on some intuition that successful implantation represents an organ of your body functioning properly, and imagining that the higher success rates of younger people have to do with their health). Is that not true?
This is what I came to ask about. Randomizing based on health and then finding that the healthier group makes more despite other factors seems like it doesn’t really prove the thing the paper is claiming.
Although the fact that wages matched between the groups beforehand is pretty interesting.
Couldn’t it also be that the women in question plan their career based on the expectation to have children and this is what leads to the plateau? In that case it seems like it would be incorrect to interpret these results as evidence against a child penalty, as it’s merely that the child penalty affects women regardless of whether they have the children. To check, I think you should ask the study participants why their career plateaued then.
Yes, that was my first guess as well. Increased income from employment is most strongly associated with major changes, such as promotion to a new position with changed (and usually increased) responsibilities, or leaving one job and starting work somewhere else that pays more.
It seems plausible that these are not the sorts of changes that women are likely to seek out at the same rate when planning to devote a lot of time in the very near future to being a first-time parent. Some may, but all? Seems unlikely. Men seem more likely to continue to pursue such opportunities at a similar rate due to gender differences in child-rearing roles.