Sample means, how do they work?
You know how people make public health decisions about food fortification, and medical decisions about taking supplements, based on things like the Recommended Dietary Allowance (RDA)? Well, there’s an article in Nutrients titled A Statistical Error in the Estimation of the Recommended Dietary Allowance for Vitamin D. This paper says the following about the info used to establish the US RDA for vitamin D:
The correct interpretation of the lower prediction limit is that 97.5% of study averages are predicted to have values exceeding this limit. This is essentially different from the IOM’s conclusion that 97.5% of individuals will have values exceeding the lower prediction limit.
The whole point of looking at averages is that individuals vary a lot due to a bunch of random stuff, but if you take an average of a lot of individuals, that cancels out most of the noise, so the average varies hardly at all. How much variation there is from individual to individual is the population variance. How much variation you’d expect in your average from sample to sample, due to statistical noise, is what we call the variance of the sample mean.
When you look at frequentist confidence intervals, they are generally expressing how big the ordinary range of variation is for your average. For instance, if you build 90% confidence intervals this way across many samples, about 90% of them will contain the “true” average. This is relevant for answering questions like, “does this trend look a lot bigger than you’d expect from random chance?” The whole point of looking at large samples is that the errors have a chance to cancel out, leading to a very small random variation in the mean, relative to the variation in the population. This allows us to be confident that even fairly small differences in the mean are unlikely to be due to random noise.
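To make the two kinds of variation concrete, here is a minimal simulation sketch in Python. The population (mean 100, standard deviation 20), the sample size of 100, and every variable name are made-up assumptions for illustration; none of them come from the vitamin D data.

```python
import random
import statistics

# Hypothetical population: values ~ Normal(mean=100, sd=20). Made-up numbers.
POP_MEAN = 100.0
POP_SD = 20.0
SAMPLE_SIZE = 100    # individuals per sample
NUM_SAMPLES = 1000   # how many times we repeat the sampling

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

# How much the averages move around from sample to sample.
sd_of_sample_means = statistics.stdev(sample_means)

# Theory: sd of the sample mean = population sd / sqrt(sample size).
theoretical_sd = POP_SD / SAMPLE_SIZE ** 0.5

print(f"sd of individuals (population): {POP_SD:.2f}")
print(f"sd of sample means (simulated): {sd_of_sample_means:.2f}")
print(f"sd of sample means (theory):    {theoretical_sd:.2f}")
```

Under these assumptions the averages should vary roughly ten times less than the individuals do, since the standard deviation of the mean shrinks with the square root of the sample size.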
The error here was taking the statistical properties of the mean and assuming that they applied to individuals in the population. In particular, the Institute of Medicine (IOM) looked at the dose-response curve for vitamin D, and came up with a distribution for the average response to vitamin D dosage. Based on their data, if you did another study like theirs on new data, then 97.5% of the time it ought to find that 600 IU of vitamin D is enough for the average person.
They concluded from this that 97.5% of people get enough vitamin D from 600 IU.
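Here is a rough sketch of how far apart those two statements can be. It uses toy numbers, not the IOM’s data or their actual model: I assume individual serum levels at some fixed dose are normally distributed with a hypothetical mean of 60 nmol/L and a person-to-person standard deviation of 20 nmol/L, and that each study averages 100 people. All names and values are illustrative assumptions.

```python
from statistics import NormalDist

# Toy assumptions (NOT the IOM's data): at a fixed dose, individual serum
# levels are Normal(MU, SIGMA), and each study averages N_PER_STUDY people.
MU = 60.0           # hypothetical mean serum level, nmol/L
SIGMA = 20.0        # hypothetical person-to-person sd, nmol/L
N_PER_STUDY = 100   # hypothetical number of participants per study

individuals = NormalDist(MU, SIGMA)
study_means = NormalDist(MU, SIGMA / N_PER_STUDY ** 0.5)

# Lower limit that 97.5% of study averages are expected to exceed.
lower_limit = study_means.inv_cdf(0.025)

# Share of study averages vs. share of individuals above that same limit.
frac_means_above = 1 - study_means.cdf(lower_limit)
frac_individuals_above = 1 - individuals.cdf(lower_limit)

# The limit that 97.5% of individuals would actually exceed is much lower.
individual_limit = individuals.inv_cdf(0.025)

print(f"lower limit for study averages: {lower_limit:.1f} nmol/L")
print(f"study averages above it:        {frac_means_above:.1%}")
print(f"individuals above it:           {frac_individuals_above:.1%}")
print(f"limit 97.5% of individuals beat: {individual_limit:.1f} nmol/L")
```

With these made-up numbers, 97.5% of study averages clear the limit, but well under two thirds of individuals do. The exact gap depends entirely on the assumed spread and study size; the point is only that the two percentages are different quantities.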
This is not an arcane detail. This is confusing the attributes of a population with the attributes of an average. This is bad. This is real, real bad. In any sane world, this is mathematical statistics 101 stuff. I can imagine that someone who has only heard the phrase “margin of error” a lot doesn’t understand this stuff, but anyone who has to actually use the term should understand this.
Political polling is a simple example. Let’s say that a poll shows 48% of Americans voting for the Republican and 52% for the Democrat, with a 5% margin of error. This means that 95% of polls like this one are expected to have an average within 5 percentage points of the true average. This does not mean that 95% of individual Americans have somewhere between a 43% and 53% chance of voting for the Republican. Most of them are almost definitely decided on one candidate or the other. The average does not behave the same as the population. That’s how fundamental this error is: it’s like saying that all voters are undecided because the population is split.
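The polling example can be sketched as a small simulation. Everything here is a hypothetical setup of my own: every voter is fully decided, the true Republican share is exactly 48%, and each poll asks 384 people (a size picked so the usual normal-approximation margin of error comes out near 5 points).

```python
import random

# Hypothetical setup: every voter is fully decided; 48% back the Republican.
TRUE_SHARE = 0.48
POLL_SIZE = 384    # picked so the 95% margin of error is about 5 points
NUM_POLLS = 1000

# Usual 95% margin of error for a proportion (normal approximation).
margin = 1.96 * (TRUE_SHARE * (1 - TRUE_SHARE) / POLL_SIZE) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

def run_poll():
    """Return the Republican share in one simulated poll."""
    votes = sum(rng.random() < TRUE_SHARE for _ in range(POLL_SIZE))
    return votes / POLL_SIZE

poll_results = [run_poll() for _ in range(NUM_POLLS)]

# Share of polls whose result lands within the margin of the true share.
polls_within_margin = (
    sum(abs(r - TRUE_SHARE) <= margin for r in poll_results) / NUM_POLLS
)

# Each decided voter's personal chance of voting Republican is 0 or 1.
individual_probs = [1.0] * 48 + [0.0] * 52
voters_in_range = sum(0.43 <= p <= 0.53 for p in individual_probs)

print(f"margin of error:           {margin:.3f}")
print(f"polls within the margin:   {polls_within_margin:.1%}")
print(f"voters between 43% and 53%: {voters_in_range} of {len(individual_probs)}")
```

In this setup roughly 95% of polls land inside the margin, while not a single voter has a personal probability anywhere near that range.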
Remember the famous joke about how the average family has two and a half kids? It’s a joke because no one actually has two and a half kids. That’s how fundamental this error is – it’s like saying that there are people who have an extra half child hopping around. And this error caused actual harm:
The public health and clinical implications of the miscalculated RDA for vitamin D are serious. With the current recommendation of 600 IU, bone health objectives and disease and injury prevention targets will not be met. This became apparent in two studies conducted in Canada where, because of the Northern latitude, cutaneous vitamin D synthesis is limited and where diets contribute an estimated 232 IU of vitamin D per day. One study estimated that despite Vitamin D supplementation with 400 IU or more (including dietary intake that is a total intake of 632 IU or more) 10% of participants had values of less than 50 nmol/L. The second study reported serum 25(OH)D levels of less than 50 nmol/L for 15% of participants who reported supplementation with vitamin D. If the RDA had been adequate, these percentages should not have exceeded 2.5%. Herewith these studies show that the current public health target is not being met.
Actual people probably got hurt because of this. Some likely died.
This is also an example of scientific journals serving their intended purpose of pointing out errors, but it should never have gotten this far. This is a “send a coal-burning engine under the control of a drunk engineer into the Taggart tunnel when the ventilation and signals are broken” level of negligence. I think of the people using numbers as the reliable ones, but that’s not actually enough: you have to think with them, you have to be trying to get the right answer, you have to understand what the numbers mean.
I can imagine making this mistake in school, when it’s low stakes. I can imagine making this mistake on my blog. I can imagine making this mistake at work if I’m far behind on sleep and on a very tight deadline. But if I were setting public health policy? If I were setting the official RDA? I’d try to make sure I was right. And I’d ask the best quantitative thinkers I know to check my numbers.
The article was published in 2014, and as far as I can tell, as of the publication of this blog post, the RDA is unchanged.
(Cross-posted from my personal blog.)