But if the question is “Has this caused you to revise downward your estimate of the value of health insurance?” the answer obviously has to be yes. Anyone who answers differently is looking deep into their intestinal loops, not the Oregon study. You don’t have to revise the estimate to zero, or even a low number. But if you’d asked folks before the results dropped what we’d expect to see if insurance made people a lot healthier, they’d have said “statistically significant improvement on basic markers for the most common chronic diseases.” The fact that we didn’t see that means that we should now say that health insurance, or at least Medicaid, probably doesn’t make as big a difference in health as we thought.
-- Megan McArdle, trying to explain Bayesian updates and the importance of making predictions in advance, without referring to any mathematics.
This may be true, but McArdle’s point is precisely that this was not said before the study came out. At that time, people confidently expected that health insurance would, in fact, improve health outcomes. Your argument is one that was only made after the result was known; this is a classic failure mode.
(nods) Yup. Of course, McArdle’s claims about what people would have said before the study, if asked, are also only being made after the results are known, which as you say is a classic failure mode.
Of course, McArdle is neither passing laws nor doing research, just writing articles, so the cost of failure is low. And it’s kind of nice to see someone in the mainstream (sorta) press making the point that surprising observations should change our confidence in our beliefs, which people surprisingly often overlook.
Anyway, the quality of McArdle’s analysis notwithstanding, one place this sort of reasoning seems to lead us is to the idea that when passing a law, we ought to say something about what we anticipate the results of passing that law to be, and have a convention of repealing laws that don’t actually accomplish the thing that we said we were passing the law in order to accomplish.
Which in principle I would be all in favor of, except for the obvious failure mode that if I personally don’t want us to accomplish that, I am now given an incentive to manipulate the system in other ways to lower whatever metrics we said we were going to measure. (Note: I am not claiming here that any such thing happened in the Oregon study.)
That said, even taking that failure mode into account, it might still be preferable to passing laws with unarticulated expected benefits and keeping them on the books despite those benefits never materializing.
Of course, McArdle’s claims about what people would have said before the study, if asked, are also only being made after the results are known, which as you say is a classic failure mode.
I don’t think that’s true; if you read her original article on the subject, linked in the one I link, she quotes statistics like this:
Most of you have probably heard the statistic that being uninsured kills 18,000 people a year. Or maybe it’s 27,000. Those figures come from an Institute of Medicine report (later updated by the Urban Institute) that was drawn from [nonrandom observational] studies.
I took a keen interest when, at the fervid climax of the health-care debate in mid-December, a Washington Post blogger, Ezra Klein, declared that Senator Joseph Lieberman, by refusing to vote for a bill with a public option, was apparently “willing to cause the deaths of hundreds of thousands” of uninsured people in order to punish the progressives who had opposed his reelection in 2006. In the ensuing blogstorm, conservatives condemned Klein’s “venomous smear,” while liberals solemnly debated the circumstances under which one may properly accuse one’s opponents of mass murder.
Fair enough. I only read the article you linked, not the additional source material; I’m prepared to believe given additional evidence like what you cite here that her analysis is… er… can one say “pre-hoc”?
[W]hen passing a law, we ought to say something about what we anticipate the results of passing that law to be, and have a convention of repealing laws that don’t actually accomplish the thing that we said we were passing the law in order to accomplish.
There would have to be a two-sided test: a tort of ineffectiveness, by which the plaintiff seeks relief from a law that fails to achieve the goals laid out for it, and a tort of under-ambition, by which the plaintiff seeks relief from a law that is immune from the tort of ineffectiveness because its formally specified goals are feeble.
Think about the American experience with courts voiding laws that are unconstitutional. This often ends up with the courts applying balancing tests. It can end up with the court ruling that yes, the law infringes your rights, but only a little. And the law serves a valid purpose, which is very important. So the law is allowed to stand.
These kinds of cases are decided in prospect. The decision rests on speculation about the actual effects of the law. It might help if constitutional challenges to legislation could be re-litigated, perhaps after the first ten years. The second hearing could then be decided retrospectively, looking back at ten years’ experience, and balancing the actual burden on the plaintiff’s rights against the actual public benefit of the law.
Where though is the goal post? In practice it moves. In the prospective hearing the government will make grand promises about the huge benefits the law will bring. In the retrospective hearing the government will sail on the opposite tack, arguing that only very modest benefits suffice to justify the law.
It would be good if the goal posts were fixed. Right from the start, the law states the goals against which it will be assessed in ten years’ time. Certainly there needs to be a tort of ineffectiveness, active against laws that do not meet their goals. But politicians would soon learn to game the system by writing very modest goals into law. That needs to be blocked with a tort of under-ambition, which ensures that the initial constitutionality of the law is judged only admitting in prospect those benefits that can be litigated in retrospect.
The goal posts should definitely be fixed! And maybe some politicians would want to pass a law that benefits him and his friends in some way, even though it only has a small effect, so there ought to be some kind of safeguard against that, too. But the main problem I can see is anti-synergy. Suppose a law is adopted that totally would have worked, were it not for some other law that was introduced a little later? Should the first one be repealed, or the second one? But maybe the second one does accomplish its goal, and repealing the first one would have negative effects, now that the second one is in place… And with so many laws interacting, how can you even tell which ones have which effects, unless the effects are very large indeed? (Of course, this is a problem in the current system too. I’m glad I’m not a politician; I’d be paralyzed with fear of unintended consequences.)
This is a perspective similar to DanielLC’s point. Additionally, a commenter there makes the parallel point that we don’t really know whether private insurance improves the outcome measures.
Your argument is one that was only made after the result was known; this is a classic failure mode.
True, but we shouldn’t overstate the argument. The p-values were not low enough to count as “statistically significant,” but the direction of change was towards improved health outcomes. One is doing something wrong with this evidence if one updates against improved health outcomes for public health insurance for the poor (i.e. Medicaid).
One is doing something wrong with this evidence if one updates against improved health outcomes for public health insurance for the poor (i.e. Medicaid).
Updates always move you towards what you just saw, and so if your estimate was above what you just saw, you update down. If you only consider the hypotheses that Medicaid “improves,” “has no effect,” or “harms,” then this is weak evidence for “improves” (and “has no effect”). But a more sophisticated set of hypotheses is the quantitative effect of Medicaid; if one estimated beforehand that Medicaid doubled lifespans (to use an exaggerated example), one should revise that estimate downward after seeing this study.
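The quantitative point can be illustrated with a toy normal-normal update (all numbers below are hypothetical, chosen only to show the direction of the shift, not taken from the study):

```python
# Prior belief about Medicaid's effect, e.g. % reduction in some bad outcome.
prior_mean, prior_sd = 30.0, 10.0   # hypothetical optimistic prior: ~30% reduction
# A noisy study point estimate with wide error bars (also hypothetical).
obs_mean, obs_sd = 10.0, 15.0

# Conjugate normal-normal update: posterior precision is the sum of precisions,
# and the posterior mean is the precision-weighted average of prior and data.
prior_prec = 1.0 / prior_sd**2
obs_prec = 1.0 / obs_sd**2
post_mean = (prior_prec * prior_mean + obs_prec * obs_mean) / (prior_prec + obs_prec)
post_sd = (prior_prec + obs_prec) ** -0.5

print(round(post_mean, 1))  # ~23.8: below the prior mean of 30, an update downward
```

The posterior mean lands below the optimistic prior mean even though the (noisy) observation was itself positive, which is exactly the "revise downward" case described above.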
Fair enough. I should have said “McArdle and her political allies are making a mistake by not updating towards ‘Medicaid improves health outcomes,’” given my perception of their priors.
It does help you to pay for (say) blood-pressure medication. This might be expected to result in more people with medical aid and blood-pressure problems taking their medication.
It also helps to pay for doctors. This leads to more people going to the doctor with minor complaints, and increased chances of catching something serious earlier.
Er, yes, fine, but… to the extent that the study shows anything, it shows that the positive results of these effects, if they exist, are consistent with zero. Can we please discuss the data, now that we have some, and not theory?
This annoys me because she doesn’t talk at all about the power of the study. Usually, when you see statistically insignificant positive changes across the board in a study without much power, it’s a suggestion that you should hesitantly update a very tiny bit in the positive direction, AND that you need another study, not a suggestion that you should update downward.
When ethics prevent us from constructing high power statistical studies, we need to be a bit careful not to reify statistical significance.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn’t even matter that it is positive. An analogy: Suppose I tell you that eating garlic daily increases your IQ, and point to a study with three million participants and P < 1e-7. Vastly significant, no? Now it turns out that the actual size of the effect is 0.01 points of IQ. Are you going to start eating garlic? What if it weren’t garlic, but a several-billion-dollar government health program? Statistical significance is indeed not everything, but there’s such a thing as considering the size of an effect, especially if there’s a cost involved.
Moreover, please consider that “consistent with zero” means exactly that. If you throw a die ten times and it comes up heads six, do you “hesitantly update a very tiny bit” in the direction of the coin being biased? Would you do so, if you did not have a prior reason to hope that the coin was biased?
I respectfully suggest that you are letting your already-written bottom line interfere with your math.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn’t even matter that it is positive.
I strongly disagree.
An old comment of mine gives us a counterexample. A couple of years ago, a meta-analysis of RCTs found that taking aspirin daily reduces the risk of dying from cancer by ~20% in middle-aged and older adults. This is very much a practically significant effect, and it’s probably an underestimate for reasons I’ll omit for brevity — look at the paper if you’re curious.
If you do look at the paper, notice figure 1, which summarizes the results of the 8 individual RCTs the meta-analysis used. Even though all of the RCTs had sample sizes in the thousands, 7 of them failed to show a statistically significant effect, including the 4 largest (sample sizes 5139, 5085, 3711 & 3310). The effect is therefore “so small that a sample of several thousand is not sufficient to reliably observe it”, but we would be absolutely wrong to infer that “it doesn’t even matter that it is positive”!
The heuristic that a hard-to-detect effect is probably too small to care about is a fair rule of thumb, but it’s only a heuristic. EHeller & Unnamed are quite right to point out that statistical significance and practical significance correlate only imperfectly.
Does vitamin D reduce all-cause mortality in the elderly? The point-estimates from pretty much all of the various studies are around a 5% reduction in risk of dying for any reason—pretty nontrivial, one would say, no? Yet the results are almost all not ‘statistically significant’! So do we follow Rolf and say ‘fans of vitamin D ought to update on vitamin D not helping overall’… or do we, applying power considerations about the likelihood of making the hard cutoffs at p<0.05 given the small sample sizes & plausible effect sizes, note that the point-estimates are in favor of the hypothesis? (And how does this interact with two-sided tests—vitamin D could’ve increased mortality, after all. Positive point-estimates are consistent with vitamin D helping, and less consistent with no effect, and even less consistent with it harming; so why are we supposed to update in favor of no help or harm when we see a positive point-estimate?)
If we accept Rolf’s argument, then we’d be in the odd position of, as we read through one non-statistically-significant study after another, decreasing the probability of ‘non-zero reduction in mortality’… right up until we get the Autier or Cochrane data summarizing the exact same studies & plug it into a Bayesian meta-analysis like Salvatier did & abruptly flip to ’92% chance of non-zero reduction in mortality’.
A couple of years ago, a meta-analysis of RCTs found that taking aspirin daily reduces the risk of dying from cancer by ~20% in middle-aged and older adults.
That’s a curious metric to choose. By that standard taking aspirin is about as healthy as playing a round of Russian Roulette.
It’s a fairly natural metric to choose if one wishes to gauge aspirin’s effect on cancer risk, as the study’s authors did.
By that standard taking aspirin is about as healthy as playing a round of Russian Roulette.
Fortunately, the study’s authors and I also interpreted the data by another standard. Daily aspirin reduced all-cause mortality, and didn’t increase non-cancer deaths (except for “a transient increase in risk of vascular death in the aspirin groups during the first year after completion of the trials”). These are not results we would see if aspirin effected its anti-cancer magic by a similar mechanism to Russian Roulette.
It’s a fairly natural metric to choose if one wishes to gauge aspirin’s effect on cancer risk, as the study’s authors did.
Pardon me. Mentioning only curiosity was politeness. The more significant meanings I would supplement with are ‘naive or suspicious’. By itself that metric really is worthless and reading this kind of health claim should set off warning bells. Lost purposes are a big problem when it comes to medicine. Partly because it is hard, mostly because there is more money in the area than nearly anywhere else.
Fortunately, the study’s authors and I also interpreted the data by another standard. Daily aspirin reduced all-cause mortality, and didn’t increase non-cancer deaths (except for “a transient increase in risk of vascular death in the aspirin groups during the first year after completion of the trials”).
And this is the reason low-dose aspirin is part of my daily supplement regime (while statins are not).
And this is the reason low-dose aspirin is part of my daily supplement regime (while statins are not).
I recently stopped with the low dose aspirin, the bleeding when I accidentally cut myself has proven to be too much of an inconvenience. For the time being, at least.
I’d assume they mean something like the per-year risk of dying from cancer conditional on previous survival—if they indeed mean the total lifetime risk of dying from cancer I agree it’s ridiculous.
Yeah, pretty much. There are other examples of this where something harmful appears to be helpful when you don’t take into account possible selection biases (like being put into the ‘non-cancer death’ category); for example, this is an issue in smoking—you can find various correlations where smokers are healthier than non-smokers, but this is just because the unhealthier smokers got pushed over the edge by smoking and died earlier.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn’t even matter that it is positive.
Have you read the study in question? The treatment sample is NOT several thousand; it’s about 1,500. Further, the incidence of the diseases being looked at is only a few percent or less, so the treatment sample sizes for the most prevalent diseases are around 50 (also, if you look at the specifics of the sample, the diseased groups are pretty well controlled).
I suggest the following exercise: ask yourself what WOULD be a big effect, and then work through whether the study has the power to see it.
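As a sketch of that exercise (with made-up inputs, not the study's actual prevalences: a 5% baseline incidence, a 25% relative reduction as the "big effect," and 1,500 per arm), the normal approximation for comparing two proportions gives:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs -- adjust to the actual disease prevalences in the study.
p_control = 0.05        # baseline incidence of the condition
rel_reduction = 0.25    # an effect size we would call "big": 25% relative reduction
n_per_arm = 1500
alpha = 0.05

p_treat = p_control * (1 - rel_reduction)
diff = p_control - p_treat
# Standard error of the difference between two independent proportions.
se = sqrt(p_control * (1 - p_control) / n_per_arm
          + p_treat * (1 - p_treat) / n_per_arm)

nd = NormalDist()
z_crit = nd.inv_cdf(1 - alpha / 2)
power = nd.cdf(diff / se - z_crit)   # one-tailed approximation of two-sided power

print(round(power, 2))  # ~0.39: even a 25% reduction is missed most of the time
```

Under these assumed numbers, a 25% relative reduction would reach significance less than half the time, which is the commenter's point about power.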
Moreover, please consider that “consistent with zero” means exactly that.
Yes, but in this case, the sample sizes are small and the error bars are so large that “consistent with zero” is ALSO consistent with a 25+% reduction in incidence (which is a large intervention). The study is incapable of distinguishing a hugely important effect from zero effect, so we shouldn’t update much at all, which is why I wished McArdle had talked about statistical power. Before we ask “how should we update,” we should ask “what information is actually here?”
Edit: If we treat this as an exploration, it says “we need another study”; after all, the effects could be as large as 40%! That’s a potentially tremendous intervention. Unfortunately, it’s unethical to randomly boot people off of insurance, so we’ll likely never see that study done.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn’t even matter that it is positive. [...] Statistical significance is indeed not everything, but there’s such a thing as considering the size of an effect, especially if there’s a cost involved.
Health is extremely important—the statistical value of a human life is something like $8 million—so smallish looking effects can be practically relevant. An intervention that saves 1 life out of every 10,000 people treated has an average benefit of $800 per person. In this Oregon study, people who received Medicaid cost an extra $1,172 per year in total health spending, so the intervention would need to save 1.5 lives per 10,000 person-years (or provide an equivalent benefit in other health improvements) for the health benefits to balance out the health costs. The study looked at fewer than 10,000 people over 2 years, so the cost-benefit cutoff for whether it’s worth it is less than 3 lives saved (or equivalent).
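The break-even arithmetic above, spelled out:

```python
value_of_life = 8_000_000          # statistical value of a life, USD (figure quoted above)
extra_cost_per_person_year = 1172  # incremental Medicaid spending found in the study

# Saving 1 life per 10,000 people treated, spread over those 10,000 people:
benefit_per_person = value_of_life / 10_000   # $800 per person

# Lives saved per 10,000 person-years needed to balance the extra cost:
breakeven = extra_cost_per_person_year / benefit_per_person

print(round(breakeven, 1))  # ~1.5, matching the figure quoted above
```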
So “not statistically significant” does not imply unimportant, even with a sample size of several thousand. An effect at the cost-benefit threshold is unlikely to show up in significant changes to mortality rates. The intermediate health measures in this study are more sensitive to changes than mortality rate, but were they sensitive enough? Has anyone run the numbers on how sensitive they’d need to be in order to find an effect of this size? The point estimates that they did report are (relative to control group) an 8% reduction in number of people with elevated blood pressure, 17% reduction in number of people with high cholesterol, and 18% reduction in number of people with high glycated hemoglobin levels (a marker of diabetes), which intuitively seem big enough to be part of an across-the-board health improvement that passes cost-benefit muster.
which intuitively seem big enough to be part of an across-the-board health improvement that passes cost-benefit muster.
This would be much more convincing if you reported the costs along with the benefits, so that one could form some kind of estimate of what you’re willing to pay for this. But, again, I think your argument is motivated. “Consistent with zero” means just that; it means that the study cannot exclude the possibility that the intervention was actively harmful, but they had a random fluctuation in the data.
I get the impression that people here talk a good game about statistics, but haven’t really internalised the concept of error bars. I suggest that you have another look at why physics requires five sigma. There are really good reasons for that, you know; all the more so in a mindkilling-charged field.
I was responding to the suggestion that, even if the effects that they found are real, they are too small to matter. To me, that line of reasoning is a cue to do a Fermi estimate to get a quantitative sense of how big the effect would need to be in order to matter, and how that compares to the empirical results.
I didn’t get into a full-fledged Fermi estimate here (translating the measures that they used into the dollar value of the health benefits), which is hard to do that when they only collected data on a few intermediate health measures. (If anyone else has given it a shot, I’d like to take a look.) I did find a couple effect-size-related numbers for which I feel like I have some intuitive sense of their size, and they suggest that that line of reasoning does not go through. Effects that are big enough to matter relative to the costs of additional health spending (like 3 lives saved in their sample, or some equivalent benefit) seem small enough to avoid statistical significance, and the point estimates that they found which are not statistically significant (8-18% reductions in various metrics) seem large enough to matter.
My overall conclusion about the (based on what I know about it so far) study is that it provides little information for updating in any direction, because of those wide error bars. The results are consistent with Medicaid having no effect, they’re consistent with Medicaid having a modest health benefit (e.g., 10% reduction in a few bad things), they’re consistent with Medicaid being actively harmful, and they’re consistent with Medicaid having a large benefit (e.g. 40% reduction in many bad things). The likelihood ratios that the data provide for distinguishing between those alternatives are fairly close to one, with “modest health benefit” slightly favored over the more extreme alternatives.
Again, the original point McArdle is making is that “consistent with zero” is just completely not what the proponents expected beforehand, and they should update accordingly. See my discussion with TheOtherDave, below. A small effect may, indeed, be worth pursuing. But here we have a case where something fairly costly was done after much disagreement, and the proponents claimed that there would be a large effect. In that case, if you find a small effect, you ought not to say “Well, it’s still worth doing”; that’s not what you said before. It was claimed that there would be a large effect, and the program was passed on this basis. It is then dishonest to turn around and say “Ok, the effect is small but still worthwhile”. This ignores the inertia of political programs.
Most Medicaid proponents did not have expectations about the statistical results of this particular study. They did not make predictions about confidence intervals and p values for these particular analyses. Rather, they had expectations about the actual benefit of Medicaid.
You cite Ezra Klein as someone who expected that Medicaid would drastically reduce mortality; Klein was drawing his numbers from a report which estimated that in the US “137,000 people died from 2000 through 2006 because they lacked health insurance, including 22,000 people in 2006.” There were 47 million uninsured Americans in 2006, so those 22,000 excess deaths translate into 4.7 excess deaths per 10,000 uninsured people each year. So that’s the size of the drastic reduction in mortality that you’re referring to: 4.7 lives per 10,000 people each year. (For comparison, in my other comment I estimated that the Medicaid expansion would be worth its estimated cost if it saved at least 1.5 lives per 10,000 people each year or provided an equivalent benefit.)
Did the study rule out an effect as large as this drastic reduction of 4.7 per 10,000? As far as I can tell it did not (I’d like to see a more technical analysis of this). There were under 10,000 people in the study, so I wouldn’t be surprised if they missed effects of that size. Their point estimates, of an 8-18% reduction in various bad things, intuitively seem like they could be consistent with an effect that size. And the upper bounds of their confidence intervals (a 40%+ reduction in each of the 3 bad things) intuitively seem consistent with a much larger effect. So if people like Klein and Drum had made predictions in advance about the effect size of the Oregon intervention, I suspect that their predictions would have fallen within the study’s confidence interval.
There are presumably some people who did expect the results of the study to be statistically significant (otherwise, why run the study?), and they were wrong. But this isn’t a competition between opponents and proponents where every slipup by one side cedes territory to the other side. The data and results are there for us to look at, so we can update based on what the study actually found instead of on which side of the conflict fought better in this battle. In this case, it looks like the correct update based on the study (for most people, to a first approximation) is to not update at all. The confidence interval for the effects that they examined covers the full range of results that seemed plausible beforehand (including the no-effect-whatsoever hypothesis and the tens-of-thousands-of-lives-each-year hypothesis), so the study provides little information for updating one’s priors about the effectiveness of Medicaid.
For the people who did make the erroneous prediction that the study would find statistically significant results, why did they get it wrong? I’m not sure. A few possibilities: 1) they didn’t do an analysis of the study’s statistical power (or used some crude & mistaken heuristic to estimate power), 2) they overestimated how large a health benefit Medicaid would produce, 3) the control group in Oregon turned out to be healthier than they expected which left less room for Medicaid to show benefits, 4) fewer members of the experimental group than they expected ended up actually receiving Medicaid, which reduced the actual sample size and also added noise to the intent-to-treat analysis (reducing the effective sample size).
I do want to point out that, while I agree with your general points, I think that unless the proponents put numerical estimates up beforehand, it’s not quite fair to assume they meant “it will be statistically significant in a sample size of N at least 95% of the time.” Even if they said that, unless they explicitly calculated N, they probably underestimated it by at least one order of magnitude. (Professional researchers in social science make this mistake very frequently, and even when they avoid it, they can only very rarely find funding to actually collect N samples.)
I haven’t looked into this study in depth, so semi-related anecdote time: there was recently a study of calorie restriction in monkeys which had ~70 monkeys. The confidence interval for the hazard ratio included 1 (no effect), and so they concluded no statistically significant benefit to CR on mortality, though they could declare statistically significant benefit on a few varieties of mortality and several health proxies.
I ran the numbers to determine the power; turns out that they couldn’t have reliably noticed the effects of smoking (hazard ratio ~2) on longevity with a study of ~70 monkeys, and while I haven’t seen many quoted estimates of the hazard ratio of eating normally compared to CR, I don’t think there are many people that put them higher than 2.
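For a back-of-the-envelope version of that check (my own sketch using the standard Schoenfeld approximation for log-rank tests, not a reconstruction of the original spower calculation referenced below):

```python
from math import log
from statistics import NormalDist

def events_needed(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: deaths needed for a log-rank test,
    assuming 1:1 allocation between arms."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 4 * (z_a + z_b) ** 2 / log(hazard_ratio) ** 2

print(round(events_needed(2.0)))  # ~65 deaths needed for 80% power at HR = 2
```

With only ~70 monkeys, nearly every animal would have to die during follow-up to accumulate 65 events, which supports the conclusion that the study could not reliably detect even a smoking-sized hazard ratio.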
When you don’t have the power to reliably conclude that all-cause mortality decreased, you can eke out some extra information by looking at the signs of all the proxies you measured. If insurance does nothing, we should expect to see the effect estimates scattered around 0. If insurance has a positive effect, we should expect to see more effect estimates above 0 than below 0, even though most will include 0 in their CI. (Suppose they measure 30 mortality proxies, and all of them show a positive effect, though the univariate CI includes 0 for all of them. If the ground truth was no effect on mortality proxies, that’s a very unlikely result to see; if the ground truth was a positive effect on mortality proxies, that’s a likely result to see.)
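The parenthetical can be made concrete with a one-sided sign test, under the strong assumption that the proxies are independent (correlated proxies would make these p-values too small):

```python
from math import comb

def sign_test_p(k, n):
    """P(at least k of n point estimates are positive) under a fair 50/50 null."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# If the ground truth is "no effect", each proxy's estimate is equally likely
# to land above or below zero, so 30 positives out of 30 is wildly unlikely:
print(sign_test_p(30, 30))  # 0.5**30, about 9.3e-10

# Even a more modest 20 of 30 positive signs is borderline significant:
print(round(sign_test_p(20, 30), 3))  # ~0.049
```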
I ran the numbers to determine the power; turns out that they couldn’t have reliably noticed the effects of smoking (hazard ratio ~2) on longevity with a study of ~70 monkeys, and while I haven’t seen many quoted estimates of the hazard ratio of eating normally compared to CR, I don’t think there are many people that put them higher than 2.
If I remember correctly, I noticed that an effect that did give a p of slightly less than .05 was a hazard ratio of 3, which made me think of running that test, and then I think spower was the R function that I used to figure out what p they could get for a hazard ratio of 2 and 35 experimentals and 35 controls (or whatever the actual split was; I think it was slightly different).
So you were using Hmisc::spower… I’m surprised that there was even such a function (however obtusely named) - why on earth isn’t it in the survival library?
I was going to try to replicate that estimate, but looking at the spower documentation, it’s pretty complex and I don’t think I could do it without the original paper (which is more work than I want to do).
It is of course very difficult to extract any precise numbers from a political discussion. :) However, if you click through some of the links in the article, or have a look at the followup from today, you’ll find McArdle quoting predictions of tens of thousands of preventable deaths yearly from non-insured status. That looks to me like a pretty big hazard rate, no?
you’ll find McArdle quoting predictions of tens of thousands of preventable deaths yearly from non-insured status. That looks to me like a pretty big hazard rate, no?
No. The Oracle says there’re about 50 million Americans without health insurance. The predictions you quoted refer to 18,000 or 27,000 deaths for want of insurance per year. The higher number implies only a 0.054% death rate per year, or a 3.5% death rate over 65 years (Americans over 65 automatically get insurance). This is non-negligible but hardly huge (and potentially important for all that).
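The arithmetic behind those percentages:

```python
uninsured = 50_000_000
excess_deaths_per_year = 27_000   # the higher of the two quoted estimates

annual_rate = excess_deaths_per_year / uninsured
print(f"{annual_rate:.4%}")       # 0.0540% per year

# Cumulative risk over 65 years of being uninsured (insurance starts at 65):
cumulative = 1 - (1 - annual_rate) ** 65
print(f"{cumulative:.1%}")        # ~3.5%
```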
The higher number implies only a 0.054% death rate per year
Eyeballing the statistics, that looks like a hazard ratio between 1.1 and 1.5 (lots of things are good predictors for mortality that you would want to control for that I haven’t; the more you add, the closer that number should get to 1.1).
If you throw a die ten times and it comes up heads six, do you “hesitantly update a very tiny bit” in the direction of the coin being biased?
If I throw a die once and it comes up heads I’m going to be confused. Now, assuming you meant “toss a coin and it comes up heads six times out of ten”.
What is your intended ‘correct’ answer to the question? I think I would indeed hesitantly update a very (very) tiny bit in the direction of the coin being biased but different priors regarding the possibility of the coin being biased in various ways and degrees could easily make the update be towards not-biased. I’d significantly lower p(the coin is biased by having two heads) but very slightly raise p(the coin is slightly heavier on the tails side), etc.
My intended correct answer is that, on this data, you technically can adjust your belief very slightly; but because the prior for a biased coin is so tiny, the update is not worth doing. The calculation cost way exceeds any benefit you can get from gruel this thin. I would say “Null hypothesis [ie unbiased coin] not disconfirmed; move along, nothing to see here”. And if you had a political reason for wishing the coin to be biased towards heads, then you should definitely not make any such update; because you certainly wouldn’t have done so, if tails had come up six times. In that case it would immediately have been “P-level is in the double digits” and “no statistical significance means exactly that” and “with those errors we’re still consistent with a heads bias”.
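To put numbers on how thin that gruel is, compare the likelihood of six heads in ten tosses under a fair coin with that under a hypothetical mildly biased coin (p = 0.6, my choice of alternative):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

fair = binom_pmf(6, 10, 0.5)    # ~0.205
biased = binom_pmf(6, 10, 0.6)  # ~0.251
bayes_factor = biased / fair

print(round(bayes_factor, 2))   # ~1.22: barely any evidence either way
```

Even against a 60%-heads coin, the data favor bias by a factor of only about 1.2; combined with a small prior on the coin being biased at all, the posterior barely moves.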
My intended correct answer is that, on this data, you technically can adjust your belief very slightly; but because the prior for a biased coin is so tiny, the update is not worth doing
I would think that our prior for “health care improves health” should be quite a bit larger than the prior for a coin to be biased.
Hanson’s point is that we often over-treat to show we care, not that 0 health care is optimal. Medicaid patients don’t really have to worry about overtreatment.
Hanson’s point is that we often over-treat to show we care, not that 0 health care is optimal
I was interpreting “health care improves health” as “healthcare improves health on the margin.” Is this not what was meant?
Medicaid patients don’t really have to worry about overtreatment.
As someone who has a start-up in the healthcare industry, this runs counter to my personal experience. Also, currently “medicaid overtreatment” is showing about 676,000 results on Google (while “medicaid undertreatment” is showing about 1,240,000 results). Even if it isn’t typical, it surely isn’t an unheard-of phenomenon.
I was interpreting “health care improves health” as “healthcare improves health on the margin.” Is this not what was meant?
No, I meant going from 0 access to care to some access to care improves health, as we are discussing the medicaid study comparing people on medicaid to the uninsured.
As someone who has a start-up in the healthcare industry, this runs counter to my personal experience.
I currently work as a statistician for a large HMO, and I can tell you for us, medicaid patients generally get the ‘patch-you-up-and-out-the-door’ treatment because odds are high we won’t be getting reimbursed in any kind of timely fashion. I’ve worked in a few states, and it seems pretty common for medicaid to be fairly underfunded (hence the Oregon study we are discussing).
And generally, providing medicaid is moving someone from emergency-only to some-primary-care, which is where we should expect some impact; this isn’t increasing treatment on the margin, it’s providing minimal care to a largely untreated population.
Currently, “medicaid overtreatment” is showing about 676,000 results on Google
So I randomly sampled ~5 in the first two pages, and 3 of those were articles about overtreatment that had a sidebar to a different article discussing some aspect of medicaid, so I’m not sure if the count is meaningful here. (The other 2 were about some loophole dentists were using to overtreat children on medicaid and bill extra; I have no knowledge of dental claims.)
No, I meant going from 0 access to care to some access to care improves health, as we are discussing the medicaid study comparing people on medicaid to the uninsured.
This does not appear to be the actual change in access to care when going from being uninsured to on medicaid. As you mention, uninsured patients receive emergency-only care.
Such a study might show that it doesn’t matter on average. But you’d need those numbers to see if it’s increasing the spread of values. That would mean that it really helps some and hurts others. If you can figure out which is which, then it’ll end up being useful. Heck, this applies even if the average effect is negative.
I don’t know how often bio-researchers treat the standard deviation as part of their signal. I suspect it’s infrequent.
How large was your prior for “insurance helps some and harms others, and we should try to figure out which is which” before that was one possible way of rescuing insurance from this study? That sort of argument is, I respectfully suggest, a warning signal which should make you consider whether your bottom line is already written.
I wasn’t even thinking of insurance here. You were talking about garlic. I was thinking about my physics experiments where the standard deviation is a very useful channel of information.
In fact, the study showed fairly substantial improvements in the percentage of patients with depression, high blood pressure, high cholesterol, and high glycated hemoglobin levels (a marker of diabetes). The problem is that the sample size of the study was fairly small, so the results weren’t statistically significant at the 95 percent level.
From a Bayesian perspective, the Oregon results should slightly increase our belief that access to Medicaid produces positive results for diabetes, cholesterol levels, and blood pressure maintenance. It shouldn’t increase our belief much, but if you toss the positive point estimates into the stew of everything we already know, they add slightly to our prior belief that Medicaid is effective.
If this were the only medical study in all of history, then yes, a non-significant result should cause you to update as your quote says. In a world with thousands of studies yearly, you cannot do any such thing, because you’re sure to bias yourself by paying attention to the slightly-positive results you like, and ignore the slightly-negative ones you dislike. (That’s aside from the well-known publication bias where positive results are reported and negative ones aren’t.) If the study had come out with a non-significant negative effect, would comrade Drum have been updating slightly in the direction of “Medicaid is bad”? Hah. This is why we impose the 95% confidence cutoff, which actually is way too low, but that’s another discussion. It prevents us from seeing, or worse, creating, patterns in the noise, which humans are really good at.
The significance cutoff is not a technique of rationality, it is a technique of science, like blinding your results while you’re studying the systematics. It’s something we do because we run on untrusted hardware. Please do not relax your safeguards if a noisy result happens to agree with your opinions! That’s what the safeguards are for!
Then also, please note that Kevin Drum’s prior was not actually “Medicaid will slightly improve these three markers”, it was “Medicaid will drastically reduce mortality”. (See links in discussion with TheOtherDave, below). If you switch your priors around as convenient for claiming support from studies, then of course no study can possibly cause you to update downwards. I would gently suggest that this is not a good epistemic state to occupy.
-- Megan McArdle, trying to explain Bayesian updates and the importance of making predictions in advance, without referring to any mathematics.
The value of health insurance isn’t that it keeps you from getting sick. It’s that it keeps you from getting in debt when you do get sick.
This may be true, but McArdle’s point is precisely that this was not said before the study came out. At that time, people confidently expected that health insurance would, in fact, improve health outcomes. Your argument is one that was only made after the result was known; this is a classic failure mode.
(nods) Yup. Of course, McArdle’s claims about what people would have said before the study, if asked, are also only being made after the results are known, which as you say is a classic failure mode.
Of course, McArdle is neither passing laws nor doing research, just writing articles, so the cost of failure is low. And it’s kind of nice to see someone in the mainstream (sorta) press making the point that surprising observations should change our confidence in our beliefs, which people surprisingly often overlook.
Anyway, the quality of McArdle’s analysis notwithstanding, one place this sort of reasoning seems to lead us is to the idea that when passing a law, we ought to say something about what we anticipate the results of passing that law to be, and have a convention of repealing laws that don’t actually accomplish the thing that we said we were passing the law in order to accomplish.
Which in principle I would be all in favor of, except for the obvious failure mode that if I personally don’t want us to accomplish that, I am now given an incentive to manipulate the system in other ways to lower whatever metrics we said we were going to measure. (Note: I am not claiming here that any such thing happened in the Oregon study.)
That said, even taking that failure mode into account, it might still be preferable to passing laws with unarticulated expected benefits and keeping them on the books despite those benefits never materializing.
I don’t think that’s true; if you read her original article on the subject, linked in the one I link, she quotes statistics like this:
And back in 2010, she said
I don’t think her statement is entirely post-hoc.
Fair enough. I only read the article you linked, not the additional source material; I’m prepared to believe given additional evidence like what you cite here that her analysis is… er… can one say “pre-hoc”?
Ante hoc.
Well, if not, one ought to be able to. I hereby grant you permission! :)
I love this idea!
There would have to be a two-sided test. A tort of ineffectiveness by which the plaintiff seeks relief from a law that fails to achieve the goals laid out for it. A tort of under-ambition by which the plaintiff seeks relief from a law that is immune from the tort of ineffectiveness because the formally specified goals are feeble.
Think about the American experience with courts voiding laws that are unconstitutional. This often ends up with the courts applying balancing tests. It can end up with the court ruling that yes, the law infringes your rights, but only a little. And the law serves a valid purpose, which is very important. So the law is allowed to stand.
These kinds of cases are decided in prospect. The decision is reached on speculation about the actual effects of the law. It might help if constitutional challenges to legislation could be re-litigated, perhaps after the first ten years. The second hearing could then be decided retrospectively, looking back at ten years’ experience, and balancing the actual burden on the plaintiff’s rights against the actual public benefit of the law.
Where though is the goal post? In practice it moves. In the prospective hearing the government will make grand promises about the huge benefits the law will bring. In the retrospective hearing the government will sail on the opposite tack, arguing that only very modest benefits suffice to justify the law.
It would be good if the goal posts were fixed. Right from the start, the law states the goals against which it will be assessed in ten years’ time. Certainly there needs to be a tort of ineffectiveness, active against laws that do not meet their goals. But politicians would soon learn to game the system by writing very modest goals into law. That needs to be blocked with a tort of under-ambition, which ensures that the initial constitutionality of the law is judged only admitting in prospect those benefits that can be litigated in retrospect.
The goal posts should definitely be fixed! And maybe some politicians would want to pass a law that benefits him and his friends in some way, even though it only has a small effect, so there ought to be some kind of safeguard against that, too. But the main problem I can see is anti-synergy. Suppose a law is adopted that totally would have worked, were it not for some other law that was introduced a little later? Should the first one be repealed, or the second one? But maybe the second one does accomplish its goal, and repealing the first one would have negative effects, now that the second one is in place… And with so many laws interacting, how can you even tell which ones have which effects, unless the effects are very large indeed? (Of course, this is a problem in the current system too. I’m glad I’m not a politician; I’d be paralyzed with fear of unintended consequences.)
Good point! I’ve totally failed to think about multiple laws interacting.
This is a perspective similar to DanielLC’s point. Additionally, a commenter there makes the parallel point that we don’t really know whether private insurance improves the outcome measures.
True, but we shouldn’t overstate the argument. The p-values were not low enough to count as “statistically significant,” but the direction of change was towards improved health outcomes. One is doing something wrong with this evidence if one updates against improved health outcomes for public health insurance for the poor (i.e. Medicaid).
Updates always move you towards what you just saw, and so if your estimate was above what you just saw, you update down. If you only consider the hypotheses that Medicaid “improves,” “has no effect,” or “harms,” then this is weak evidence for “improves” (and “has no effect”). But a more sophisticated set of hypotheses is the quantitative effect of Medicaid; if one estimated beforehand that Medicaid doubled lifespans (to use an exaggerated example), they should revise their estimate downward after seeing this study.
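The point that a positive point estimate can still pull an optimistic prior downward falls out of a conjugate normal-normal update. A sketch with purely hypothetical numbers:

```python
def posterior(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal-normal update: a precision-weighted average."""
    w = (1 / prior_var) / (1 / prior_var + 1 / obs_var)
    mean = w * prior_mean + (1 - w) * obs
    var = 1 / (1 / prior_var + 1 / obs_var)
    return mean, var

# Hypothetical prior: Medicaid cuts a risk marker by 30 percentage points (sd 10);
# the study's point estimate is an 8-point reduction, but very noisy (sd 15).
mean, var = posterior(30.0, 10.0 ** 2, 8.0, 15.0 ** 2)
print(mean)  # ≈ 23.2: a positive point estimate still pulls the estimate DOWN
```

The posterior lands between prior and observation, so whether you revise up or down depends entirely on where your prior sat relative to what you saw.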
Fair enough. I should have said “McArdle and her political allies are making a mistake by not updating towards ‘Medicaid improves health outcomes,’” given my perception of their priors.
That’s why McArdle recommended getting only catastrophic coverage.
It does help you to pay for (say) blood-pressure medication. This might be expected to result in more people with medical aid and blood-pressure problems taking their medication.
It also helps to pay for doctors. This leads to more people going to the doctor with minor complaints, and increased chances of catching something serious earlier.
Er, yes, fine, but… to the extent that the study shows anything, it shows that the positive results of these effects, if they exist, are consistent with zero. Can we please discuss the data, now that we have some, and not theory?
This annoys me because she doesn’t talk at all about the power of the study. Usually, when you see statistically insignificant positive changes across the board in a study without much power, it’s a suggestion you should hesitantly update a very tiny bit in the positive direction, AND you need another study, not a suggestion you should update downward.
When ethics prevent us from constructing high power statistical studies, we need to be a bit careful not to reify statistical significance.
If the effect is so small that a sample of several thousand is not sufficient to reliably observe it, then it doesn’t even matter that it is positive. An analogy: Suppose I tell you that eating garlic daily increases your IQ, and point to a study with three million participants and P < 1e-7. Vastly significant, no? Now it turns out that the actual size of the effect is 0.01 points of IQ. Are you going to start eating garlic? What if it weren’t garlic, but a several-billion-dollar government health program? Statistical significance is indeed not everything, but there’s such a thing as considering the size of an effect, especially if there’s a cost involved.
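The split between statistical and practical significance is easy to see numerically. In this sketch the participant counts are my own illustrations (for an sd-15 outcome like IQ, a 0.01-point effect needs on the order of 10^8 subjects per arm before p drops below 0.05), but the point stands: p goes to zero with sample size while the effect stays negligible.

```python
from math import sqrt, erf

def z_two_sample(effect, sd, n_per_group):
    """z statistic for a two-sample difference in means."""
    return effect / (sd * sqrt(2.0 / n_per_group))

# A 0.01-point IQ effect (sd = 15): trivially small, yet the z statistic
# grows without bound as the sample grows.
for n in (10_000, 1_000_000, 100_000_000):
    z = z_two_sample(0.01, 15.0, n)
    p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided p-value
    print(n, round(z, 2), p)
```

Only at the largest sample does the tiny effect become "vastly significant"; nothing about its practical worth has changed.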
Moreover, please consider that “consistent with zero” means exactly that. If you throw a die ten times and it comes up heads six, do you “hesitantly update a very tiny bit” in the direction of the coin being biased? Would you do so, if you did not have a prior reason to hope that the coin was biased?
I respectfully suggest that you are letting your already-written bottom line interfere with your math.
If I throw a die and it comes up heads, I’d update in the direction of it being a very unusual die. :-)
I strongly disagree.
An old comment of mine gives us a counterexample. A couple of years ago, a meta-analysis of RCTs found that taking aspirin daily reduces the risk of dying from cancer by ~20% in middle-aged and older adults. This is very much a practically significant effect, and it’s probably an underestimate for reasons I’ll omit for brevity — look at the paper if you’re curious.
If you do look at the paper, notice figure 1, which summarizes the results of the 8 individual RCTs the meta-analysis used. Even though all of the RCTs had sample sizes in the thousands, 7 of them failed to show a statistically significant effect, including the 4 largest (sample sizes 5139, 5085, 3711 & 3310). The effect is therefore “so small that a sample of several thousand is not sufficient to reliably observe it”, but we would be absolutely wrong to infer that “it doesn’t even matter that it is positive”!
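How pooling rescues an effect that no single trial can certify can be sketched as a fixed-effect inverse-variance meta-analysis. The numbers below are hypothetical stand-ins, not the actual aspirin trials:

```python
from math import log, sqrt, exp

# Hypothetical per-trial results: each estimates a log hazard ratio near
# log(0.8), with a standard error too wide for significance on its own
# (per-trial z ≈ 1.5 < 1.96).
log_hrs = [log(0.8)] * 8
ses = [0.15] * 8

# Fixed-effect inverse-variance pooling
weights = [1 / se ** 2 for se in ses]
pooled = sum(w * lhr for w, lhr in zip(weights, log_hrs)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

print(exp(pooled))               # pooled HR ≈ 0.8
print(abs(pooled) / pooled_se)   # z ≈ 4.2: clearly significant when pooled
```

Eight individually "null" trials, each pointing the same way, combine into an unambiguous result — which is exactly why "consistent with zero" in one underpowered study is weak grounds for updating toward zero.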
The heuristic that a hard-to-detect effect is probably too small to care about is a fair rule of thumb, but it’s only a heuristic. EHeller & Unnamed are quite right to point out that statistical significance and practical significance correlate only imperfectly.
tl;dr: NHST and Bayesian-style subjective probability do not mix easily.
Another example of this problem: http://slatestarcodex.com/2014/01/25/beware-mass-produced-medical-recommendations/
Does vitamin D reduce all-cause mortality in the elderly? The point-estimates from pretty much all of the various studies are around a 5% reduction in risk of dying for any reason—pretty nontrivial, one would say, no? Yet the results are almost all not ‘statistically significant’! So do we follow Rolf and say ‘fans of vitamin D ought to update on vitamin D not helping overall’… or do we, applying power considerations about the likelihood of making the hard cutoffs at p<0.05 given the small sample sizes & plausible effect sizes, note that the point-estimates are in favor of the hypothesis? (And how does this interact with two-sided tests—vitamin D could’ve increased mortality, after all. Positive point-estimates are consistent with vitamin D helping, and less consistent with no effect, and even less consistent with it harming; so why are we supposed to update in favor of no help or harm when we see a positive point-estimate?)
If we accept Rolf’s argument, then we’d be in the odd position of, as we read through one non-statistically-significant study after another, decreasing the probability of ‘non-zero reduction in mortality’… right up until we get the Autier or Cochrane data summarizing the exact same studies & plug it into a Bayesian meta-analysis like Salvatier did & abruptly flip to ’92% chance of non-zero reduction in mortality’.
That’s a curious metric to choose. By that standard taking aspirin is about as healthy as playing a round of Russian Roulette.
It’s a fairly natural metric to choose if one wishes to gauge aspirin’s effect on cancer risk, as the study’s authors did.
Fortunately, the study’s authors and I also interpreted the data by another standard. Daily aspirin reduced all-cause mortality, and didn’t increase non-cancer deaths (except for “a transient increase in risk of vascular death in the aspirin groups during the first year after completion of the trials”). These are not results we would see if aspirin effected its anti-cancer magic by a similar mechanism to Russian Roulette.
Pardon me. Mentioning only curiosity was politeness. The more significant meanings I would supplement with are ‘naive or suspicious’. By itself that metric really is worthless and reading this kind of health claim should set off warning bells. Lost purposes are a big problem when it comes to medicine. Partly because it is hard, mostly because there is more money in the area than nearly anywhere else.
And this is the reason low-dose aspirin is part of my daily supplement regime (while statins are not).
“All cause mortality” is a magical phrase.
I recently stopped with the low dose aspirin, the bleeding when I accidentally cut myself has proven to be too much of an inconvenience. For the time being, at least.
I’d assume they mean something like the per-year risk of dying from cancer conditional on previous survival—if they indeed mean the total lifetime risk of dying from cancer I agree it’s ridiculous.
Am I missing a subtlety here, or is it just that cancer is usually one of those things that you hope to live long enough to get?
Yeah, pretty much. There are other examples of this where something harmful appears to be helpful when you don’t take into account possible selection biases (like being put into the ‘non-cancer death’ category); for example, this is an issue in smoking—you can find various correlations where smokers are healthier than non-smokers, but this is just because the unhealthier smokers got pushed over the edge by smoking and died earlier.
Have you read the study in question? The treatment sample is NOT several thousand, it’s about 1,500. Further, the incidence of the diseases being looked at is only a few percent or less, so the treatment sample sizes for the most prevalent diseases are around 50 (also, if you look at the specifics of the sample, the diseased groups are pretty well controlled).
I suggest the following exercise- ask yourself what WOULD be a big effect, and then work through if the study has the power to see it.
Yes, but in this case, the sample sizes are small and the error bars are so large that consistent with zero is ALSO consistent with a 25+% reduction in incidence (which is a large intervention). The study is incapable of distinguishing a hugely important effect from 0 effect, so we shouldn’t update much at all, which is why I wished McArdle had talked about statistical power. Before we ask “how should we update”, we should ask “what information is actually here?”
Edit: If we treat this as an exploration, it says “we need another study”; after all, the effects could be as large as 40%! That’s a potentially tremendous intervention. Unfortunately, it’s unethical to randomly boot people off of insurance, so we’ll likely never see that study done.
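The suggested exercise can be worked through roughly, using approximate figures from this thread (around 5% disease prevalence, a 25% relative reduction as the "big effect", ~1,500 per arm) and a standard two-proportion power approximation:

```python
from math import sqrt, erf

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_proportions(p1, p2, n_per_arm, alpha_z=1.96):
    """Approximate power of a two-sided two-proportion z-test."""
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z = abs(p1 - p2) / se
    return norm_cdf(z - alpha_z)

# 5% baseline prevalence, 25% relative reduction, ~1,500 per arm:
pw = power_two_proportions(0.05, 0.0375, 1500)
print(pw)  # ≈ 0.39 — badly underpowered
```

With roughly 39% power for even that large an effect, a null result carries little information, which is the point being made here.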
Health is extremely important—the value of a statistical life is something like $8 million—so smallish-looking effects can be practically relevant. An intervention that saves 1 life out of every 10,000 people treated has an average benefit of $800 per person. In this Oregon study, people who received Medicaid cost an extra $1,172 per year in total health spending, so the intervention would need to save 1.5 lives per 10,000 person-years (or provide an equivalent benefit in other health improvements) for the health benefits to balance out the health costs. The study looked at fewer than 10,000 people over 2 years, so the cost-benefit cutoff for whether it’s worth it is less than 3 lives saved (or equivalent).
So “not statistically significant” does not imply unimportant, even with a sample size of several thousand. An effect at the cost-benefit threshold is unlikely to show up in significant changes to mortality rates. The intermediate health measures in this study are more sensitive to changes than mortality rate, but were they sensitive enough? Has anyone run the numbers on how sensitive they’d need to be in order to find an effect of this size? The point estimates that they did report are (relative to control group) an 8% reduction in number of people with elevated blood pressure, 17% reduction in number of people with high cholesterol, and 18% reduction in number of people with high glycated hemoglobin levels (a marker of diabetes), which intuitively seem big enough to be part of an across-the-board health improvement that passes cost-benefit muster.
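The cost-benefit arithmetic above, spelled out (figures taken from the comment, rounded as given there):

```python
value_of_statistical_life = 8_000_000   # dollars, the rough figure used above
extra_spending_per_person = 1_172       # dollars per person-year (Oregon study)

# Benefit of saving 1 life per 10,000 people treated, spread over everyone:
benefit_per_person = value_of_statistical_life / 10_000
print(benefit_per_person)  # 800.0 dollars

# Lives saved per 10,000 person-years needed to break even on the spending:
break_even_lives_per_10k = extra_spending_per_person / benefit_per_person
print(break_even_lives_per_10k)  # ≈ 1.47, i.e. the "1.5 lives" figure above
```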
This would be much more convincing if you reported the costs along with the benefits, so that one could form some kind of estimate of what you’re willing to pay for this. But, again, I think your argument is motivated. “Consistent with zero” means just that; it means that the study cannot exclude the possibility that the intervention was actively harmful, but they had a random fluctuation in the data.
I get the impression that people here talk a good game about statistics, but haven’t really internalised the concept of error bars. I suggest that you have another look at why physics requires five sigma. There are really good reasons for that, you know; all the more so in a mindkilling-charged field.
I was responding to the suggestion that, even if the effects that they found are real, they are too small to matter. To me, that line of reasoning is a cue to do a Fermi estimate to get a quantitative sense of how big the effect would need to be in order to matter, and how that compares to the empirical results.
I didn’t get into a full-fledged Fermi estimate here (translating the measures that they used into the dollar value of the health benefits), which is hard to do when they only collected data on a few intermediate health measures. (If anyone else has given it a shot, I’d like to take a look.) I did find a couple effect-size-related numbers for which I feel like I have some intuitive sense of their size, and they suggest that that line of reasoning does not go through. Effects that are big enough to matter relative to the costs of additional health spending (like 3 lives saved in their sample, or some equivalent benefit) seem small enough to avoid statistical significance, and the point estimates that they found which are not statistically significant (8-18% reductions in various metrics) seem large enough to matter.
My overall conclusion about the (based on what I know about it so far) study is that it provides little information for updating in any direction, because of those wide error bars. The results are consistent with Medicaid having no effect, they’re consistent with Medicaid having a modest health benefit (e.g., 10% reduction in a few bad things), they’re consistent with Medicaid being actively harmful, and they’re consistent with Medicaid having a large benefit (e.g. 40% reduction in many bad things). The likelihood ratios that the data provide for distinguishing between those alternatives are fairly close to one, with “modest health benefit” slightly favored over the more extreme alternatives.
Again, the original point McArdle is making is that “consistent with zero” is just completely not what the proponents expected beforehand, and they should update accordingly. See my discussion with TheOtherDave, below. A small effect may, indeed, be worth pursuing. But here we have a case where something fairly costly was done after much disagreement, and the proponents claimed that there would be a large effect. In that case, if you find a small effect, you ought not to say “Well, it’s still worth doing”; that’s not what you said before. It was claimed that there would be a large effect, and the program was passed on this basis. It is then dishonest to turn around and say “Ok, the effect is small but still worthwhile”. This ignores the inertia of political programs.
Most Medicaid proponents did not have expectations about the statistical results of this particular study. They did not make predictions about confidence intervals and p values for these particular analyses. Rather, they had expectations about the actual benefit of Medicaid.
You cite Ezra Klein as someone who expected that Medicaid would drastically reduce mortality; Klein was drawing his numbers from a report which estimated that in the US “137,000 people died from 2000 through 2006 because they lacked health insurance, including 22,000 people in 2006.” There were 47 million uninsured Americans in 2006, so those 22,000 excess deaths translate into 4.7 excess deaths per 10,000 uninsured people each year. So that’s the size of the drastic reduction in mortality that you’re referring to: 4.7 lives per 10,000 people each year. (For comparison, in my other comment I estimated that the Medicaid expansion would be worth its estimated cost if it saved at least 1.5 lives per 10,000 people each year or provided an equivalent benefit.)
Did the study rule out an effect as large as this drastic reduction of 4.7 per 10,000? As far as I can tell it did not (I’d like to see a more technical analysis of this). There were under 10,000 people in the study, so I wouldn’t be surprised if they missed effects of that size. Their point estimates, of an 8-18% reduction in various bad things, intuitively seem like they could be consistent with an effect that size. And the upper bounds of their confidence intervals (a 40%+ reduction in each of the 3 bad things) intuitively seem consistent with a much larger effect. So if people like Klein and Drum had made predictions in advance about the effect size of the Oregon intervention, I suspect that their predictions would have fallen within the study’s confidence interval.
There are presumably some people who did expect the results of the study to be statistically significant (otherwise, why run the study?), and they were wrong. But this isn’t a competition between opponents and proponents where every slipup by one side cedes territory to the other side. The data and results are there for us to look at, so we can update based on what the study actually found instead of on which side of the conflict fought better in this battle. In this case, it looks like the correct update based on the study (for most people, to a first approximation) is to not update at all. The confidence interval for the effects that they examined covers the full range of results that seemed plausible beforehand (including the no-effect-whatsoever hypothesis and the tens-of-thousands-of-lives-each-year hypothesis), so the study provides little information for updating one’s priors about the effectiveness of Medicaid.
For the people who did make the erroneous prediction that the study would find statistically significant results, why did they get it wrong? I’m not sure. A few possibilities: 1) they didn’t do an analysis of the study’s statistical power (or used some crude & mistaken heuristic to estimate power), 2) they overestimated how large a health benefit Medicaid would produce, 3) the control group in Oregon turned out to be healthier than they expected which left less room for Medicaid to show benefits, 4) fewer members of the experimental group than they expected ended up actually receiving Medicaid, which reduced the actual sample size and also added noise to the intent-to-treat analysis (reducing the effective sample size).
I do want to point out that, while I agree with your general points, I think that unless the proponents put numerical estimates up beforehand, it’s not quite fair to assume they meant “it will be statistically significant in a sample size of N at least 95% of the time.” Even if they said that, unless they explicitly calculated N, they probably underestimated it by at least one order of magnitude. (Professional researchers in social science make this mistake very frequently, and even when they avoid it, they can only very rarely find funding to actually collect N samples.)
I haven’t looked into this study in depth, so semi-related anecdote time: there was recently a study of calorie restriction in monkeys which had ~70 monkeys. The confidence interval for the hazard ratio included 1 (no effect), and so they concluded no statistically significant benefit to CR on mortality, though they could declare statistically significant benefit on a few varieties of mortality and several health proxies.
I ran the numbers to determine the power; turns out that they couldn’t have reliably noticed the effects of smoking (hazard ratio ~2) on longevity with a study of ~70 monkeys, and while I haven’t seen many quoted estimates of the hazard ratio of eating normally compared to CR, I don’t think there are many people that put them higher than 2.
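For readers who want to reproduce this kind of estimate without spower, Schoenfeld’s approximation for the log-rank test makes the key point that power depends on the number of deaths observed, not the number of subjects enrolled. This is my own sketch, not the calculation described above:

```python
from math import log, sqrt, erf

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def logrank_power(hazard_ratio, events, alpha_z=1.96):
    """Schoenfeld approximation: power of a two-sided log-rank test,
    assuming 1:1 allocation, as a function of total deaths observed."""
    return norm_cdf(abs(log(hazard_ratio)) * sqrt(events) / 2 - alpha_z)

for deaths in (20, 40, 65):
    print(deaths, round(logrank_power(2.0, deaths), 2))  # ~0.34, 0.59, 0.80
```

You need on the order of 65 deaths for 80% power against a hazard ratio of 2; with only a fraction of ~70 monkeys having died during follow-up, power against even that large an effect is poor.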
When you don’t have the power to reliably conclude that all-cause mortality decreased, you can eke out some extra information by looking at the signs of all the proxies you measured. If insurance does nothing, we should expect to see the effect estimates scattered around 0. If insurance has a positive effect, we should expect to see more effect estimates above 0 than below 0, even though most will include 0 in their CI. (Suppose they measure 30 mortality proxies, and all of them show a positive effect, though the univariate CI includes 0 for all of them. If the ground truth was no effect on mortality proxies, that’s a very unlikely result to see; if the ground truth was a positive effect on mortality proxies, that’s a likely result to see.)
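The "count the signs" idea above is essentially a sign test. A sketch, with the caveat that it assumes the proxies are independent, which correlated health measures are not:

```python
from math import comb

def sign_test_p(positive, total):
    """Two-sided sign-test p-value: probability, under a 50/50 null, of an
    outcome at least this lopsided (assumes independent measures)."""
    k = max(positive, total - positive)
    tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    return min(1.0, 2 * tail)

print(sign_test_p(30, 30))  # ≈ 1.9e-9: 30/30 positive is wildly unlikely under the null
print(sign_test_p(20, 30))  # ≈ 0.10: 20/30 positive is unremarkable
```

Correlation between proxies weakens this considerably in practice, but the qualitative point survives: uniformly positive signs across many measures are evidence even when each univariate CI covers zero.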
Incidentally, how did you do that?
If I remember correctly, I noticed that an effect which did give a p of slightly less than .05 was a hazard ratio of 3, which made me think of running that test, and then I think spower was the R function that I used to figure out what p they could get for a hazard ratio of 2 and 35 experimentals and 35 controls (or whatever the actual split was; I think it was slightly different?).
So you were using Hmisc::spower… I'm surprised that there was even such a function (however obtusely named) - why on earth isn't it in the survival library?
I was going to try to replicate that estimate, but looking at the spower documentation, it's pretty complex and I don't think I could do it without the original paper (which is more work than I want to do).
It is of course very difficult to extract any precise numbers from a political discussion. :) However, if you click through some of the links in the article, or have a look at the followup from today, you’ll find McArdle quoting predictions of tens of thousands of preventable deaths yearly from non-insured status. That looks to me like a pretty big hazard rate, no?
No. The Oracle says there’re about 50 million Americans without health insurance. The predictions you quoted refer to 18,000 or 27,000 deaths for want of insurance per year. The higher number implies only a 0.054% death rate per year, or a 3.5% death rate over 65 years (Americans over 65 automatically get insurance). This is non-negligible but hardly huge (and potentially important for all that).
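The arithmetic above, sketched out (using the same 27,000-deaths and 50-million-uninsured figures quoted in the thread):

```python
deaths_per_year = 27_000
uninsured = 50_000_000

annual_rate = deaths_per_year / uninsured       # 0.054% per year
cumulative_65 = 1 - (1 - annual_rate) ** 65     # compounded over 65 years
                                                # (insurance is automatic at 65)

print(f"{annual_rate:.4%}")    # 0.0540%
print(f"{cumulative_65:.1%}")  # 3.5%
```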
Edit: and I see gwern has whupped me here.
Eyeballing the statistics, that looks like a hazard ratio between 1.1 and 1.5 (there are lots of good predictors of mortality that you would want to control for which I haven't; the more of them you add, the closer that number should get to 1.1).
It looks like you’re referring to a hazard ratio or maybe a relative risk, neither of which are the same as a “hazard rate” AFAIK.
You’re right; I’m thinking of hazard ratios. Editing.
Over a population of something like 50 million people? Dunno.
If I throw a die once and it comes up heads, I'm going to be confused. Now, assuming you meant "toss a coin and it comes up heads six times out of ten".
What is your intended ‘correct’ answer to the question? I think I would indeed hesitantly update a very (very) tiny bit in the direction of the coin being biased but different priors regarding the possibility of the coin being biased in various ways and degrees could easily make the update be towards not-biased. I’d significantly lower p(the coin is biased by having two heads) but very slightly raise p(the coin is slightly heavier on the tails side), etc.
My intended correct answer is that, on this data, you technically can adjust your belief very slightly; but because the prior for a biased coin is so tiny, the update is not worth doing. The calculation cost way exceeds any benefit you can get from gruel this thin. I would say “Null hypothesis [ie unbiased coin] not disconfirmed; move along, nothing to see here”. And if you had a political reason for wishing the coin to be biased towards heads, then you should definitely not make any such update; because you certainly wouldn’t have done so, if tails had come up six times. In that case it would immediately have been “P-level is in the double digits” and “no statistical significance means exactly that” and “with those errors we’re still consistent with a heads bias”.
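As a toy illustration of just how thin this gruel is, here's the likelihood-ratio calculation for six heads in ten flips. The 60%-heads alternative hypothesis is an arbitrary assumption for the example:

```python
def bayes_factor(heads, tails, p_biased, p_fair=0.5):
    """Likelihood ratio of 'coin lands heads with probability p_biased'
    versus a fair coin, given the observed flips."""
    biased = p_biased ** heads * (1 - p_biased) ** tails
    fair = p_fair ** heads * (1 - p_fair) ** tails
    return biased / fair

# Six heads in ten flips barely favors a hypothetical 60%-heads coin:
print(round(bayes_factor(6, 4, 0.6), 2))  # ~1.22
# ...and rules out a two-headed coin entirely (four tails observed):
print(bayes_factor(6, 4, 1.0))  # 0.0
```

A Bayes factor of ~1.2 is exactly the "technically an update, practically nothing" situation described above: it shifts the odds by about as much as one extra coin flip would.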
I would think that our prior for “health care improves health” should be quite a bit larger than the prior for a coin to be biased.
That depends on how long “we” have been reading Overcoming Bias.
Hanson’s point is that we often over-treat to show we care - not that 0 health care is optimal. Medicaid patients don’t really have to worry about overtreatment.
I was interpreting “health care improves health” as “healthcare improves health on the margin.” Is this not what was meant?
As someone who has a start-up in the healthcare industry, this runs counter to my personal experience. Also, currently “medicaid overtreatment” is showing about 676,000 results on Google (while “medicaid undertreatment” is showing about 1,240,000 results). Even if it isn’t typical, it surely isn’t an unheard-of phenomenon.
No, I meant going from 0 access to care to some access to care improves health, as we are discussing the medicaid study comparing people on medicaid to the uninsured.
I currently work as a statistician for a large HMO, and I can tell you for us, medicaid patients generally get the ‘patch-you-up-and-out-the-door’ treatment because odds are high we won’t be getting reimbursed in any kind of timely fashion. I’ve worked in a few states, and it seems pretty common for medicaid to be fairly underfunded (hence the Oregon study we are discussing).
And generally, providing medicaid is moving someone from emergency-only to some-primary-care, which is where we should expect some impact - this isn’t increasing treatment on the margin; it’s providing minimal care to a largely untreated population.
So I randomly sampled ~5 results from the first two pages, and 3 of those were articles about overtreatment that had a sidebar to a different article discussing some aspect of medicaid, so I’m not sure the count is meaningful here. (The other 2 were about some loophole dentists were using to overtreat children on medicaid and bill extra; I have no knowledge of dental claims.)
This does not appear to be the actual change in access to care when going from being uninsured to on medicaid. As you mention, uninsured patients receive emergency-only care.
Such a study might show that it doesn’t matter on average. But you’d need those numbers to see if it’s increasing the spread of values. That would mean that it really helps some and hurts others. If you can figure out which is which, then it’ll end up being useful. Heck, this applies even if the average effect is negative.
I don’t know how often bio-researchers treat the standard deviation as part of their signal. I suspect it’s infrequent.
How large was your prior for “insurance helps some and harms others, and we should try to figure out which is which” before that was one possible way of rescuing insurance from this study? That sort of argument is, I respectfully suggest, a warning signal which should make you consider whether your bottom line is already written.
I wasn’t even thinking of insurance here. You were talking about garlic. I was thinking about my physics experiments where the standard deviation is a very useful channel of information.
That is Kevin Drum’s take. Post 1:
Post 2:
If this were the only medical study in all of history, then yes, a non-significant result should cause you to update as your quote says. In a world with thousands of studies yearly, you cannot do any such thing, because you’re sure to bias yourself by paying attention to the slightly-positive results you like and ignoring the slightly-negative ones you dislike. (That’s aside from the well-known publication bias where positive results are reported and negative ones aren’t.) If the study had come out with a non-significant negative effect, would comrade Drum have been updating slightly in the direction of “Medicaid is bad”? Hah. This is why we impose the 95% confidence cutoff, which actually is way too low, but that’s another discussion. It prevents us from seeing, or worse, creating, patterns in the noise, which humans are really good at.
The significance cutoff is not a technique of rationality, it is a technique of science, like blinding your results while you’re studying the systematics. It’s something we do because we run on untrusted hardware. Please do not relax your safeguards if a noisy result happens to agree with your opinions! That’s what the safeguards are for!
Then also, please note that Kevin Drum’s prior was not actually “Medicaid will slightly improve these three markers”, it was “Medicaid will drastically reduce mortality”. (See links in discussion with TheOtherDave, below). If you switch your priors around as convenient for claiming support from studies, then of course no study can possibly cause you to update downwards. I would gently suggest that this is not a good epistemic state to occupy.