Why I don’t believe in the placebo effect

Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychological effect of taking the drug. The patient’s belief in the power of the medicine has a strong effect on its own. In fact, for some drugs such as antidepressants, the psychological effect of taking a pill is larger than the effect of the drug. It may even be worth it to give a patient an ineffective medicine just to benefit from the placebo effect. This is the conventional wisdom that I took for granted until recently.

I no longer believe any of it, and the short answer as to why is that big meta-analysis on the placebo effect. That meta-analysis collected all the studies they could find that did “direct” measurements of the placebo effect. In addition to a placebo group that could, for all they knew, be getting the real treatment, these studies also included a group of patients that didn’t receive a placebo.

But even after looking at the meta-analysis I still found the situation confusing. The only reason I ever believed in the placebo effect was because I understood it to be a scientific finding. This may put me in a different position than people who believe in it from personal experience. But personally, I thought it was just a well-known scientific fact that was important to the design of clinical trials. How did it come to be conventional wisdom, if direct measurement doesn’t back it up? And what do the studies collected in that meta-analysis actually look like?

I did a lot of reading to answer these questions, and that’s what I want to share with you. I’m only going to discuss a handful of studies. I can’t match the force of evidence of the meta-analysis, which aggregated over two hundred studies. But this is how I came to understand what kind of evidence created the impression of a strong placebo effect, and what kind of evidence indicates that it’s actually small.

Examples: Depression

The observation that created the impression of a placebo effect is that patients in the placebo group tend to get better during the trial. Here’s an example from a trial of the first antidepressant that came to mind, which was Prozac. The paper is called “A double-blind, randomized, placebo-controlled trial of fluoxetine in children and adolescents with depression”.

In this test, high scores are bad. So we see both the drug group and the placebo group getting better at the beginning of the trial. By the end of the trial, the scores in those two groups are different, but that difference is not as big as the drop right at the beginning. I can see how someone could look at this and say that most of the effect of the drug is the placebo effect. In fact, the 1950s study that originally popularized the placebo effect consisted mainly of this kind of before-and-after comparison.

Another explanation is simply that depression comes in months-long episodes. Patients will tend to be in a depressive episode when they’re enrolled in a trial, and by the end many of them will have come out of it. If that’s all there is to it, we would expect that a “no-pill” group (no drug, no placebo) would have the same drop.
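This selection effect can be seen in a toy simulation (not from any study, numbers made up): if patients are enrolled only when their symptom score happens to be high, the group average falls on re-measurement even with no treatment at all.

```python
# Toy illustration of regression to the mean: enroll only high scorers,
# then re-measure. Both measurements are draws from the same distribution
# (mean 10, SD 3), so no real change is happening.
import random

random.seed(0)
population = [(random.gauss(10, 3), random.gauss(10, 3)) for _ in range(100_000)]
enrolled = [(a, b) for a, b in population if a > 14]  # enroll only when score > 14

baseline = sum(a for a, _ in enrolled) / len(enrolled)
followup = sum(b for _, b in enrolled) / len(enrolled)
print(round(baseline, 1), round(followup, 1))  # baseline well above 14, follow-up back near 10
```

The enrolled group “improves” substantially between the two measurements, purely because of how it was selected.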

I looked through the depression studies cited in that big meta-analysis, but I didn’t manage to find a graph precisely like the Prozac graph but with an additional no-pill group. Here’s the closest that I found, from a paper called “Effects of maintenance amitriptyline and psychotherapy on symptoms of depression”. Before I get into all the reasons why this isn’t directly comparable, note that the placebo and no-pill curves look the same, both on top:

The big difference is that this trial is testing long-term “maintenance” treatment after recovery from a depressive episode. The 0 to 8 on the x axis here are months, whereas in the Prozac trial they were weeks. And we can’t interpret that drop at all, not only because everyone got the drug during that period before being randomized to these three groups, but also because those that didn’t get better weren’t included in the trial.

But still, look at how the drug group is doing better at the end, whereas the placebo and no-pill groups look about the same. Nobody would look at this and say the drug effect is mostly placebo effect. In fact there is no clear placebo effect at all. But this is the direct comparison, the one where we ask what would happen if the placebo were not given.

This is the kind of evidence that is collected in that big meta-analysis. There are a few depression studies like this with drug, placebo, and no-pill groups, and they give the same impression. The meta-analysis isn’t limited to drugs, though, or placebo pills. For example, in a trial of the effect of bright light, the placebo was dim light.

If you look at the abstract of the meta-analysis, it says that there was no statistically significant placebo effect in depression. When Scott wrote about this, he was surprised that there were statistically significant effects for pain and nausea, but not for depression. I think I can clear this up.

On my reading, the meta-analysis found a measurable placebo effect for nausea, pain, and depression, all of about the same size. To see that, let’s look at the estimates of effect size and the confidence intervals, copied from their “Summary of Findings” table. Don’t worry about the units of the effect yet, I’ll get to that.

| Outcome | Effect (SMD) | 95% CI | Significant? |
| --- | --- | --- | --- |
| All clinical conditions | −0.23 | −0.28 to −0.17 | Yes |
| Pain | −0.28 | −0.36 to −0.19 | Yes |
| Nausea | −0.25 | −0.46 to −0.04 | Yes |
| Depression | −0.25 | −0.55 to 0.05 | No |

The effect on depression is not statistically significant because the confidence interval includes zero. But if you instead ask whether depression is different from the other conditions, the answer is even more clearly no. A classic case of “the difference between ‘significant’ and ‘not significant’ is not itself statistically significant”. As for why depression had a wider confidence interval, I’ll note that there were far more studies on pain, and the studies on nausea had more consistent results.
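You can check this directly from the published confidence intervals. A 95% CI spans about 2 × 1.96 standard errors, so the standard error of each estimate can be backed out, and then the pain and depression estimates compared head to head (a rough sketch, which assumes the two estimates are independent; the numbers are the ones from the table above):

```python
# Sketch: back out standard errors from 95% CIs, then z-test the
# difference between the pain and depression placebo-effect estimates.
from math import sqrt

def se_from_ci(lo, hi):
    """A 95% CI spans about 2 * 1.96 standard errors."""
    return (hi - lo) / (2 * 1.96)

pain = (-0.28, se_from_ci(-0.36, -0.19))
depression = (-0.25, se_from_ci(-0.55, 0.05))

diff = pain[0] - depression[0]
se_diff = sqrt(pain[1] ** 2 + depression[1] ** 2)  # assumes independence
z = diff / se_diff
print(f"difference = {diff:.2f}, z = {z:.2f}")  # |z| far below 1.96
```

The z statistic comes out around −0.19, nowhere near the ±1.96 threshold: the pain and depression estimates are statistically indistinguishable.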

Now, are these effect sizes big or small? These are standardized mean differences (SMD) between the placebo and no-treatment groups. Roughly, an SMD of 1 means that the difference between group averages was the same as the typical difference of a group member from its own group’s average. More precisely, it’s the difference between the group means divided by the pooled standard deviation of the two groups.

So it looks like getting a placebo does decrease the depression score. Maybe that decrease is about a quarter of the size of the variation within a group. Maybe more, maybe less. But it’s probably not nothing.
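As a minimal sketch of the calculation (with made-up scores, constructed so the result lands near the −0.25 range from the table):

```python
# Standardized mean difference (Cohen's d style): difference of group
# means divided by the pooled standard deviation. Scores are hypothetical.
from statistics import mean, stdev
from math import sqrt

def smd(a, b):
    """(mean(a) - mean(b)) / pooled SD, using the usual pooled-variance formula."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

placebo = [18, 15, 20, 16, 17, 19]                    # hypothetical scores; higher = worse
no_treatment = [18.5, 15.5, 20.5, 16.5, 17.5, 19.5]   # hypothetical

print(round(smd(placebo, no_treatment), 2))  # -0.27
```

A negative SMD here means the placebo group scored lower (better) than the no-treatment group, which is the direction of all the estimates in the table.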

To get a sense of scale for that number, I wanted to see how it compared to antidepressants. So I looked at another meta-analysis: “Antidepressants versus placebo for depression in primary care”. I put their SMD in a table, along with some numbers from the placebo meta-analysis:

| Comparison | Effect (SMD) | 95% CI |
| --- | --- | --- |
| Drug vs placebo (depression) | −0.49 | −0.67 to −0.32 |
| Placebo vs no treatment (depression) | −0.25 | −0.55 to 0.05 |
| Placebo vs no treatment (all conditions, patient-reported) | −0.26 | −0.32 to −0.19 |
| Placebo vs no treatment (all conditions, observer-reported) | −0.13 | −0.24 to −0.02 |

So, while I don’t believe that the effect of a placebo on depression scores is really zero, I doubt that it’s really the same size as the effect of a drug, compared to placebo. So I don’t believe, as some have said, that the effect of antidepressant drugs is mostly placebo effect. Still, it’s not like it’s an order of magnitude difference, is it?

But now I think we have to interpret what these “effects” on depression scores really are. Is there a “placebo effect” on the condition of depression, or merely on how depression is reported? That’s why I included the breakdown into patient-reported and observer-reported effects (for all conditions). It does seem that the “placebo effect” is smaller when reported by a medical professional, who I would hope is more objective than the patient. But we’re on risky ground here now, interpreting estimates with overlapping confidence intervals.

When I first began to doubt the conventional wisdom about the placebo effect, I didn’t look at depression. I wanted to look first at a disease that seems more “physical”, less subjective. And I remembered, doesn’t Robin Hanson have some evolutionary psychology theory to explain why placebos activate the immune system? What was that about?

Example: The common cold

The Robin Hanson post was inspired by a New Scientist article which made an intriguing claim:

Likewise, those people who think they are taking a drug but are really receiving a placebo can have a [immune] response which is twice that of those who receive no pills (Annals of Family Medicine, doi.org/​cckm8b).

The study is called “Placebo Effects and the Common Cold: A Randomized Controlled Trial”. It doesn’t actually say that placebos double the immune response. But it’s worth a closer look, both to quantify the placebo effect outside of mental illness, and because this miscommunication is an interesting case study in how belief in the placebo effect has been maintained.

If you didn’t see “Placebo” in the title, you would think this was a trial of echinacea. There were the two groups you would expect in such a trial: patients randomly assigned to either echinacea or placebo, without knowing which. But there were another two groups: a no-pill group, and an open-label echinacea group. Patients were also asked whether they had taken echinacea before and whether they believed it worked, and there was a separate analysis of the believers.

The study did, as indicated in the New Scientist article, measure immune response. The immune response was quantified by testing a nasal wash for the molecule IL-8, and for neutrophils, a kind of white blood cell. There were no statistically significant differences in these measurements. Even if we’re willing to interpret differences that aren’t statistically significant, I can’t see any measure that’s twice as high among the patients that got a pill. Not even in the subgroup of patients that believed in echinacea. I think the New Scientist article is just wrong.

But with the paper now in hand, let’s look at the “conclusion” section of the abstract:

Participants randomized to the no-pill group tended to have longer and more severe illnesses than those who received pills. For the subgroup who believed in echinacea and received pills, illnesses were substantively shorter and less severe, regardless of whether the pills contained echinacea. These findings support the general idea that beliefs and feelings about treatments may be important and perhaps should be taken into consideration when making medical decisions.

Also a very interesting claim, but we have to narrow it down. There were two “primary outcomes”, duration and severity. In the full sample, none of the comparisons mentioned in the quote were statistically significant. I guess the reviewers were fine with stating these as conclusions, despite not reaching statistical significance.

However, there was a statistically significant difference in duration of illness among the echinacea-believers. Comparing the no-pill group to those that received a pill (placebo or echinacea, open-label or blinded), those receiving a pill had shorter illnesses, with a confidence interval from 1.31 to 2.58 days shorter.

I want to stress that we have left behind the laboratory measurements and are again relying on self-reports. “Illness duration” was measured by asking the patient each day whether they think they still have a cold. This study has convinced me that people who say echinacea worked for them in the past, given pills that might be echinacea, tend to say “no” a couple days earlier. Whether that’s an effect of the pill on the immune response, or just on the self-report, you have to interpret.

The status of the placebo effect in science

One thing I learned reading all this is that a “powerful placebo” really is a widespread belief among scientists and doctors. The “conventional wisdom” that placebo effects are not only real, but can be as big as drug effects, isn’t just a misconception among the public.

But there does not seem to be a consensus on the issue. Some do believe in a “placebo effect” in the sense of a psychologically mediated effect on the condition. Others explain improvement in the placebo group as regression to the mean (which includes as a special case the patients tending to come out of a depressive episode during the trial). It was interesting reading some of the responses to an early version of the placebo meta-analysis. A Dr. McDonald wrote to say that he had previously calculated that the improvements in placebo groups are consistent with regression to the mean, and now that the direct comparison has been made, “It is time to call a myth a myth”. But someone else wrote that “A single, well-performed meta-analysis is insufficient to reject the traditional consensus about the placebo effect.”

And the story didn’t end when that meta-analysis was published. In depression, a recent study was taken by some as evidence for a placebo effect, but another author said “this particular trial is biased to an extent that the results are not interpretable”. Personally I agree with the latter. The issue is that about half the no-pill group dropped out of the trial, and their depression measurements were imputed with the method of “last observation carried forward”. To see the problem with that, look back at that first plot, the one from the Prozac trial, and imagine terminating one of those curves at week 2 or 3 and then extrapolating it with a horizontal line.

Once I saw the problem with “last observation carried forward” (LOCF), though, I saw it in other places too. In fact, that plot from the Prozac trial was made using LOCF, and the placebo group had more patients dropping out. I wonder how much of a difference that makes to our SMD scores.
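The distortion is easy to see in a toy example (made-up numbers, not from either trial): suppose every patient's score would fall from 20 to 10 over the trial, but dropouts keep their last observed score under LOCF.

```python
# Toy illustration of last-observation-carried-forward (LOCF) bias.
# Everyone follows the same improving trajectory, but early dropouts
# are frozen at their last observed (still-high) score.

def locf_final_score(observed_weeks, true_scores):
    """Final score under LOCF: the last score that was actually observed."""
    return true_scores[observed_weeks - 1]

true_scores = [20, 18, 16, 14, 12, 10]  # weeks 0..5; same trajectory for everyone

completers = [locf_final_score(6, true_scores) for _ in range(10)]
dropouts = [locf_final_score(2, true_scores) for _ in range(10)]  # left after week 1

print(sum(completers) / 10, sum(dropouts) / 10)  # 10.0 18.0
```

Identical patients, but the group with more early dropouts ends the trial looking eight points worse. If the placebo group drops out more than the drug group (or the no-pill group more than either), LOCF inflates the apparent group difference.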

But now that I’ve brought up yet another statistical issue, I want to take a step back and ask, why is this so confusing? Why haven’t we settled this yet? And a big part of the answer is that the placebo effect is small. My guess is that the “real” placebo effect on health is about zero. But even if you think this is really a psychologically mediated effect on health, it’s not a big one. That’s the root of all our problems. Big effects are easy to measure. Stepping back from the confusion about whether the effect is small or zero, we know with confidence that it’s not large.

Even if you forget the subtleties, we’re a long way from the story where scientists need placebo groups because placebos on their own are such powerful medicine. We do still need placebo groups. If there’s no placebo effect on health, then the differences we see must be biased reporting. We’ve also seen the issues that can be caused by the untreated group dropping out of the trial. Placebos are necessary as a tool for blinding.

As for why we thought placebos provided substantial health benefits in the first place, it seems that this impression was originally based on regression to the mean, rather than direct comparisons. It was also interesting how badly the facts on the common cold were misrepresented on their way to arriving in the rationalist blogosphere.