We Still Don’t Know If Masks Work


A recently published paper attracted attention by estimating that 100% mask wearing in the population would cause a 25% reduction in the effective transmission number (shortened to transmissibility throughout).

The study was observational, so inferring causality from it is difficult. Thanks to the excellent data availability, I was able to replicate the model and attempt to validate it.

Based on my analysis, this study does not provide good evidence for such a causal link, and there is reason to believe that the observed correlation is spurious.

Model Details:

The paper uses a fairly simple model, fitted with MCMC sampling, to estimate the effectiveness of various interventions. The model's simplicity lets the authors combine several disparate data sources into a single estimate that accounts for multiple possible effects.

To estimate transmissibility, the model considers the following factors:

  • Various NPIs such as school closing or restrictions on gathering at a regional level.

  • Regional Mobility (based on Google Mobility Reports)

  • Self Reported Mask Wearing (based on Facebook surveys)

  • A custom regional factor.

  • Random variation in transmissibility over time.

It then computes the most likely distribution of weights for these factors, based on how well the resulting transmissibility matches the observed cases and deaths.
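As a rough illustration of how these factors might combine, the sketch below scales a regional base transmissibility by one term per factor. The log-linear form, the function name `transmissibility`, and every parameter name are assumptions made for illustration; they are not taken from the paper's code.

```python
import math

def transmissibility(base_r, npi_effects, active_npis, mobility_effect,
                     mobility_change, mask_effect, mask_wearing, noise=0.0):
    """Combine the model's factors into one transmissibility value.

    Assumed log-linear form: each factor adds to log(R), so it scales R
    multiplicatively. All names here are hypothetical.
    """
    log_r = math.log(base_r)                    # custom regional factor
    for npi in active_npis:                     # NPIs currently in force
        log_r -= npi_effects[npi]
    log_r += mobility_effect * mobility_change  # change relative to baseline
    log_r -= mask_effect * mask_wearing         # mask_wearing in [0, 1]
    log_r += noise                              # random variation over time
    return math.exp(log_r)

# A mask_effect of -ln(0.75) corresponds to a 25% reduction at 100% wearing.
r_full_masks = transmissibility(
    base_r=2.0, npi_effects={}, active_npis=[], mobility_effect=0.0,
    mobility_change=0.0, mask_effect=-math.log(0.75), mask_wearing=1.0)
```

Under this assumed form, fitting amounts to searching for the effect sizes whose implied transmissibility best reproduces the observed cases and deaths.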

Observational Window:

One common criticism of this paper has been that the window was chosen specifically to make masks look better. The argument is that if the analysis were extended to include the winter spike, masks would come out looking worse.

This is the critique I originally set out to examine, but it does not appear to hold. When I extended the analysis window to December, masks still appeared to be effective. In fact, the estimated effectiveness increased to 40% over that interval. The NPI data in the model is of lower quality over the full interval, but the observed effect seems robust to a wider time range.

Regional Effects:

If mask wearing causes a drop in transmissibility, then regions with higher levels of mask wearing should observe lower growth rates. This figure from the paper makes the implication clear. The model, however, does not actually make that prediction. Instead, holding NPIs and mobility constant, regions with higher mask wearing are predicted to have higher growth.

This occurs because the custom regional factor learned by the model correlates positively with mask wearing.

Even after applying the expected 25% reduction in transmissibility for complete mask wearing, the regions with more mask wearing still have slightly higher transmissibility.
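The arithmetic behind this comparison can be sketched as follows. The two regions and their numbers are hypothetical, chosen only to show the pattern, and scaling the 25% effect exponentially with the wearing fraction is an assumption.

```python
def masked_transmissibility(regional_base, mask_wearing, full_effect=0.25):
    """Scale a region's base transmissibility by its mask-wearing level.

    Assumes the effect compounds with wearing, so 100% wearing gives
    exactly a `full_effect` reduction.
    """
    return regional_base * (1.0 - full_effect) ** mask_wearing

# Hypothetical regions: the high-mask region also has a higher learned base.
high_mask = masked_transmissibility(regional_base=1.40, mask_wearing=0.9)
low_mask = masked_transmissibility(regional_base=1.10, mask_wearing=0.3)
```

With these made-up inputs the high-mask region stays above the low-mask one, because the mask term does not fully offset its larger base value.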

Relative vs Absolute Mask Wearing:

This analysis of the data leads to an apparent contradiction. Within a given region, increased mask usage is correlated with lower growth rates (the 25% claimed effectiveness), but across regions masks seem to be ineffective. Depending on the causal story you want to tell, either of these claims could be true.

It is possible that people wear masks most in places where COVID is most transmissible. This would explain why masks don’t appear effective when comparing across regions.

However, it is also possible that the same factors cause both mask wearing to increase and transmissibility to decrease. For instance, if people wear masks in response to an observed spike in cases, then the population immunity caused by the spike will make masks appear to be effective even if they are not.
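The within-region versus across-region split can be reproduced with a toy data set. The numbers below are invented purely to show that both patterns can hold at once; they are not drawn from the paper's data.

```python
from statistics import mean

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical time series per region: mask-wearing fraction and growth rate.
regions = {
    "low_mask":  {"mask": [0.10, 0.20, 0.30], "growth": [1.10, 1.05, 1.00]},
    "high_mask": {"mask": [0.60, 0.70, 0.80], "growth": [1.30, 1.25, 1.20]},
}

# Within each region: growth falls as mask wearing rises (negative slope).
within = {name: slope(d["mask"], d["growth"]) for name, d in regions.items()}

# Across regions: compare regional averages (positive slope).
across = slope([mean(d["mask"]) for d in regions.values()],
               [mean(d["growth"]) for d in regions.values()])
```

Here every within-region slope is negative while the across-region slope is positive, which is the same shape as the contradiction described above. The toy data cannot say which causal story is right.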

Model Variations:

To determine whether the effect is likely to be real, I tried two variations on the model.

Uniform Regional Transmissibility:

The first experiment forced all regions to share the same base transmissibility. Under this constraint, the model estimated mask effectiveness at −10% (less effective than nothing). This validates the basic concern, but it does not address the possible confounder that high transmissibility in a region prompts more mask wearing (which in turn causes transmissibility to decrease).

No Mask Variation:

The next experiment forced each region to use a constant value for mask wearing (its average over the time period). Although this adds noise to the estimate, it should distinguish between the two effects. In this experiment masks appeared to reduce transmission, but the point estimate was only ~8% and was not significantly different from 0.
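The preprocessing step for this variation might look like the sketch below. The function name `freeze_mask_wearing` and the dictionary layout are hypothetical; the original model code may organize its data differently.

```python
from statistics import mean

def freeze_mask_wearing(mask_by_region):
    """Replace each region's mask-wearing series with its own window average.

    Removes all within-region variation over time, so any mask effect the
    model still finds must come from differences between regions.
    """
    return {
        region: [mean(series)] * len(series)
        for region, series in mask_by_region.items()
    }

# Hypothetical input: one mask-wearing fraction per time step.
frozen = freeze_mask_wearing({
    "region_a": [0.25, 0.50, 0.75],
    "region_b": [0.10, 0.10, 0.40],
})
```

The frozen series would then be passed to the model in place of the original survey data, with everything else left unchanged.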

Data Extrapolation:

Another check is to look at the difference between April and May. During this period mask usage increased (by around 8 percentage points) and the growth rate declined. But again, there was little to no correlation between how much mask usage increased and how much the growth rate declined.
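A check of this kind can be sketched as a correlation between per-region changes. The helper names and the dictionary inputs keyed by region are assumptions for illustration, and no real data is included.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def change_correlation(mask_april, mask_may, growth_april, growth_may):
    """Correlate each region's change in mask wearing with its change in growth.

    All four arguments are dicts keyed by region name (hypothetical layout).
    """
    regions = sorted(mask_april)
    d_mask = [mask_may[r] - mask_april[r] for r in regions]
    d_growth = [growth_may[r] - growth_april[r] for r in regions]
    return pearson(d_mask, d_growth)
```

If masks were driving the decline, regions with larger increases in mask wearing should show larger drops in growth, giving a clearly negative value. A value near zero matches the pattern described above.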


The failure of large absolute differences in mask wearing across regions to meaningfully affect the observed growth rate should make us skeptical of large claimed effects within a particular region. Clever statistical methods can make observational studies more powerful, but they are not sufficient to prove a causal story. A randomized trial underway in Bangladesh found that the researchers could increase mask wearing by 30 percentage points. This randomization allows us to infer causality with much more confidence, and should provide a more definitive answer.