tl;dr: Someone wrote buggy R code and rushed a preprint out the door without proofreading or sanity checking the numbers.
The main claim of the paper is this:
The total number of estimated laboratory–confirmed cases (i.e. cumulative cases) is 18913 (95% CrI: 16444–19705) while the actual numbers of reported laboratory–confirmed cases during our study period is 19559 as of February 11th, 2020. Moreover, we inferred the total number of COVID-19 infections (Figure S1). Our results indicate that the total number of infections (i.e. cumulative infections) is 1905526 (95%CrI: 1350283– 2655936)
So, they conclude that less than 1% of cases were detected, and claim 95% confidence that no more than 1.5% of cases were detected. They combine this with the (unstated) assumption that 100% of deaths were detected and reported, which would imply an IFR two orders of magnitude lower than is commonly believed. This is an extraordinary claim, which the paper doesn’t even really acknowledge; they just throw numbers out and never mention that their numbers are wildly different from everyone else’s. Their input data is
the daily series of laboratory–confirmed COVID-19 cases and deaths in Wuhan City and epidemiological data of Japanese evacuees from Wuhan City on board government–chartered flights
This is not a dataset which is capable of supporting such a conclusion. On top of that, the paper has other major signals of low quality. The paper is riddled with typos. And there’s this bit:
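As a sanity check, the detection-rate figures above follow directly from the numbers quoted from the paper; a minimal sketch (using only those quoted numbers, nothing else from the paper):

```python
# Detection-rate arithmetic implied by the paper's own quoted figures.
confirmed_point = 18913     # estimated cumulative lab-confirmed cases (point estimate)
confirmed_hi    = 19705     # upper end of the 95% CrI for confirmed cases
infected_point  = 1905526   # estimated cumulative infections (point estimate)
infected_lo     = 1350283   # lower end of the 95% CrI for infections

# Point estimate of the detection rate: just under 1%.
print(f"{confirmed_point / infected_point:.2%}")  # ~0.99%

# Most favorable corner of the two intervals: still under 1.5%.
print(f"{confirmed_hi / infected_lo:.2%}")        # ~1.46%
```

This is where the "less than 1%" and "no more than 1.5%" readings come from.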
Serial interval estimates of COVID-19 were derived from previous studies of nCov, indicating that it follows a gamma distribution with the mean and SD at 7.5 and 3.4 days, respectively, based on [14]
In this post I collected estimates of COVID-19's serial interval. 7.5 days was the chronologically first published estimate, the highest estimate, and an outlier based on a small sample size. Strangely, reference [14] does not point to the paper which estimated 7.5 days; that’s reference 21, whereas reference 14 points to this paper which makes no mention of the serial interval at all.
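For reference, a gamma distribution with mean 7.5 days and SD 3.4 days pins down the shape and scale parameters by moment matching; the paper doesn't state its parameterization, so this is a sketch under the standard shape/scale convention:

```python
# Moment-matching a gamma distribution to the reported serial-interval
# summary statistics (mean 7.5 days, SD 3.4 days, as quoted above).
mean, sd = 7.5, 3.4

shape = (mean / sd) ** 2   # k = mean^2 / variance
scale = sd ** 2 / mean     # theta = variance / mean

print(shape, scale)        # ~4.87, ~1.54

# Sanity check: recover the original moments.
assert abs(shape * scale - mean) < 1e-9          # gamma mean = k * theta
assert abs(shape ** 0.5 * scale - sd) < 1e-9     # gamma SD = sqrt(k) * theta
```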
I was particularly bemused by cumulative infections being quoted to 7 significant figures when the 95% credible interval spanned a factor of 2. This did not fill me with confidence...