Errata: My original calculation underestimated the risk by about 2x. I neglected two key considerations, which fortunately somewhat canceled each other out. My new calculated estimate is 3.0 to 11.7 quality-adjusted days lost to long-term sequelae, with my all-things-considered mean at 45.
The two key things I missed:
- I estimated the risk of a non-hospitalized case is about 10x less than a hospitalized case, and so divided the estimates of disease burden by 10. The first part is correct, but the second part would only make sense if all disease burden were due to hospitalized cases. In fact, there's roughly a 15%:85% split between hospitalized and non-hospitalized patients in the study (13,654:73,435). So if the disease burden for a non-hospitalized patient is x, the total burden per patient is 0.15*10x + 0.85*x = 2.35x. So we should divide by 2.35, not 10.
- However, as Owain pointed out below, the [demographics](https://www.nature.com/articles/s41586-021-03553-9/tables/1) are non-representative and probably skew high-risk, given the median age is 60. Indeed, this is suggested by the 15% hospitalization figure (which also, I suspect, means they simply never included asymptomatic and most mildly symptomatic cases). An ONS survey (Figure 4) put symptoms reported after 5 weeks at 25% (20-30%) for 50-69 year olds and 17.5% (12.5-22.5%) for 17-24 year olds, which is surprisingly little difference: about a 1.5x decrease. I'd conjecture a 2x decrease in risk (noting that assuming no hospitalization is already doing a lot of work here).
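The two corrections can be sketched as a few lines of arithmetic, using only the numbers quoted above (the 2x demographic correction is the conjecture from the second bullet):

```python
# Errata arithmetic: replace the original 10x divisor with 2.35,
# then apply the conjectured 2x demographic correction.
hosp_frac, non_hosp_frac = 0.15, 0.85  # split in Al-Aly et al. (13,654:73,435)
# If a non-hospitalized case carries burden x, a hospitalized one carries 10x:
burden_multiplier = hosp_frac * 10 + non_hosp_frac * 1
print(round(burden_multiplier, 2))  # → 2.35

demographic_correction = 2  # conjectured 2x lower risk for low-risk individuals
net_change = 10 / (burden_multiplier * demographic_correction)
old_low, old_high = 1.4, 5.5  # original estimate, quality-adjusted days
print(round(old_low * net_change, 1), round(old_high * net_change, 1))
# → 3.0 11.7, matching the revised estimate
```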
Original post:
I did my own back-of-the-envelope calculation and came up with a similar but slightly higher estimated cost of 1.4 to 5.5 quality-adjusted days lost to long-term sequelae conditional on getting a symptomatic COVID case. FWIW, I originally thought the OP's numbers seemed way too low, and was going to write a take-down post—but unfortunately the data did not cooperate with this agenda. I certainly don't fully trust these numbers: they're based on a single study, and there were a bunch of places where I didn't keep track of uncertainty, so the true credible interval should definitely be a lot wider. Given that and the right-tailed nature of the distribution, my all-things-considered mean is closer to 30, but I figured I'd share the BOTEC anyway in case it's helpful to anyone.
My model is pretty simple:
1. What fraction of patients report symptoms at some short-term follow-up period (e.g. 4 to 12 weeks)? This we actually have data on.
2. How bad are these symptoms? This is fairly subjective.
3. How much do we expect these symptoms to decay long-term? This is going off priors.
For 1, I used Al-Aly et al. (2021) as a starting point, which compared medical records between a COVID-positive group and a demographically matched non-COVID control group in the US Department of Veterans Affairs database. Anna Ore felt this was one of the more rigorous studies, and I agree. Medical notes seem more reliable than self-report (though far from infallible), they seem to have actually done a Bonferroni correction, and they tested that their methodology didn't pick up false positives via both negative-outcome and negative-exposure controls. Caveat: many other studies have scarier headline figures, and it's certainly possible relying on medical records skews this low (e.g. doctors might be reluctant to give a diagnosis, many patients won't go to the doctor for mild symptoms, etc).
They report outcomes that occurred between 30 and 180 days after COVID exposure, although infuriatingly they don't seem to break it down any further by date. Figure 2 shows all statistically significant symptoms, in terms of the excess burden (i.e. increase above control) of the reported symptom per 1000 patients. There were 38 in total, ranging from 2.8% (respiratory signs and symptoms) to 0.15% (pleurisy). In total the excess burden was 26%.
I went through and rated each symptom with a very rough and subjective high / medium / low severity: 2% excess burden of high-severity symptoms, 19% medium, and 5% low. I then ballparked that high-severity symptoms (e.g. heart disease, diabetes, heart failure) wipe out 30% of your QALYs, medium-severity (e.g. respiratory signs, anxiety disorders, asthma) 5%, and low-severity (e.g. skin rash) 1%. Caveat: there's a lot of uncertainty in these numbers, although I suspect I've gone for higher costs than most people would, since I tend to think health has a pretty big impact on productivity.
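The severity weighting is just a dot product of the excess-burden shares and the subjective QALY weights above:

```python
# Severity-weighted QALY reduction, using the (subjective) buckets from the post.
buckets = {
    "high":   (0.02, 0.30),  # e.g. heart disease, diabetes, heart failure
    "medium": (0.19, 0.05),  # e.g. respiratory signs, anxiety disorders, asthma
    "low":    (0.05, 0.01),  # e.g. skin rash
}
qaly_reduction = sum(share * weight for share, weight in buckets.values())
print(f"{qaly_reduction:.1%}")  # → 1.6%
```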
Using my weightings, we get a 1.6% reduction in QALYs conditional on a symptomatic COVID case. I think this is misleading for three reasons:
1. Figure 3 shows that excess burden is much higher for people who were hospitalized, and if anything the gap seems bigger for more severe symptoms (e.g. about 10x less heart failure among people who tested positive but were not hospitalized, whereas rates of skin rash were only 2x lower). This is good news, as vaccines seem significantly more effective at preventing hospitalization, and if you are fortunate enough to be a young healthy person your chance of being hospitalized was pretty low to begin with. I'm applying a 10x reduction for this.
2. This excess burden is per diagnosis, not per patient. Sick people tend to receive multiple diagnoses. I'm not sure how to handle this. In some cases, badness-of-symptoms does seem roughly additive: if I had a headache, I'd probably pay a similar amount not to also develop a skin rash as I would if my head didn't hurt. But it seems odd to say that someone who drops dead from cardiac arrest was more fortunate than another patient with the same cause of death who also had the misfortune of being diagnosed with heart failure a week earlier. So there's definitely some double-counting across diagnoses, which I think justifies a 2-5x decrease.
3. This study presumably covered predominantly the original COVID strain (the cohort ran from March 2020 to 30 November 2020). Delta seems, per the OP, about 2-3x worse, so let's increase the estimate by that factor.
Overall we divide 1.6% by a factor of 6.7 (10*2/3) to 25 (10*5/2), to get a short-term QALY reduction of 0.064% to 0.24%.
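Combining the three corrections is a one-liner in each direction (the best case divides by the largest net factor, the worst case by the smallest):

```python
# Apply the three adjustments to the 1.6% short-term QALY reduction:
# 10x down (hospitalization), 2-5x down (double counting), 2-3x up (Delta).
short_term = 0.016
best_case  = 10 * 5 / 2   # = 25: largest reductions, smallest Delta penalty
worst_case = 10 * 2 / 3   # ≈ 6.7: smallest reductions, largest Delta penalty
print(f"{short_term / best_case:.3%} to {short_term / worst_case:.2%}")
# → 0.064% to 0.24%
```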
However, Al-Aly et al. include any symptom reported between 30 and 180 days. What we really care about is the chance of lifelong symptoms: if someone is still experiencing a symptom after 6 months, there seems to be a considerable chance it'll be lifelong, but if only 30 days have elapsed, the chance of recovery seems much higher. A meta-review by Thompson et al. (2021) seems to show a drop of around 2x between symptoms in the 4-12 week period vs. 12+ weeks (Table 2), although with some fairly wild variation between studies, so I don't trust this much. In an extremely dubious extrapolation, we could say that perhaps symptoms halve again from 12 weeks to 6 months, halve again from 6 months to a year, and after that persist as a permanent injury. In this case, we'd divide the "symptom after 30 days" figure from Al-Aly et al. by a factor of 8 to get the permanent-injury figure, which seems plausible to me (but again, you could totally argue for a much lower number).
With this final fudge, we get a lifelong QALY reduction of 0.008% to 0.03%. Assuming a 50-year remaining life expectancy, this amounts to 1.4 to 5.5 days of cost from long-term sequelae. Of course, there are also short-term costs (and risk of mortality!) that are omitted from this analysis, so the total cost will be higher than this.
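The final conversion can be sketched as below; note my arithmetic gives ~1.5 days at the low end, so the post's 1.4 presumably comes from less-rounded intermediates:

```python
# Convert the short-term QALY range into lifelong days lost:
# divide by 8 (three halvings: 12+ weeks, 6 months, 1 year, then permanent),
# then spread over a 50-year remaining life expectancy.
decay = 2 ** 3
low, high = 0.00064 / decay, 0.0024 / decay  # lifelong QALY reduction
days_remaining = 50 * 365
print(round(low * days_remaining, 1), round(high * days_remaining, 1))
# → 1.5 5.5 (the post reports 1.4 to 5.5)
```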
This matches my impression. FAR could definitely use more funding. Although I’d still at the margin rather hire someone above our bar than e.g. have them earn-to-give and donate to us, the math is getting a lot closer than it used to be, to the point where those with excellent earning potential and limited fit for AI safety might well have more impact pursuing a philanthropic pathway.
I’d also highlight that there’s a serious lack of diversity in funding. As others in the thread have mentioned, the majority of people’s funding comes (directly or indirectly) from OpenPhil. While I think OpenPhil does a good job of trying to mitigate this (e.g. being careful about power dynamics, giving organizations exit grants if it decides to stop funding them, etc), it’s ultimately not a healthy dynamic, and OpenPhil appears to be quite capacity-constrained in terms of grant evaluation. So the entry of new funders would help diversify funding in addition to increasing total capacity.
One thing I don’t see people talk about as much, but which also seems like a key part of the solution: how can alignment orgs and researchers make more efficient use of existing funding? Spending that was appropriate a year or two ago when funding was plentiful may not be justified any longer, so there’s a need to explicitly put in place appropriate budgets and spending controls. There are a fair number of cost-saving measures I could see the ecosystem implementing that would have a limited (if any) hit on productivity: for example, improved cash management (investing in government money market funds earning ~5% rather than 0%-interest checking accounts); negotiating harder with vendors (it’s often possible to get substantial discounts on things like cloud compute or commercial real estate); and cutting back on some fringe benefits (e.g. higher-density open-plan space rather than private offices). I’m not trying to point fingers here: I’ve made missteps here as well—for example, FAR’s cash management currently has significant room for improvement. We’re in the process of fixing this and plan to share a write-up of what we found with other orgs in the next month.