The Dutch festival was actually a 2-day event with a capacity of 10,000 people per day. But it is reasonable to assume that some people attended both the first and the second day, so the total number of participants is lower than 20,000, and correspondingly the infection rate is unknown but somewhere between 5% and 10%.
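A quick back-of-the-envelope check of that range. The figure of roughly 1,000 infections is an assumption chosen to match the stated 5-10% window, not a number from the original report:

```python
# Bounds on the infection rate, given 2 days x 10,000 capacity and an
# assumed ~1,000 infections (hypothetical figure implied by the 5-10% range).
infections = 1_000
capacity_per_day = 10_000

upper = infections / capacity_per_day        # everyone attended both days
lower = infections / (2 * capacity_per_day)  # nobody attended both days
print(lower, upper)  # -> 0.05 0.1
```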
Just wanted to confirm you have accurately described my thoughts, and I feel I have a better understanding of your position as well now.
I agree with your reading of my points 1, 2, 4 and 5, but think we are not seeing eye to eye on points 3 and 6. It also saddens me that you condensed the paragraph on how I would like to view the how-much-should-we-trust-science landscape to its least important sentence (point 4), at least from my point of view.
As for point 3, I do not want to make a general point about the reliability of science at all. I want to discuss what tools we have to evaluate the accuracy of any particular paper or claim, so that we can have more appropriate confidence across the board. I think this is the most important discussion regardless of whether it increases or decreases general confidence. In my opinion, attempting to give a 0th-order summary by discussing the average change in confidence from this approach is doing more harm than good. The sentence “You just want to make the general point that you can’t trust everything you read, with the background understanding that sometimes this is more important, and sometimes less.” is exactly backwards from what I am trying to say.
For point 6, I think it might be very relevant to point out that I’m European, and anti-vax and global-warming denialism really are not that popular where I live. Thankfully, they are treated more as stereotypes of untrustworthiness than as properly held beliefs. But ignoring that, I think that most of the people influencing social policy and making important decisions are leaning heavily on science, and unfortunately particularly on the types of science I have the lowest confidence in. I was hoping to avoid going into great detail on this, but as a short summary I think it is reasonable to be less concerned with the accuracy of papers that have low (societal) impact and more concerned with papers that have high impact. If you randomly sample a published paper on Google Scholar or wherever, I’ll happily agree that you are likely to find an accurate piece of research. But this is not an accurate representation of how people encounter scientific studies in reality. I see people break the fourth virtue all the way from coffeehouse discussions to national policy debates, which is so effective precisely because the link between data and conclusion is murky. As a result, a lot of policy proposals can be backed by some number of references. Over the past few years my attempts to be more even-handed have led me to strongly decrease my confidence in a large number of scientific studies, if only to account for the selection effect that these, and not others, were brought to my attention.
Also I think psychology and nutrition are doing a lot better than they were a decade or two ago, which I consider a great sign. But that’s more of an aside than a real point.
I’ve upvoted you for the clear presentation. Most of the points you state are beliefs I held several years ago, and they sounded perfectly reasonable to me at the time. However, over time the track record of this view worsened and worsened, to the point where I now disagree not so much on the object level as with the assumption that this view is valuable to have. I hope you’ll bear with me as I take a shot at explaining this.
I think the first, major point of disagreement is the claim that the target audience of a paper like this is the “level 1” readers. To me it seems like the target audience consists of scientists and science fans, most of whom already have a lot of faith in the accuracy of the scientific process. It is completely true that showing this piece to someone who has managed to work their way into an unreasonable belief can make it harder for them to escape that particular trap, but unfortunately that doesn’t make it wrong. That’s the valley of bad rationality and all that. In fact, I think that strongly supports my main original claim—there are so many ways of using sophisticated arguments to get to a wrong conclusion, and only one way to accurately tally up the evidence, that it takes skill and dedication to get to the right answer consistently.
I’m sorry to hear about your friend, and by all means try to keep them away from posts like this. If I understand correctly, you are roughly saying “Science is difficult and not always accurate, but posts like this overshoot on the skepticism. There is some value in trusting published peer-reviewed science over the alternatives, and this view is heavily underrepresented in this community. We need to acknowledge this to dodge the most critical of errors, and only then look for more nuanced views on when to place exactly how much faith in the statements researchers make.” I hope I’m not misrepresenting your view here; this is a statement I used to believe sincerely. And I still think that science has great value, and published research is the most accurate source of information out there. But I no longer believe that this “level 2 view”, extrapolating (always dangerous :P) from your naming scheme, is a productive viewpoint. I think the nuance that I would like to introduce is absolutely essential, and that conflating different fields of research or even research questions within a field under this umbrella does more harm than good. In other words, I would like to discuss the accuracy of modern science with the understanding that this may apply to a smaller or larger degree to any particular paper, exactly proportional to the hypothetical universe-separating ability of the data I introduced earlier. I’m not sure if I should spell that out in great detail every couple of sentences to communicate that I am not blanket arguing against science, but rather comparing science-as-practiced with truthfinding-in-theory and looking for similarities and differences on a paper-by-paper basis.
Most critically, I think the image of ‘overshooting’ or ‘undershooting’ trust in papers in particular or science in general is damaging to the discussion. Evaluating the accuracy of inferences is a multi-faceted problem. In some sense, I feel like you are pointing out that if we are walking in a how-much-should-I-trust-science landscape, to a lot of people the message “it’s really not all it’s cracked up to be” would be moving further away from the ideal point. And I agree. But simultaneously, I do not know of a way to get close (not “help the average person get a bit closer”, but get really close) to the ideal point without diving into this nuance. I would really like to discuss in detail what methods we have for evaluating the hard work of scientists to the best of our ability. And if some of that, taken out of context, forms an argument in the arsenal of people determined to metaphorically shoot their own foot off that is a tragedy but I would still like to have the discussion.
As an example, in your quote block I love the first paragraph but think the other 4 are somewhere between irrelevant and misleading. Yes, this discussion will not be a panacea to the replication crisis, and yes, without prior experience comparing crackpots to good sources you may well go astray on many issues. Despite all that, I would still really like to discuss how to evaluate modern science. And personally I believe that we are collectively giving it more credit than it deserves, which is spread in complicated ways between individual claims, research topics and entire fields of science.
That is very interesting, mostly because I do exactly think that people are putting too much faith in textbook science. I’m also a little bit uncomfortable with the suggested classification.
I have high confidence in claims that I think are at low risk of being falsified soon, not because it is settled science but because this sentence is a tautology. The causality runs the other way: if our confidence in the claim is high, we provisionally accept it as knowledge.
By contrast, I am worried about the social process of claims moving from unsettled to settled science. In my personal opinion there is an abundance of overconfidence in what we would call “settled science”. The majority of the claims therein are likely to be correct and hold up under scrutiny, but the bar is still lower than I would prefer.
But maybe I’m way off the mark here, or maybe we are splitting hairs and describing the same situation from a different angle. There is lots of good science out there, and you need overwhelming evidence to justify questioning a standard textbook. But there is also plenty of junk that makes it all the way into lecture halls, never mind all the previous hoops it had to pass through to get there. I am very worried about the statistical power of our scientific institutes in separating truth from fiction, and I don’t think the settled/unsettled distinction helps address this.
It seems to me that we should be really careful before extrapolating from the specific datasets, methods, and subfields these researchers are investigating into others. In particular, I’d like to see some care put into forecasting and selecting research topics that are likely or unlikely to stand up to a multiteam analysis.
I think this is good advice, but only when taken literally. In my opinion there is more than sufficient evidence to suggest that the choices made by researchers (pick any of the descriptions you cited) have a significant impact on the conclusions of papers across a wide variety of fields. Indeed, I think this should be the default assumption until proven otherwise. I’d motivate this primarily by the argument that there are many different ways to draw a wrong conclusion (especially under uncertainty), but only one right way to weigh up all the evidence. Put differently, I think undue influence of arbitrary decisions is the default, and it is only through hard work and collective scientific standards that we stand a chance of avoiding this.
I’ve seen calls to improve all the things that are broken right now: <list>
I think this is a flaw in and of itself. There are many, many ways to go wrong, and the entire standard list (p-hacking, selective reporting, multiple stopping criteria, you name it) should be interpreted more as symptoms than as causes of a scientific crisis.
The crux of the whole scientific approach is that you empirically separate hypothetical universes. You do this by making your universe-hypotheses spit out predictions, and then verify them. It seems to me that by and large this process is ignored or even completely absent when we start asking difficult soft science questions. And to clarify: I don’t particularly blame any researcher, or institute, or publishing agency or peer doing some reviewing. I think that the task at hand is so inhumanly difficult that collectively we are not up to it, and instead we create some semblance of science and call it a day.
From a distanced perspective, I would like my entire scientific process to look like reverse-engineering a big black box labeled ‘universe’. It has input buttons and output channels. Our paradigms postulate correlations between input settings and outputs, and an individual hypothesis then makes a claim about the input settings. We track forward what outputs would be caused by any possible input setting, observe reality, and update with Bayesian odds ratios.
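The black-box picture above can be sketched in a few lines of Python; the likelihood numbers are invented purely for illustration:

```python
# Toy Bayesian update over two hypothetical "universes": each hypothesis
# assigns a likelihood to the observed output, and we multiply the odds
# by the likelihood ratio after every observation.
def update_odds(prior_odds, likelihood_h1, likelihood_h2):
    """Posterior odds of H1 vs H2 after one observation."""
    return prior_odds * (likelihood_h1 / likelihood_h2)

# H1: the input setting matters (predicts the observed output with prob 0.8).
# H2: it doesn't (predicts it with prob 0.5). Start at even odds.
odds = 1.0
for _ in range(3):  # three independent observations consistent with H1
    odds = update_odds(odds, 0.8, 0.5)
print(round(odds, 3))  # -> 4.096
```

Note how slowly the odds move when the hypotheses make similar predictions: that gap between 0.8 and 0.5 is exactly the "universe-separating ability" of the data.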
The problem is frequently that the data we are relying on is influenced by an absolutely gargantuan number of factors—take, as an example from the OP, the teenage pregnancy rate. I have no trouble believing that statewide schooling laws have some impact on this, but possibly so do, for example, above-average summer weather, people’s religious background, the ratio of boys to girls in a community, economic (in)stability, recent natural disasters and many more factors. So having observed the teenage pregnancy rates, inferring the impact of the statewide schooling laws is a nigh impossible task. Even just trying to put this into words, my mind immediately translated it to “what fraction of the state-by-state variance in teenage pregnancy rates can be attributed to this factor, and what fraction to other factors”, but even this is already an oversimplification—why are we comparing states at a fixed time, instead of tracking states over time, or even taking each state-time snapshot as an individual dataset? And why is a linear correlation model accurate, and who says we can split the multi-factor model into additive components (as implied by the fractions)?
The point I am failing to make is that in this case it is not at all clear what difference in the pregnancy rates we would observe if the statewide schooling laws had a decidedly negative, small negative, small positive or decidedly positive impact, as opposed to one or several of the other factors dominating the observed effects. And without that causal connection we can never infer the impact of these laws from the observed data. This is not a matter of p-hacking or biased science or anything of the sort—the approach doesn’t have the (information theoretic) power to discern the answer we are looking for in the first place, i.e. to single out the true hypothesis from between the false ones.
As for your pragmatic question, how can we tell if a study is to be trusted? I’d recommend asking experts in your field first, and only listening to cynics second. If you insist on asking, my method is to evaluate whether or not it seems plausible to me that, assuming that the conclusion of the paper holds, this would show up as the announced effect observed in the paper. Simultaneously I try to think of several other explanations for the same data. If either of these tries gives some resounding result I tend to chuck the study in the bin. This approach is fraught with confirmation bias (“it seems implausible to me because my view of the world suggests you shouldn’t be able to measure an effect like this”), but I don’t have a better model of the world to consult than my model of the world.
Thank you for the wonderful links, I had no idea that (meta)research like this was being conducted. Of course it doesn’t do to draw conclusions from just one or two papers like that; we would need a bunch more to be sure that we really need a bunch more before we can accept the conclusion.
Jokes aside, I think there is a big unwarranted leap in the final part of your post. You correctly state that just because the outcome of research seems to not replicate we should not assume evil intent (subconscious or no) on the part of the authors. I agree, but also frankly I don’t care. The version of Science Nihilism you present almost seems like strawman Nihilism: “Science does not replicate therefore everything a Scientist says is just their own bias”. I think a far more interesting statement would be “The fact that multiple well-meaning scientists get diametrically opposed results using the same data and techniques, which are well-accepted in the field, shows that the current standards in Science are insufficient to draw the type of conclusions we want.”
Or, from a more information-theoretic point of view, our process of honest effort by scientists followed by peer review and publication is not a sufficiently sharp tool to assign numbers to the questions we’re asking, and a large part of the variance in the published results is indicative of numerous small choices by the researchers instead of indicative of patterns in the data. Whether or not scientists are evil shills with social agendas (hint: they’re mostly not) is somewhat irrelevant if the methods used won’t separate truth from fiction. To me that’s proper Science Nihilism, none of this ‘intent’ or ‘bias’ stuff.
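As an illustration of how much room researcher choices can leave, here is a toy simulation (all numbers invented): the true effect of the studied factor is zero, yet three equally defensible analysis choices produce three different estimates.

```python
import random

random.seed(0)

# Simulate a dataset where the outcome depends on two confounders plus noise,
# and NOT on the factor under study (its true effect is exactly zero).
n = 60
rows = []
for _ in range(n):
    factor = random.random()       # the variable whose effect we "study"
    confound_a = random.random()
    confound_b = random.random()
    outcome = 1.5 * confound_a - confound_b + random.gauss(0, 1)
    rows.append((factor, confound_a, confound_b, outcome))

def corr(xs, ys):
    """Plain Pearson correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Three arbitrary-but-defensible analysis choices: use all data, trim the
# top outcome quartile as "outliers", or restrict to a high-confound subgroup.
full = rows
trimmed = sorted(rows, key=lambda r: r[3])[: 3 * n // 4]
subgroup = [r for r in rows if r[1] > 0.5]

for name, sample in [("full", full), ("trimmed", trimmed), ("subgroup", subgroup)]:
    print(name, round(corr([r[0] for r in sample], [r[3] for r in sample]), 3))
```

The point is not the particular numbers but that the spread between the three estimates comes entirely from the analyst's choices, since the data contain no real effect to find.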
In a similar vein I wonder if the page count of the robustness check is really an indication of a solution to this problem. The alternative seems bleak (well, you did call it Nihilism) but maybe we should allow for the possibility that the entire scientific process as commonly practiced is insufficiently powerful to answer these research questions (for example, maybe the questions are ill-posed). To put it differently, to answer a research question we need to separate the hypothetical universes where it has one answer from the hypothetical universes where it has a different answer, and then observe data to decide which universe we happen to be in. In many papers this link between the separation and the data to be observed is so tenuous that I would be surprised if the outcome was determined by anything but the arbitrary choices of the researchers.
What do you mean ‘problem’? Everybody involved wants the inspection to go well, the correlation between the outcome of the inspection and the quality of the school/firm’s books is incidental at best.
This is a very good point, and in my eyes explains the observations pretty much completely. Thanks!
(yet it was contained in the UK, which is great and suggests I’m talking BS)
I continue to be extremely surprised by the UK decline in numbers. The Netherlands is reporting a current estimated R of 1.1-1.2 for the English strain and 0.8-0.9 for the wild types. They furthermore estimate that just over half of all newly reported cases are English strain by now. But the UK daily cases have dropped by 80% in 40 days, which at a reproduction time of 6 days would mean R = 0.79 throughout.
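The arithmetic behind that 0.79, as a one-function sketch (the 6-day generation time is the figure assumed in the text):

```python
# Implied reproduction number R from an observed change in daily cases,
# assuming simple exponential growth/decay with a fixed generation time.
def implied_R(case_ratio, days, generation_time=6.0):
    """R such that cases multiply by `case_ratio` over `days` days."""
    return case_ratio ** (generation_time / days)

# UK example from the text: cases dropped by 80% (ratio 0.2) in 40 days.
print(round(implied_R(0.2, 40), 2))  # -> 0.79
```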
In the past I suggested a few potential, not mutually exclusive, explanations:
1. The UK has implemented significantly more effective measures, and if we just copy them we can totally beat the English strain.
2. The height of the UK peak in the second week of January was caused by Christmas and New Year’s holiday craze, which caused significant delayed reporting (‘better take that test after I visit all my friends and family, otherwise I won’t be allowed to join them’) and massively overestimated both the peak and the decay.
3. The Dutch models are crap.
4. The UK numbers are crap.
5. The English strain has spread throughout the London area so rapidly that it hit local group immunity, and the plummet afterwards is caused by a lack of geographical spread. Once this picks up again the UK will see a stark rise in cases.
I previously put my money on hypothesis number 5, but as time goes on it steadily loses credibility. If anybody has a suggestion for what’s going on in the UK right now I’m all ears, I am currently not taking their drop in cases at face value.
The loss of life and health of innocent people who got suckered into a political issue without considering the ramifications?
I mean, the group of people who hold out on getting a vaccine as long as possible will definitely be harder to convince than the average citizen. But with these numbers (death rate, long-term health conditions, effectiveness of vaccines), are you seriously suggesting that trying to help them is not cost-effective? From the post I gather you’re talking about tens of millions of people in the USA alone, if not 100M+.
I personally have a very tough time fitting your interpretation into my model of the world. To me the popularity and actions of Facebook et al. are mostly disconnected from our ability to communicate with family and close friends.
In my opinion the timeline seems to be a little more as follows:
People are on Facebook and Twitter and other social media platforms both to stay in touch with friends and to complain about the outgroup.
COVID-19 hit, significantly reducing quality of life everywhere. People realign their political discussions and notions of outgroup along COVID-lines—are you a believer in lockdowns and masks and science or the opposite? This temporarily supersedes other political discussions, not because people have wonderfully unique and insightful opinions on COVID countermeasures but because this is the biggest event happening and as such is necessarily political.
After approximately one year of lockdowns and countermeasures people have sunk significant parts of their public profile into their thoughts regarding COVID. A large portion of the public, as well as officials, will support silencing opposition if only to retain a coherent public image (after all, if communication on COVID is not more important than free speech, what have you been doing all these months?).
Facebook rises to the occasion and offers to selflessly censor people according to criteria set by the WHO.
I’d like to couple this with a prediction that Facebook will not start censoring older messages by the WHO and other Respected Officials. I see Facebook’s cooperation more as a power grab with plausible deniability than a desire for certain messages (officially endorsed) over others (crackpot/other). It only exists through the support of the very serious people, so it is counterproductive to start challenging them on their own history.
Lastly I think that if you genuinely want to have a heart-to-heart with your friends and family it is silly to restrict yourself to communicating via Facebook. Call them, start a blog, meet somewhere outside for a walk if you want. This has the twin benefit of you not having to worry about issues being ‘controversial’ as defined by Facebook, and them not having to publicly change their thoughts over your message. Also it is much less embarrassing if it turns out you were unbelievably overconfident all along.
You are correct, but the hope is that the probabilities involved stay low enough that a linear approximation is reasonable. Using, for example, https://www.microcovid.org/, typical events like a shopping trip carry infection risks well below 1% (depending on location, duration of activity, precautions, etc.).
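To make the linear-approximation point concrete, here is a small sketch comparing the exact cumulative risk with the naive sum of per-event risks, using an illustrative 0.5% per-event probability (not a figure taken from microcovid.org):

```python
# Exact cumulative infection risk vs the linear approximation,
# for n independent events with per-event probability p.
def exact_risk(p, n):
    """Probability of at least one infection over n independent events."""
    return 1 - (1 - p) ** n

def linear_risk(p, n):
    """Linear approximation: just add up the per-event risks."""
    return p * n

p = 0.005  # 0.5% per shopping trip (illustrative)
for n in (1, 10, 50):
    print(n, round(exact_risk(p, n), 4), round(linear_risk(p, n), 4))
```

The linear estimate always overshoots slightly, but while the per-event risks stay well below 1% the two stay close; the approximation only breaks down once the summed risk becomes a sizable fraction of 1.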
I meant after the first shot, sorry for the confusion.
I think ojno has a point. Furthermore, to the best of my knowledge the protection from the vaccines takes a bit of time (10 days? 14 days?) to kick in after the vaccination. Arguably “proceed with the same caution as before” is a better message than “go nuts, dance and hug and visit all your friends” in this period, and for simplicity’s sake this has become the default message.
Who am I kidding, this is of course because we don’t want vaccination to be unfair. If you get social benefits from being vaccinated (by not having to abide by some of the restrictions) then the prioritisation discussion would be even fiercer than it is now. Plus, the more Sacrifices to the Gods you publicly support (h/t Zvi), the more of a Serious Person you are, which the CDC tries very hard to be.
MathOverflow has a discussion of this. In short:
This area definition is equivalent to the standard definition, although this was (to me) not immediately obvious.
Some statements (linearity of integrals, for example) are obvious from the one definition, while others (the Monotone Convergence Theorem) are obvious from the other definition. Unfortunately, proving that the two definitions are equivalent is pretty much the proof for these statements (assuming the other definition).
The general approach of “given a claim, test it on indicator functions, then simple functions, then all integrable positive functions, then all integrable functions, then (if desired) integrable complex functions” is called the standard machine of measure theory, so there is educational benefit to seeing it.
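The four stages can be written out schematically (a sketch of the chain of extensions, not a proof):

```latex
% The "standard machine": prove a claim in four stages, each building on the last.
\int_X \mathbf{1}_A \,d\mu = \mu(A)
\quad\Longrightarrow\quad
\int_X \sum_i a_i \mathbf{1}_{A_i} \,d\mu = \sum_i a_i \,\mu(A_i)
\quad\Longrightarrow\quad
\int_X f \,d\mu = \lim_{n\to\infty} \int_X s_n \,d\mu
  \quad (0 \le s_n \uparrow f,\ s_n \text{ simple})
\quad\Longrightarrow\quad
\int_X f \,d\mu = \int_X f^{+} \,d\mu - \int_X f^{-} \,d\mu
```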
It was pointed out to me that it is really not accurate to consider the UK daily COVID numbers as a single data-point. There could be any number of possible explanations for the decrease in the numbers. Some possible explanations include:
1. The current lockdown and measures are sufficient to bring the English variant to R<1.
2. The current measures bring the English variant to an R slightly above 1, and the wild variants to an R well below 1, and because nationally the English variant is not dominant yet (even though it is in certain regions) this gives a national R<1.
3. The English strain has spread so aggressively regionally that group immunity effects in the London area have significantly slowed the spread, while not spreading as quickly geographically.
Most notably, hypotheses 2 & 3 predict that the stagnation will soon reverse back into acceleration (with hypothesis 3 predicting a far higher rate than 2), as the English variant becomes more prevalent throughout the rest of the UK. Let’s hope the answer is door number 1?
To what extent does ‘positive PCR test’ equate to ‘infectious’? Or is there some other good indicator? I know most health authorities say something like “if you have been in contact with a person who tested positive, then from the point they are no longer symptomatic / from their first negative test, you have to be careful for X days”, so I assumed the two are (somewhat) related.
To the best of my knowledge there are four inaccurate but not-completely-moronic reasons for sticking with a 2-dose vaccination plan. Just to be clear: none of these arguments convincingly suggest that 2-dose will be a better method to combat the pandemic.
1. Many officials may be convinced that “no Proper Scientific Procedure has investigated this” is identical to “there is no knowledge”. In non-pandemic times, if you squint juuust right, this looks like a cost-benefit analysis of delaying medical research versus endorsing crackpot pharmaceutics. I find it more than plausible that many people (and certainly most bureaucracies) are not capable of adjusting this argument to a pandemic. In their defense, you have to be somewhat of an expert in the field to make the cost-benefit assessment on a case-by-case basis (even though it is obvious in this case).
2. There may be legal/reputational risks to publicly supporting 1-dose vaccines before the Medical Establishment has given them a seal of approval. This would explain why nobody blinked now that they are the norm—people were simply waiting for some agency to accept the blame if in hindsight it turned out to be a mistake.
3. 80% is noticeably lower than 95%, so you can expect about 4 times as many thrillseekers to take the vaccine, go to the local mall, lick every object they can find and come down with something terrible. It could even be COVID. This is awful for public perception of the vaccine. Or, taking a less extreme view, people might risk-compensate to the point where 2x80% is not as much better than 1x95% as naive math might suggest (although I fail to see how it could ever close the gap. People aren’t compensating that much… right?).
4. At certain points during the distribution it is conceivable that increasing the immunity of a particularly vulnerable subgroup of the population from 80% to 95% might have a higher impact (on the death toll, medical systems, you name it) than increasing the immunity of an arbitrarily selected subgroup of the remainder of the population from 0% to 80%. This chance is bigger if you instituted some messed-up prioritization on your subgroups in the first place (see: everywhere).
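The "4 times" figure in the third reason above is just the ratio of the two failure rates, assuming equal exposure in both groups:

```python
# Expected ratio of breakthrough infections at 80% vs 95% efficacy:
# (1 - 0.80) / (1 - 0.95) = 0.20 / 0.05 = 4.
def breakthrough_ratio(efficacy_low, efficacy_high):
    return (1 - efficacy_low) / (1 - efficacy_high)

print(round(breakthrough_ratio(0.80, 0.95), 2))  # -> 4.0
```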
Anyway, the case for 1-dose is overwhelming. I just wanted to point out how otherwise intelligent people might get this question so incredibly wrong, seeing as I’ve run into shades of all four of these arguments in the past.