My previous go-to for understanding why we didn’t adopt nuclear power on a massive scale is https://rootsofprogress.org/devanney-on-the-nuclear-flop (even citing some of the same sources and using the same charts). Note that the post summarizes Devanney’s book, and the post author does not necessarily agree with the conclusion of the book.
Devanney places much of the blame on regulators, in particular on the Linear No-Threshold model, ALARA legislation, and regulator incentives. Do you think this is inaccurate and/or overblown?
If your colleagues are regularly giving unrealistically optimistic estimates, and you are judged worse for giving realistic estimates, clearly your superiors don’t care for the accuracy of the estimates all that much. You’re trying to play a fair game in a situation where you will be rewarded for doing the opposite of that.
Personally I’ve gotten good mileage out of offering to lie to the people asking for estimates. When asked for estimates during a sprint or the like, and if I sufficiently trust the people involved, I would say something like “You are asking us to do X, which I think will take 2 months. My colleagues are giving estimates of 2-3 weeks, but the previous times they gave estimates like that the project took 6-10 weeks. I’m committed to the project, and if you want to hear that we can do it in 3 weeks I’m happy to tell you that, but I don’t think we will finish in less than 2 months.”
If after that you still find you are being punished for giving realistic estimates, consider not telling the truth?
A shot in the dark, but the Malthusian theory of population suggests war is beneficial to local officials and leaders when they think the younger generation is growing at a sufficiently rapid pace that they are about to be replaced (‘vent the testosterone’, so to speak). The absence of such a growth spike is a mark against this explanation.
More generously: if the birth rate is below replacement, losing young people in a war has drastic consequences for the population ~20 years from now, since it will at least for a while drop far below replacement. If the birth rate is higher the consequences of losing a fraction of your youngest people are, in the long run, less severe.
The first example seems to be an issue of legibility, not fungibility.
I think the section on Don’t Look Up, in particular the comments on the relationship between science and policy, misses the mark in very important ways. The naive model of [science discovers how the world works] --> [policymakers use this to make policy to improve the world (for themselves, or their constituents, or everybody, or whatever)] does not give enough weight to the reverse action—where the policy is fixed, and the science that supports it is promoted until the policy is Scientific(TM). I think most science-that-determines-policy is selected this way (regardless of the intentions of the scientists involved).
If I remember correctly you’ve mentioned this in a previous COVID post, where you recall that big scientific organisations are subject to all kinds of incentives and constraints, of which “tell the truth, the whole truth and nothing but the truth” is regrettably only one among many. I find this a much more productive lens through which to view policy debates compared to being endlessly frustrated that the Real Science with Actual Answers is not involved in the process.
Looks like both Paxlovid and molnupiravir are set to receive FDA approval soon.
The link on severity of Omicron infections (https://www.nrk.no/urix/tall-fra-danmark_-omikron-forer-til-like-mange-innleggelser-som-delta-1.15769977) raises an interesting question. They deduce the severity by comparing the number of hospitalisations from Omicron with the spread of the variant 5 to 6 days prior to hospitalisation, which is the correct thing to do if we assume it takes 5 days from infection to developing symptoms severe enough to be admitted to the hospital. My two questions:
1. Are other news sources doing this consistently as well? If they are comparing today’s hospitalisations with today’s case numbers, the answer is wrong by approximately two doubling times, underestimating severity.
2. Is there a reason to believe Omicron develops symptoms slower or faster than older variants, so that we need to correct these figures?
Since the doubling time of Omicron is so short this has serious knock-on effects.
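To make the lag bias concrete, here is a back-of-the-envelope sketch. All the numbers (the 5-day infection-to-admission lag, a 2.5-day doubling time, a 1% true hospitalisation rate) are illustrative assumptions of mine, not figures from the article:

```python
# Hospitalisations today reflect infections ~5 days ago. Dividing them by
# today's (much larger) case count understates severity by roughly
# lag / doubling_time doublings.
lag_days = 5           # assumed infection-to-admission delay
doubling_time = 2.5    # assumed case doubling time in days
true_hosp_rate = 0.01  # assumed true per-case hospitalisation rate

cases_at_infection = 10_000
cases_today = cases_at_infection * 2 ** (lag_days / doubling_time)
hospitalisations_today = true_hosp_rate * cases_at_infection

naive_rate = hospitalisations_today / cases_today
print(f"naive rate: {naive_rate:.4f} vs true rate: {true_hosp_rate}")
print(f"understated by a factor of {true_hosp_rate / naive_rate:.1f}")
```

With these assumptions the naive calculation understates severity by a factor of 2^(5/2.5) = 4, which is exactly the "two doubling times" error: the shorter the doubling time, the worse the bias.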
As an aside, the link does not claim anywhere that the patients were admitted for Omicron, just that they were admitted and diagnosed with Omicron. So for now I’m withholding updating on this.
The paper seems to describe the Delta variant and classify its properties compared to older strains. I’m not an expert so I might well be misunderstanding it, but the paper seems to classify and compare two wild strains, not modify them. Maybe I’m missing something, but what is the relation to Omicron?
I would think the fact that South Africa does far more sequencing than other countries in that part of the world (for example, in the reported Delta sequences by country, South Africa is listed 25th globally with 11,004 sequenced samples, while the next sub-Saharan country seems to be Nigeria, 49th with 2,075 sequenced Delta samples) is more than sufficient to explain away the surprise that they noticed it first. The fact that they also have a laboratory for virus research is hardly a coincidence.
As far as I know there is insufficient evidence to assume Omicron is lab-created, as opposed to, for example, reverse zoonosis or a long development time inside a person with a compromised immune response. But even conditional on Omicron being lab-created, what reason is there to assume it originates from a lab in South Africa? The Twitter thread does not seem to provide an answer.
Would you prefer that the FDA involve itself rather than stand on the sidelines?
This seems correct to me, but I don’t immediately see its importance or relevance. At any rate the escape is speculative at this point.
There have also been a dozen or so instances where a new variant came to dominate some country and subsequently fizzled out.
I completely failed to notice this, whoops. Do you have some more information on this?
The Dutch festival was actually a 2-day event with a total capacity of 10,000 people per day. But it is reasonable to assume that some people attended both the first and the second day, so the total number of unique participants is lower than 20,000, and correspondingly the rate of infection is unknown but somewhere between 5% and 10%.
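The bound above is simple arithmetic. The figure of roughly 1,000 positive cases is my own inference from the stated 5%-10% range and the 10,000-per-day capacity, not a number from the article:

```python
# Two extreme scenarios bracket the true infection rate:
# everyone attended both days, or nobody attended twice.
capacity_per_day, days, positives = 10_000, 2, 1_000  # positives: my inference

min_attendees = capacity_per_day          # everyone attended both days
max_attendees = capacity_per_day * days   # nobody attended twice

rate_upper = positives / min_attendees
rate_lower = positives / max_attendees
print(f"infection rate between {rate_lower:.0%} and {rate_upper:.0%}")
```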
Just wanted to confirm you have accurately described my thoughts, and I feel I have a better understanding of your position as well now.
I agree with your reading of my points 1, 2, 4, and 5, but think we are not seeing eye to eye on points 3 and 6. It also saddens me that you condensed the paragraph on how I would like to view the how-much-should-we-trust-science landscape to its least important sentence (point 4), at least from my point of view.
As for point 3, I do not want to make a general point about the reliability of science at all. I want to discuss what tools we have to evaluate the accuracy of any particular paper or claim, so that we can have more appropriate confidence across the board. I think this is the most important discussion regardless of whether it increases or decreases general confidence. In my opinion, attempting to give a 0th-order summary by discussing the average change in confidence from this approach is doing more harm than good. The sentence “You just want to make the general point that you can’t trust everything you read, with the background understanding that sometimes this is more important, and sometimes less.” is exactly backwards from what I am trying to say.
For point 6, I think it might be very relevant to point out that I’m European, and anti-vax sentiment and global-warming denialism really are not that popular where I live. They are considered stereotypes of untrustworthiness more than sincerely held beliefs, thankfully. But ignoring that, I think that most of the people influencing social policy and making important decisions are leaning heavily on science, and unfortunately particularly on the types of science I have the lowest confidence in. I was hoping to avoid going into great detail on this, but as a short summary I think it is reasonable to be less concerned with the accuracy of papers that have low (societal) impact and more concerned with papers that have high impact. If you randomly sample a published paper on Google Scholar or wherever, I’ll happily agree that you are likely to find an accurate piece of research. But this is not an accurate representation of how people encounter scientific studies in reality. I see people break the fourth virtue all the way from coffeehouse discussions to national policy debates, which is so effective precisely because the link between data and conclusion is murky. So a lot of policy proposals can be backed by some amount of references. Over the past few years my attempts to be more even-handed have led me to strongly decrease my confidence in a large number of scientific studies, if only to account for the selection effect that these, and not others, were brought to my attention.
Also I think psychology and nutrition are doing a lot better than they were a decade or two ago, which I consider a great sign. But that’s more of an aside than a real point.
I’ve upvoted you for the clear presentation. Most of the points you state are beliefs I held several years ago, and they sounded perfectly reasonable to me. However, over time the track record of this view worsened and worsened, to the point where I now disagree not so much on the object level as with the assumption that this view is valuable to have. I hope you’ll bear with me as I take a shot at explaining this.
I think the first, major point of disagreement is whether the target audience of a post like this is the “level 1” readers. To me it seems like the target audience consists of scientists and science fans, most of whom already have a lot of faith in the accuracy of the scientific process. It is completely true that showing this piece to someone who has managed to work their way into an unreasonable belief can make it harder for them to escape that particular trap, but unfortunately that doesn’t make it wrong. That’s the valley of bad rationality and all that. In fact, I think that strongly supports my main original claim—there are so many ways of using sophisticated arguments to get to a wrong conclusion, and only one way to accurately tally up the evidence, that it takes skill and dedication to get to the right answer consistently.
I’m sorry to hear about your friend, and by all means try to keep them away from posts like this. If I understand correctly, you are roughly saying “Science is difficult and not always accurate, but posts like this overshoot on the skepticism. There is some value in trusting published peer-reviewed science over the alternatives, and this view is heavily underrepresented in this community. We need to acknowledge this to dodge the most critical of errors, and only then look for more nuanced views on when to place exactly how much faith in the statements researchers make.” I hope I’m not misrepresenting your view here; this is a statement I used to believe sincerely. And I still think that science has great value, and that published research is the most accurate source of information out there. But I no longer believe that this “level 2 view”, extrapolating (always dangerous :P) from your naming scheme, is a productive viewpoint. I think the nuance that I would like to introduce is absolutely essential, and that conflating different fields of research, or even research questions within a field, under this umbrella does more harm than good. In other words, I would like to discuss the accuracy of modern science with the understanding that this may apply to a smaller or larger degree to any particular paper, exactly proportional to the hypothetical universe-separating ability of the data I introduced earlier. I’m not sure if I should spell that out in great detail every couple of sentences to communicate that I am not arguing against science wholesale, but rather comparing science-as-practiced with truthfinding-in-theory and looking for similarities and differences on a paper-by-paper basis.
Most critically, I think the image of ‘overshooting’ or ‘undershooting’ trust in papers in particular or science in general is damaging to the discussion. Evaluating the accuracy of inferences is a multi-faceted problem. In some sense, I feel like you are pointing out that if we are walking in a how-much-should-I-trust-science landscape, to a lot of people the message “it’s really not all it’s cracked up to be” would be moving further away from the ideal point. And I agree. But simultaneously, I do not know of a way to get close (not “help the average person get a bit closer”, but get really close) to the ideal point without diving into this nuance. I would really like to discuss in detail what methods we have for evaluating the hard work of scientists to the best of our ability. And if some of that, taken out of context, forms an argument in the arsenal of people determined to metaphorically shoot their own foot off that is a tragedy but I would still like to have the discussion.
As an example, in your quote block I love the first paragraph but think the other 4 are somewhere between irrelevant and misleading. Yes, this discussion will not be a panacea to the replication crisis, and yes, without prior experience comparing crackpots to good sources you may well go astray on many issues. Despite all that, I would still really like to discuss how to evaluate modern science. And personally I believe that we are collectively giving it more credit than it deserves, which is spread in complicated ways between individual claims, research topics and entire fields of science.
That is very interesting, mostly because I do in fact think that people are putting too much faith in textbook science. I’m also a little uncomfortable with the suggested classification.
I have high confidence in claims that I think are at low risk of being falsified soon, not because it is settled science but because this sentence is a tautology. The causality runs the other way: if our confidence in the claim is high, we provisionally accept it as knowledge.
By contrast, I am worried about the social process of claims moving from unsettled to settled science. In my personal opinion there is an abundance of overconfidence in what we would call “settled science”. The majority of the claims therein are likely to be correct and hold up under scrutiny, but the bar is still lower than I would prefer.
But maybe I’m way off the mark here, or maybe we are splitting hairs and describing the same situation from a different angle. There is lots of good science out there, and you need overwhelming evidence to justify questioning a standard textbook. But there is also plenty of junk that makes it all the way into lecture halls, never mind all the previous hoops it had to pass through to get there. I am very worried about the statistical power of our scientific institutes in separating truth from fiction, and I don’t think the settled/unsettled distinction helps address this.
It seems to me that we should be really careful before extrapolating from the specific datasets, methods, and subfields these researchers are investigating into others. In particular, I’d like to see some care put into forecasting and selecting research topics that are likely or unlikely to stand up to a multiteam analysis.
I think this is good advice, but only when taken literally. In my opinion there is more than sufficient evidence to suggest that the choices made by researchers (pick any of the descriptions you cited) have a significant impact on the conclusions of papers across a wide variety of fields. Indeed, I think this should be the default assumption until proven otherwise. I’d motivate this primarily by the argument that there are many different ways to draw a wrong conclusion (especially under uncertainty), but only one right way to weigh up all the evidence. Put differently, I think undue influence of arbitrary decisions is the default, and it is only through hard work and collective scientific standards that we stand a chance of avoiding this.
I’ve seen calls to improve all the things that are broken right now: <list>
I think this is a flaw in and of itself. There are many, many ways to go wrong, and the entire standard list (p-hacking, selective reporting, multiple stopping criteria, you name it) should be interpreted more as symptoms than as causes of a scientific crisis.
The crux of the whole scientific approach is that you empirically separate hypothetical universes. You do this by making your universe-hypotheses spit out predictions, and then verifying them. It seems to me that by and large this process is ignored or even completely absent when we start asking difficult soft-science questions. And to clarify: I don’t particularly blame any researcher, or institute, or publishing agency, or peer doing some reviewing. I think that the task at hand is so inhumanly difficult that collectively we are not up to it, and instead we create some semblance of science and call it a day.
From a distanced perspective, I would like my entire scientific process to look like reverse-engineering a big black box labeled ‘universe’. It has input buttons and output channels. Our paradigms postulate correlations between input settings and outputs, and an individual hypothesis then makes a claim about the input settings. We track forward what outputs would be caused by each possible input setting, observe reality, and update with Bayesian odds ratios.
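As a minimal sketch of this black-box picture (the numbers are my own toy assumptions, not anything from the post): two hypothetical universes assign different probabilities to an observed output, and each observation multiplies the odds between them by a Bayes factor.

```python
# Two hypothetical universes H1 and H2; each observation of the output
# updates the odds by the likelihood ratio P(obs|H1) / P(obs|H2).
prior_odds = 1.0        # H1 vs H2, even odds to start
p_obs_given_h1 = 0.8    # assumed likelihood of the observed output under H1
p_obs_given_h2 = 0.3    # assumed likelihood of the observed output under H2

posterior_odds = prior_odds
for _ in range(3):      # three independent observations of the same output
    posterior_odds *= p_obs_given_h1 / p_obs_given_h2

print(f"posterior odds H1:H2 = {posterior_odds:.1f}")
```

The key property is that the update is driven entirely by how differently the two universes predict the data; if both assign the observation nearly the same probability, the odds barely move no matter how much data we collect.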
The problem is frequently that the data we are relying on is influenced by an absolutely gargantuan number of factors—as an example from the OP, the teenage pregnancy rate. I have no trouble believing that statewide schooling laws have some impact on this, but possibly so do, for example, above-average summer weather, people’s religious background, the ratio of boys to girls in a community, economic (in)stability, recent natural disasters, and many more factors. So having observed the teenage pregnancy rates, inferring the impact of the statewide schooling laws is a nigh impossible task. Even just trying to put this into words, my mind immediately translated it to “what fraction of the state-by-state variance in teenage pregnancy rates can be attributed to this factor, and what fraction to other factors?” But even this is already an oversimplification: why are we comparing states at a fixed time, instead of tracking states over time, or even taking each state-time snapshot as an individual data point? And why would a linear correlation model be accurate? Who says we can split the multi-factor model into additive components (as the fractions imply)?
The point I am failing to make is that in this case it is not at all clear what difference in the pregnancy rates we would observe if the statewide schooling laws had a decidedly negative, small negative, small positive, or decidedly positive impact, as opposed to one or several of the other factors dominating the observed effects. And without that causal connection we can never infer the impact of these laws from the observed data. This is not a matter of p-hacking or biased science or anything of the sort—the approach doesn’t have the (information-theoretic) power to discern the answer we are looking for in the first place, i.e. to single out the true hypothesis from among the false ones.
As for your pragmatic question of how we can tell whether a study is to be trusted: I’d recommend asking experts in your field first, and only listening to cynics second. If you insist on asking, my method is to evaluate whether it seems plausible to me that, assuming the conclusion of the paper holds, this would show up as the announced effect observed in the paper. Simultaneously I try to think of several other explanations for the same data. If either of these attempts gives a resounding result I tend to chuck the study in the bin. This approach is fraught with confirmation bias (“it seems implausible to me because my view of the world suggests you shouldn’t be able to measure an effect like this”), but I don’t have a better model of the world to consult than my model of the world.
Thank you for the wonderful links, I had no idea that (meta)research like this was being conducted. Of course it doesn’t do to draw conclusions from just one or two papers like that; we would need a bunch more to be sure that we really need a bunch more before we can accept the conclusion.
Jokes aside, I think there is a big unwarranted leap in the final part of your post. You correctly state that just because the outcome of research seems to not replicate we should not assume evil intent (subconscious or no) on the part of the authors. I agree, but also frankly I don’t care. The version of Science Nihilism you present almost seems like strawman Nihilism: “Science does not replicate therefore everything a Scientist says is just their own bias”. I think a far more interesting statement would be “The fact that multiple well-meaning scientists get diametrically opposed results using the same data and techniques, which are well-accepted in the field, shows that the current standards in Science are insufficient to draw the type of conclusions we want.”
Or, from a more information-theoretic point of view, our process of honest effort by scientists followed by peer review and publication is not a sufficiently sharp tool to assign numbers to the questions we’re asking, and a large part of the variance in the published results is indicative of numerous small choices by the researchers instead of indicative of patterns in the data. Whether or not scientists are evil shills with social agendas (hint: they’re mostly not) is somewhat irrelevant if the methods used won’t separate truth from fiction. To me that’s proper Science Nihilism, none of this ‘intent’ or ‘bias’ stuff.
In a similar vein I wonder if the page count of the robustness check is really an indication of a solution to this problem. The alternative seems bleak (well, you did call it Nihilism), but maybe we should allow for the possibility that the entire scientific process as commonly practiced is insufficiently powerful to answer these research questions (for example, maybe the questions are ill-posed). To put it differently, to answer a research question we need to separate the hypothetical universes where it has one answer from the hypothetical universes where it has a different answer, and then observe data to decide which universe we happen to be in. In many papers this link between the separation and the data to be observed is so tenuous that I would be surprised if the outcome was determined by anything but the arbitrary choices of the researchers.
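To illustrate how arbitrary analysis choices alone can generate a spread of results, here is a toy “many analysts, one dataset” sketch of my own construction. The data contains no true effect at all; four defensible analysis choices nonetheless yield four different effect estimates:

```python
# One null dataset, four defensible analysis pipelines, four different
# answers. The spread between them reflects the choices, not the data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, 200)     # treatment indicator
y = rng.normal(size=200)        # outcome with NO true treatment effect

def effect(y_used, mask):
    treated, control = y_used[mask & (x == 1)], y_used[mask & (x == 0)]
    return treated.mean() - control.mean()

choices = {
    "raw":           effect(y, np.ones_like(x, dtype=bool)),
    "trim outliers": effect(y, np.abs(y) < 2),
    "winsorised":    effect(np.clip(y, -1, 1), np.ones_like(x, dtype=bool)),
    "first half":    effect(y, np.arange(200) < 100),  # e.g. early stopping
}
for name, est in choices.items():
    print(f"{name:>14}: {est:+.3f}")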
What do you mean ‘problem’? Everybody involved wants the inspection to go well, the correlation between the outcome of the inspection and the quality of the school/firm’s books is incidental at best.