Previously: Model Combination and Adjustment.
Very cool that you posted these quantified predictions in advance!
A few thoughts re: Scott Alexander & Rob Wiblin on prediction.
Scott wrote that “On February 20th, Tetlock’s superforecasters predicted only a 3% chance that there would be 200,000+ coronavirus cases a month later (there were).” I just want to note that while this was indeed a badly failed prediction, in a sense the supers were wrong by just two days. (WHO-counted cases only reached >200k on March 18th, two days before question close.)
One interesting pre-coronavirus probabilistic forecast of global pandemic odds is this: From 2016 through Jan 1st 2020, Metaculus users made forecasts about whether there would be a large pandemic (≥100M infections or ≥10M deaths in a 12mo period) by 2026. For most of the question’s history, the median forecast was 10%-25%, and the special Metaculus aggregated forecast was around 35%. At first this sounded high to me, but then someone pointed out that 4 pandemics from the previous 100 years qualified (I didn’t double-check this), suggesting a base rate of 40% chance per decade. So the median and aggregated forecasts on Metaculus were actually lower than the naive base rate (maybe by accident, or maybe forecasters adjusted downward because we have better surveillance and mitigation tools today?), but I’m guessing still higher than the probabilities that would’ve been given by most policymakers and journalists if they were in the habit of making quantified falsifiable forecasts. Moreover, using the Tetlockian strategy of just predicting the naive base rate with minimal adjustment would’ve yielded an even more impressive in-advance prediction of the coronavirus pandemic.
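As a rough check on the base-rate arithmetic above, here is a minimal sketch. It takes the "4 qualifying pandemics in 100 years" count at face value (it was not double-checked above either), and the function name and the Poisson assumption are mine, not anything from Metaculus or the original question. Four events per century is an expected 0.4 events per decade. If pandemics are assumed to arrive independently at a constant rate, the chance of at least one in a given decade comes out somewhat lower than 40%.

```python
import math

def prob_at_least_one(events, years_observed, horizon_years):
    """Chance of one or more events within the horizon, assuming (hypothetically)
    a constant-rate Poisson process fitted to the historical count."""
    rate_per_year = events / years_observed
    expected_events = rate_per_year * horizon_years
    return 1.0 - math.exp(-expected_events)

# Naive reading: expected number of qualifying pandemics per decade.
expected_per_decade = 4 / 100 * 10

# Poisson reading: probability of at least one qualifying pandemic per decade.
p_decade = prob_at_least_one(events=4, years_observed=100, horizon_years=10)

print(f"expected events per decade: {expected_per_decade:.2f}")
print(f"P(at least one per decade): {p_decade:.3f}")
```

Under that assumption the figure is about 33% rather than 40%, so how the base rate compares with the aggregated forecast depends somewhat on which reading one uses. Either way it is well above the 10%-25% median.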
More generally, the research on probabilistic forecasting makes me suspect that prediction polls/markets with highly-selected participants (e.g. via GJI or HyperMind), or perhaps even those without highly-selected participants (e.g. via GJO or Metaculus), could achieve pretty good calibration (though not necessarily resolution) on high-stakes questions (e.g. about low-probability global risks) with 2-10 year time horizons, though this has not yet been checked.
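To make the calibration/resolution distinction concrete, here is a small sketch using the standard Murphy decomposition of the Brier score (Brier = reliability − resolution + uncertainty). The toy data and function name are made up for illustration. The point is that a forecaster who always predicts the base rate can be perfectly calibrated while having zero resolution.

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.
    Forecasts are grouped by their exact stated probability."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    # Reliability: gap between stated probability and observed frequency (lower is better).
    reliability = sum(len(os) * (f - sum(os) / len(os)) ** 2 for f, os in groups.items()) / n
    # Resolution: how far group frequencies move away from the base rate (higher is better).
    resolution = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2 for os in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

# Toy data: 10 low-probability questions, one of which resolves "yes".
outcomes = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Forecaster A always says 10%: calibrated, but never distinguishes between questions.
rel_a, res_a, unc_a = brier_decomposition([0.1] * 10, outcomes)

# Forecaster B says 100% on the "yes" and 0% elsewhere: calibrated and maximally resolved.
rel_b, res_b, unc_b = brier_decomposition([1.0] + [0.0] * 9, outcomes)

print(rel_a, res_a, unc_a)
print(rel_b, res_b, unc_b)
```

Both toy forecasters have reliability 0 (perfect calibration), but only the second has nonzero resolution, which is why good calibration on long-horizon questions would not by itself show that forecasters can tell which risks will materialize.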
Nice post. Were there any sources besides Wikipedia that you found especially helpful when researching this post?
If the U.S. kept racing in its military capacity after WW2, the U.S. may have been able to use its negotiating leverage to stop the Soviet Union from becoming a nuclear power: halting proliferation and preventing the build up of world threatening numbers of high yield weapons.
BTW, the most thorough published examination I’ve seen of whether the U.S. could’ve done this is Quester (2000). I’ve been digging into the question in more detail and I’m still not sure whether it’s true or not (but “may” seems reasonable).
I’m very interested in this question, thanks for looking into it!
My answer from 2017 is here.
Interesting historical footnote from Louis Francini:
This issue of differing “capacities for happiness” was discussed by the classical utilitarian Francis Edgeworth in his 1881 Mathematical Psychics (pp 57-58, and especially 130-131). He doesn’t go into much detail at all, but this is the earliest discussion of which I am aware. Well, there’s also the Bentham-Mill debate about higher and lower pleasures (“It is better to be a human being dissatisfied than a pig satisfied”), but I think that may be a slightly different issue.
Cases where scientific knowledge was in fact lost and then rediscovered provide especially strong evidence about the discovery counterfactuals, e.g. Hero’s eolipile and al-Kindi’s development of relative frequency analysis for decoding messages. Probably we underestimate how common such cases are, because the knowledge of the lost discovery is itself lost; for example, we might easily have simply not rediscovered the Antikythera mechanism.
Apparently Shelly Kagan has a book coming out soon that is (sort of?) about moral weight.
This scoring rule has some downsides from a usability standpoint. See Greenberg 2018, a whitepaper prepared as background material for a (forthcoming) calibration training app.
Some other people at Open Phil have spent more time thinking about two-envelope effects than I have, and fwiw some of their thinking on the issue is in this post (e.g. see section 18.104.22.168).
My own take on this is described briefly here, with more detail in various appendices, e.g. here.
Yes, I meant to be describing ranges conditional on each species being a moral patient at all. I previously gave my own (very made-up) probabilities for that here. Another worry to consider, though, is that many biological/cognitive and behavioral features of a species are simultaneously (1) evidence about their likelihood of moral patienthood (via consciousness), and (2) evidence about features that might affect their moral weight *given* consciousness/patienthood. So, depending on how you use that evidence, it’s important to watch out for double-counting.
I’ll skip responding to #2 for now.
For anyone who is curious, I cite much of the literature arguing over criteria for moral patienthood/weight in the footnotes of this section of my original moral patienthood report. My brief comments on why I’ve focused on consciousness thus far are here.
Cool, this looks better than I’d been expecting. Thanks for doing this! Looking forward to next round.
Hurrah failed project reports!
One of my most-used tools is very simple: an Alfred snippet that lets me paste-as-plain-text using Cmd+Opt+V.