Scientific breakthroughs of the year

Link post

A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn’t link to the original source; they didn’t use probabilities (or even report the sample size); they were usually credulous about preliminary findings (“...which species was it tested on?”); and they essentially never gave any sense of the magnitude or the baselines (“how much better is this treatment than the previous best?”). Speculative results were covered with the same credence as solid proofs. And highly technical fields like mathematics were rarely covered at all, regardless of their practical or intellectual importance. So he had a go at doing it himself.

This year, with Renaissance Philanthropy, we did something more systematic. So, how did the world change this year? What happened in each science? Which results are speculative and which are solid? Which are the biggest, if true?

Our collection of 201 results is here. You can filter them by field, by our best guess of the probability that they generalise, and by their impact if they do. We also include bad news (in red).

Who are we?

Just three people, but between us we cover a few fields. Gavin has a PhD in AI and has worked in epidemiology and metascience; Lauren was a physicist and is now a development economist; Ulkar was a wet-lab biologist and is now a science writer touching many areas. For other domains we had expert help from the Big If True fellows, but mistakes are our own.

Site designed and developed by Judah.

Data fields

  • Category. e.g. AI | Medicine | Protein design

  • Evidence type: A rough classification into

    • Speculation – a hypothesis, an untested theory, an unconfirmed claim. (Imagine someone having an idea for a new cytokine against cancer.)

    • Demo – a preprint, a proof of concept, or an expensive demonstration. Imagine someone trying out the cytokine in vitro and it working.

    • RCT, etc – a serious trial with careful causal inference. Most crucial experiments in physics are “RCTs” in this sense. Imagine someone running a small clinical trial of the cytokine in humans.

    • Real-world evidence – clear retrospective evidence from long-term mass deployment, and large demand actually met with large supply. (Imagine the cytokine becoming a clinical standard and getting lots of system-level evidence of effectiveness generated all the time.)

    • Established fact – e.g. an accepted or formalised mathematical proof, or something true by definition like a legal judgment. (Hard to imagine a medicine reaching this level, but perhaps future personalised, precision-targeted versions with real-time in situ monitoring would, or now, maybe a 40-year follow-up Cochrane review on its effect on all-cause mortality.)

  • P(generalises). How likely we think it is to replicate or generalise, on priors. Preprint mathematics from a serious researcher is often around 90%. Preclinical in-vitro drug candidates are, notoriously, below 8%.

  • big | true. Every entry in the list is significant – scientifically, socially, or in delightfulness. But we wanted to weight them by impact. The “big if true” field is our guess at how much the entry would change the world if it generalised.

    • We use Nate Silver’s “Technology Richter Scale”, on which a 4 is something commercialisable but not category-defining (like a slightly better mousetrap), a 6 is “on the short list for technology of the year” (Post-its, VCRs), and an 8 is on the short list for technology of the century (cars, electricity, the internet).

    • This score includes not just the impact of the particular discovery, but also the consequences of the method used in the discovery. (For instance, the formalisation of the Strong Prime Number Theorem has no particular social impact — but the implications of autoformalising new mathematical results, and the potential acceleration and hardening of many kinds of research, warrant at least a 6 on the scale.)

    • (This quantity is on a log scale, while P(generalises) is linear. Taking the product of them, as we do, therefore amounts to placing an exponential penalty on big|true. But while it isn’t the strict expected value, this penalised product yields something closer to newsworthiness. See columns N and O for the true EV.)

  • Good/Bad. Basic: whether we expect the net first-order effect of this to be positive for the world, or not. (We don’t claim to know what the mitigations and responses will net out to.)
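The penalised-product arithmetic above can be sketched in a few lines. This is an illustrative sketch only, not the spreadsheet’s actual formulas: it assumes the Richter-style big|true scale is base-10, so a score R corresponds to a linear-scale impact of 10^R.

```python
# Illustrative sketch of the scoring (assumptions: base-10 Richter
# scale; function names are ours, not from the original spreadsheet).

def newsworthiness(p_generalises: float, big_if_true: float) -> float:
    """Penalised product: probability times the log-scale impact score.
    Because big_if_true is a log quantity, multiplying it by a linear
    probability exponentially discounts speculative mega-claims."""
    return p_generalises * big_if_true

def expected_value(p_generalises: float, big_if_true: float) -> float:
    """Strict EV: probability times the linear-scale impact, 10**R."""
    return p_generalises * 10 ** big_if_true

# A solid incremental result vs. a long-shot moonshot:
solid = (0.90, 4)     # 90% to generalise, Richter 4
moonshot = (0.05, 8)  # 5% to generalise, Richter 8

# Under the penalised product the solid result ranks higher (3.6 > 0.4),
# while the raw EV is dominated by the moonshot's heavy tail.
assert newsworthiness(*solid) > newsworthiness(*moonshot)
assert expected_value(*moonshot) > expected_value(*solid)
```

The example shows why the penalised product tracks newsworthiness better than strict EV: under EV, a handful of low-probability, Richter-8 claims would swamp every well-evidenced result in the list.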

Our judgments of P(generalises), big | true, and Good/Bad are avowedly subjective. If we got something wrong please tell us at gavin@arbresearch.com.

By design, the selection of results is biased towards being comprehensible to laymen, being practically useful now or soon, and against even the good kind of speculation. Interestingness is correlated with importance—but clearly there will have been many important but superficially uninteresting things that didn’t enter our search.