[Question] Understanding information cascades

Meta: Because we think understanding info cascades is important, we recently spent ~10 hours trying to figure out how to quantitatively model them, and have contributed our thinking as answers below. While we currently don’t have the time to continue exploring, we wanted to see how much the LW community could build together on top of our preliminary search, so we’ve put up a basic prize for more work and tried to structure the work around a couple of open questions. This is an experiment! We’re looking forward to reading any of your contributions to the topic, including things like summaries of existing literature and new models of the domain.


Consider the following situation:

Bob is wondering whether a certain protein injures the skeletal muscle of patients with a rare disease. He finds a handful of papers with some evidence for the claim (and some with evidence against it), so he simply states the claim in his paper, with some caution, and cites the supporting papers. Later, Alice comes across Bob’s paper and sees the cited claim, and she proceeds to cite Bob, but without tracing the citation trail back to the original evidence. This keeps happening, in various shapes and forms, and after a while a literature of hundreds of papers builds up where it’s common knowledge that β amyloid injures the skeletal muscle of patients with inclusion body myositis, even though the claim has not accumulated any more evidence. (This real example was taken from Greenberg, 2009, which is a case study of this event.)

An information cascade occurs when people update on each other’s beliefs, rather than sharing the causes of those beliefs, and those beliefs end up with a veneer of support that far outstrips the evidence for them. Satvik Beri might describe this as the problem of only sharing the outputs of your thinking process, not your inputs.
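To make the mechanism concrete, here is a minimal Python sketch of a textbook-style sequential model of a cascade. It is not the Guesstimate model from this post, and everything in it is an assumption made for illustration: the function names, the binary true/false claim, the `signal_accuracy` parameter, and the simple rule that an agent copies the crowd once the publicly revealed signals lead by two. Each agent sees only what earlier agents asserted (their outputs), never their private evidence (their inputs).

```python
import random


def simulate_cascade(n_agents, signal_accuracy, truth=True, rng=None):
    """Run one sequence of agents who each see all earlier public actions.

    Returns (actions, cascade_start), where cascade_start is the index of the
    first agent who ignored their own signal, or None if nobody did.
    Assumes n_agents >= 1.
    """
    if rng is None:
        rng = random.Random()
    net_public = 0  # signals revealed by actions so far: +1 per "true", -1 per "false"
    actions = []
    cascade_start = None
    for i in range(n_agents):
        # Private evidence: matches the truth with probability signal_accuracy.
        signal = truth if rng.random() < signal_accuracy else (not truth)
        if abs(net_public) >= 2:
            # The public lead outweighs any single private signal, so the agent
            # copies the crowd. This action reveals nothing new, which is why
            # net_public stays frozen from here on.
            action = net_public > 0
            if cascade_start is None:
                cascade_start = i
        else:
            # Otherwise the agent follows their own signal, and in doing so
            # reveals it to everyone who comes later.
            action = signal
            net_public += 1 if signal else -1
        actions.append(action)
    return actions, cascade_start


def wrong_cascade_rate(n_trials, n_agents, signal_accuracy, seed=0):
    """Fraction of runs (with the claim actually true) whose final agent denies it."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_trials):
        actions, _start = simulate_cascade(n_agents, signal_accuracy, truth=True, rng=rng)
        if not actions[-1]:
            wrong += 1
    return wrong / n_trials


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the call shape.
    print(wrong_cascade_rate(n_trials=10_000, n_agents=100, signal_accuracy=0.6))
```

In this toy setup, once a cascade starts, a hundred agreeing agents carry no more evidence than the first two, which mirrors the citation example above. Nobody lies, and each agent's choice is reasonable given what they can see.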

The dynamics here are perhaps reminiscent of those underlying various failures of collective rationality such as asset bubbles, bystander effects and stampedes.

Note that this effect is different from other problems of collective rationality like the replication crisis, which involves low standards for evidence (such as unreasonably lax p-value thresholds or coordination problems preventing publishing of failed experiments), or the degeneracy of much online discussion, which involves tribal signalling and UI encouraging problematic selection effects. Rather, information cascades involve people rationally updating without any object-level evidence at all, and would persist even if the replication crisis and online outrage culture disappeared. Even if nobody lies or tells untruths, you can still be subject to an information cascade.


Ben and I are confused about how to think about the negative effects of this problem. We understand the basic idea, but aren’t sure how to reason quantitatively about the impacts, or how to trade off solving these problems in a community against making other improvements to the community’s overall efficacy and efficiency. We currently know only how to think about these qualitatively.

We’re posting a couple of related questions that we have some initial thoughts on, that might help clarify the problem.

If you have something you’d like to contribute, but that doesn’t seem to fit into the related questions above, leave it as an answer to this question.


We are committing to pay at least $800 or $25 × (number of answers and comments), whichever is smaller, for work on this problem recorded on LW, done before May 13th. The prize pool will be split across comments in accordance with how valuable we find them, and we might make awards earlier than the deadline (though if you know you’ll put in work in x weeks, it would be good to mention that to one of us via PM).
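As a quick illustration of the commitment above, the minimum pool is the smaller of the $800 cap and $25 per answer or comment. The sketch below is only a reading of that sentence; the function and parameter names are made up for this example.

```python
def minimum_prize_pool(num_contributions: int, cap: int = 800, per_item: int = 25) -> int:
    """Minimum committed payout in dollars for a given number of answers and comments."""
    if num_contributions < 0:
        raise ValueError("num_contributions must be non-negative")
    # "Whichever is smaller": the per-item total applies until it reaches the cap.
    return min(cap, per_item * num_contributions)


if __name__ == "__main__":
    for n in (0, 10, 32, 50):
        print(n, minimum_prize_pool(n))
```

Under this reading, the cap becomes the binding term once there are 32 or more answers and comments, since 32 × $25 = $800.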

Ben and Jacob are each responsible for half of the prize money.

Jacob is funding this through Metaculus AI, a new forecasting platform tracking and improving the state of the art in AI forecasting, partly to help avoid info cascades in the AI safety and policy communities (we’re currently live and inviting beta users; you can sign up here).

Examples of work each of us is especially excited about:


  • Contributions to our Guesstimate model (linked here), such as reducing uncertainty on the inputs or using better models.

  • Extensions of the Guesstimate model beyond biomedicine, especially in ways that make it more directly applicable to the rationality/effective altruism communities

  • Examples and analysis of existing interventions that deal with this and what makes them work, possibly suggestions for novel ones (though avoiding the trap of optimising for good-seeming ideas)

  • Discussion of how the problem of info cascades relates to forecasting


  • Concise summaries of relevant papers and their key contributions

  • Clear and concise explanations of what other LWers have found (e.g. turning 5 long answers into 1 medium-sized answer that links back to the others while still conveying the key info. Here’s a good example of someone distilling an answer section).