The Prediction Pyramid: Why Fundamental Work is Needed for Prediction Work

Epistemic state: I feel like this post makes fairly intuitive claims, but I have uncertainty on many of the specifics.

In data science, it is a common mistake for organizations to focus on specific exciting parts like machine learning and data visualizations, while overlooking the infrastructure required to make such things possible. There have been several attempts at drawing pyramids that show which dependencies must be in place before the most visible parts of data science become realizable.

Something similar could be said for predictions. Predictions require foundational work in order to be possible and effective. We can use the prediction pyramid below to show this dependency.

[Figure: the prediction pyramid]

Evaluations

People beginning a prediction practice quickly run into the challenge of having well-specified questions. It’s not enough to ask who will win a sports game; one needs to clarify how every exceptional situation is to be handled.[1]

Question specification is a big part of Metaculus. Questions often attract significant discussion of possible edge cases even after they have been posed.
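
As a rough illustration of what a well-specified question involves, the sketch below pairs a question with an explicit resolution rule for each exceptional situation. The class name, the field names, and the particular outcomes and resolutions are all hypothetical; this is not how Metaculus or any other platform actually represents questions.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A forecasting question plus explicit rules for exceptional outcomes.

    Hypothetical structure, for illustration only.
    """
    text: str
    # Maps each anticipated real-world outcome to how the question resolves.
    resolution_rules: dict = field(default_factory=dict)

    def resolve(self, outcome: str) -> str:
        # An outcome nobody planned for is exactly the problem described above,
        # so surface it loudly instead of guessing a resolution.
        if outcome not in self.resolution_rules:
            raise ValueError(f"No resolution rule for outcome: {outcome!r}")
        return self.resolution_rules[outcome]


question = Question(
    text="Will Team A beat Team B in their next scheduled game?",
    resolution_rules={
        "team_a_wins": "yes",
        "team_b_wins": "no",
        "tie": "no",
        "rained_out": "void",
        "postponed": "void",
    },
)
```

Here a disputed game has no rule, so `question.resolve("disputed")` raises an error. That gap is the kind of thing question writers would want to notice before the question opens rather than after.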

In addition to question specification, evaluations can be costly to perform. Even simple cases still require manual work, and in more complex cases evaluations could take a very long time. GiveWell does charity evaluations, and these can be extensive. This document discusses some other kinds of evaluations in the “Possible Uses” section.

Ontologies

Say one is trying to determine which diseases will be important to worry about in 2025. One would first need a taxonomy of diseases that will not change until after 2025. If one were to somehow use a poor or unusual taxonomy, the resulting information wouldn’t be useful to others.

In the case of diseases, decades of research have been carried out in order to establish pragmatic and popular taxonomies. In other domains, new ontologies would need to be developed. Note that we consider ontologies to be a superset of taxonomies.

Another example: the usefulness of careers. 80,000 Hours is an expert here. They have a system which splits career paths into several distinct domains, and rates each one using six distinct attributes. They then do evaluations for each combination.

If it were assured they would continue to do so in the future, it would be relatively straightforward to forecast their future evaluations. If one wanted to do similar predictions without their work, one would have to come up with their own foundational thinking, ontologies, and evaluations.
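
To make the dependency concrete, here is a minimal sketch of how an existing ontology defines the set of things one could forecast. The path and attribute labels are placeholders, not 80,000 Hours’ real categories, and the function name and question wording are made up for illustration.

```python
from itertools import product

# Placeholder labels only; these are NOT 80,000 Hours' actual paths or attributes.
career_paths = ["path_a", "path_b", "path_c"]
attributes = [f"attribute_{i}" for i in range(1, 7)]  # six attributes, as in the text


def forecast_targets(paths, attrs, year):
    """Build one forecastable question per (career path, attribute) combination."""
    return [
        f"What rating will the evaluator give {path} on {attr} in {year}?"
        for path, attr in product(paths, attrs)
    ]


targets = forecast_targets(career_paths, attributes, 2025)
```

With three placeholder paths and six attributes this yields 18 questions. None of them could be written down without the categories and the rating scheme already existing.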

Other concrete examples of ontologies:

The “Importance, Neglectedness, Tractability” framework for evaluating charity effectiveness
Nick Bostrom’s typology of information hazards, categorizing them by types and subtypes of information transfer mode and effect
Nick Bostrom’s definition of the vulnerable world hypothesis, including the “semi-anarchic default condition” consisting of limited capacity for preventive policing and global governance, and diverse motivations of actors
“Posts on LessWrong” are already discrete, and would represent a taxonomy

Foundational Understanding

Even before worrying about predictions or ontologies, it’s important to have a good foundational understanding of the topics in question. An ancient Greek scholar believing in Greek mythology may spend a lot of time creating ontologies around the gods, but this would be a poor foundation for pragmatic work.

In the case of GiveWell, it took some specific philosophical understanding to decide that charity effectiveness was an important thing to optimize for. Later they came up with the “Importance, Neglectedness, Tractability” framework based on this understanding.

Implications

Predictions are most effective within a cluster of other specific tools.

For predictions to be useful, several other things need to go well, and thus they are also worth paying attention to. Discussions about “doing great predictions” should often include information on these other aspects. The equivalent in data science would be to recognize the importance and challenges of fundamental issues like data warehousing when discussing the eventual goal of data visualization.

Areas with existing substantial fundamental work should be easy to add predictions to.

There are many kinds of data which are already categorized and evaluated; in these cases, the predictions can be quite straightforward. For instance, the “winner of the next presidential election” seems obviously important and will be decided by existing parties, so it is a very accessible candidate for forecasting.

It could be good to make lists of metrics and data sources that will be both interesting and reliably provided in the future. For example, it’s very likely that Wikidata will continue to report on the GDP and population of different countries, at least for the next 5-10 years. Setting up predictions on such variables should be very feasible.

There could be useful foundational non-predictive work to help future predictions.

One could imagine many useful projects and organizations that focus on just doing a good job on the foundational work, with the goal of assisting predictions down the line. For example, an organization could be set up just to evaluate important future variables. While this organization wouldn’t do forecasting itself, it would be very easy for other forecasting efforts to amplify this organization by forecasting its future evaluations. Currently, this is one accidental benefit of some organizations, but if it were intentional then evaluations could be better optimized for prediction support.

Possible Pyramid Modifications

The pyramid above was chosen as a simple way to illustrate these implications. In data science, several different pyramids have been made for different circumstances. Similarly, we can imagine multiple variations of this pyramid for other use cases.

“Aggregations” may make sense on top of predictions. It could be possible for some sites to list predictions and others to aggregate them. There are already sites that exist to do nothing except aggregation. Predictwise is one example.
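
A minimal sketch of what such an aggregation layer could compute, assuming several sources each report a probability for the same binary question. The function name and the sample numbers are made up for illustration, and the mean and median used here are just two common choices, not the method of Predictwise or any other site.

```python
from statistics import mean, median


def aggregate(probabilities):
    """Combine several probability forecasts for one binary question."""
    if not probabilities:
        raise ValueError("Need at least one forecast to aggregate")
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"Not a probability: {p}")
    return {
        "mean": mean(probabilities),
        # The median is less sensitive to a single extreme forecast.
        "median": median(probabilities),
    }


# Illustrative forecasts from four hypothetical sources.
forecasts = [0.25, 0.125, 0.875, 0.25]
summary = aggregate(forecasts)
```

In this made-up example the one high forecast pulls the mean up to 0.375 while the median stays at 0.25, which is the kind of design choice an aggregation layer would have to make explicitly.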

The foundational understanding layer at the bottom could be subdivided into many other categories. For instance, research distillation could be a valid layer.

Acknowledgements

Thanks to Jacob Lagerros for contributing many examples and details to this post, and to Ben Goldhaber and Max Daniel for providing feedback on it.

[1] One of the first markets on the prediction market Augur had this exact problem, with no mention of how a sports market would resolve if the game were rained out, disputed, postponed, tied, etc. (Zvi discusses this issue further in his post on prediction markets.)