These divergent evaluations of big figures suggest there may not be enough consensus on the value of past work to support prediction markets of the kind Hanson describes.
Six EA researchers I hold in high regard—Fin Moorhouse, Gavin Leech, Jaime Sevilla, Linch Zhang, Misha Yagudin, and Ozzie Gooen—each spent 1–2 hours rating the value of different pieces of research. They did this using a utility function extractor: an app that presents the user with pairwise comparisons and aggregates the answers into a utility function.
This method revealed a wide gap between different researchers’ conceptions of research value. In some cases, their disagreements spanned several orders of magnitude. Results were also inconsistent at the individual level: a test subject might judge A to be x times as valuable as B, and B to be y times as valuable as C, yet judge A to be something very different from x*y times as valuable as C.
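To make the aggregation step concrete, here is a minimal sketch of one standard way to turn pairwise ratio judgments into a utility function. This is my own illustration, not necessarily the method the actual extractor app uses: each judgment "A is r times as valuable as B" becomes a linear equation in log-utilities, and least squares finds the best compromise when the judgments are inconsistent.

```python
import numpy as np

def fit_utilities(n_items, comparisons):
    """comparisons: list of (i, j, r), meaning item i was judged
    r times as valuable as item j."""
    rows, rhs = [], []
    for i, j, r in comparisons:
        # Each judgment becomes: log u_i - log u_j = log r
        row = np.zeros(n_items)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        rhs.append(np.log(r))
    # Pin item 0's utility at 1 so the solution has a unique scale.
    anchor = np.zeros(n_items)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    log_u, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(log_u)

# An inconsistent set of judgments, like those described above:
# A = 2*B and B = 3*C would imply A = 6*C, but the subject says A = 10*C.
utils = fit_utilities(3, [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 10.0)])
```

The fitted ratio of A to C comes out between 6 and 10: the least-squares fit splits the difference between the directly stated ratio and the one implied by chaining. The size of the leftover residuals is one way to quantify how inconsistent a subject's judgments are.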
It seems clear that individual estimates, even those of respected researchers, are likely very noisy and often inaccurate. Future research will investigate better ways to elicit information from these researchers and to recommend best guesses for the all-things-considered answers. It is also likely that the researchers would have produced better estimates given more time, and we could experiment with this in the future as well.
Nuno Sempere’s Valuing research works by eliciting comparisons from EA researchers is the most thorough attempt I’ve seen at this, and it seems to support your guess. (Jonas Moss updated the analysis with his own approach, which addresses a number of technical problems with the former.)