The clear answer to the question posed, “do the performances of GJP participants follow a power-law distribution, such that the best 2% are significantly better than the rest?”, is yes, with a minor quibble and a huge caveat. (Epistemic status: I’m very familiar with the literature, have personal experience as a superforecaster since the beginning, have had discussions with Dan Gardner and the people running the project, have had conversations with the heads of Good Judgement Inc, etc.)
The minor quibble, identified in other comments, is that there is unlikely to be a sharp cutoff at 2%: there isn’t a discontinuity, and “power law” is probably the wrong term. Aside from those “minor” issues, yes, there is a clear group of people who outperformed multiple years in a row, and this group was fairly consistent from year to year. Not only that, but the ordering within that group is far more stable than chance. That clearly validates the claim that “superforecasters are a real thing.”
But the data showing that those people are better is based on a number of things, many of which aren’t what you would think. First, the biggest difference between top forecasters and the rest is frequency of updates and a corresponding willingness to change their minds as evidence comes in. People who invest time in trying to forecast well do better than those who don’t; to that extent, it’s a skill like most others. Second, success at forecasting is predicted by most of the things that predict success at almost everything else: intelligence, time spent, and looking for ways to improve. Some of the techniques that Good Judgement advocates for superforecasters came from people who read Kahneman and Tversky, Tetlock, and related research, and tried to apply the ideas. The things that worked were adopted, but not everything helped. Other techniques were original to the participants: for instance, explicitly comparing your estimates for a question over different timeframes, to ensure the probability is coherent and reasonable. (Will X happen in the next 4 months? If we changed that to one month, would the estimate be about a quarter as high? What about if it were a year? If my intuition for the answer is about the same, I need to fix that.) Ideas like this are not natural ability; they are just applying intelligence to a problem the forecasters care about.
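To make that coherence check concrete, here is a minimal sketch in Python, assuming a constant per-month hazard rate (my simplification for illustration, not anything built into the questions themselves):

```python
# Coherence check for probability estimates over different time horizons.
# Assumes a constant per-month hazard rate, which is an illustrative
# simplification, not something the question itself guarantees.

def implied_probability(p_one_month: float, months: float) -> float:
    """Probability the event happens within `months`, given the
    one-month probability and a constant hazard rate."""
    return 1 - (1 - p_one_month) ** months

# Suppose my gut says 20% for "will X happen in the next 4 months?"
p_4mo = 0.20

# Back out the implied one-month probability...
p_1mo = 1 - (1 - p_4mo) ** (1 / 4)

# ...and what that implies for a 12-month horizon.
p_12mo = implied_probability(p_1mo, 12)

print(f"Implied 1-month probability:  {p_1mo:.1%}")   # ~5.4%
print(f"Implied 12-month probability: {p_12mo:.1%}")  # ~48.8%
# If my intuitive answers for the 1-month and 12-month versions of the
# question are nowhere near these numbers, at least one estimate is incoherent.
```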
Also, many of the poorer performers were people who didn’t continue forecasting, and their initial numbers got stale—they presumably would have updated. The best performers, on the other hand, checked the news frequently, and updated. At times, we would change a forecast once the event had / had not happened, a couple days before the question was closed, yielding a reasonably large “improvement” in our time-weighted score. This isn’t a function of being naturally better—it’s just the investment of time that helps. (This also explains a decent part of why weighting recency in aggregate scores is helpful—it removes stale forecasts.)
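As a toy illustration of why stale forecasts and last-minute updates matter so much for scores, here is a simplified time-averaged Brier score (my approximation of daily scoring, not the exact GJP scoring rules), comparing a stale forecast to one updated in the final days:

```python
# Toy time-averaged Brier score: each day's score uses the most recent
# forecast (carried forward), and the question-level score is the mean
# over all days the question was open. Binary question; outcome = 1.

def daily_brier(p: float, outcome: int) -> float:
    """Brier score for a single day's probability on a binary question."""
    return (p - outcome) ** 2

def time_averaged_brier(daily_probs: list[float], outcome: int) -> float:
    return sum(daily_brier(p, outcome) for p in daily_probs) / len(daily_probs)

outcome = 1
days_open = 30

# Forecaster A: sets 60% on day 1 and never logs in again (a stale forecast).
stale = [0.60] * days_open

# Forecaster B: same 60% start, but updates to 99% for the last 3 days,
# once the news has made the outcome essentially certain.
late_update = [0.60] * (days_open - 3) + [0.99] * 3

print(f"Stale forecaster: {time_averaged_brier(stale, outcome):.3f}")        # 0.160
print(f"Late updater:     {time_averaged_brier(late_update, outcome):.3f}")  # ~0.144
# The gap comes entirely from checking the news near the close,
# not from a better initial judgment.
```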
So in short, I’m unconvinced that superforecasters are a “real” thing, except in the sense that most people don’t try, and people who do will do better and improve over time. Given that, however, we absolutely should rely on superforecasters to make better predictions than everyone else, as long as they continue doing the things that make them good forecasters.
One key limitation for vaccines is supply, as others have noted. That certainly doesn’t explain everything, but it does explain a lot.
This obstacle was, of course, completely foreseeable, and we proposed a simple way to deal with the problem, which we presented to policymakers and even posted on Lesswrong by the end of April. Thus begins our story.
Unfortunately, we couldn’t get UK policymakers on board when we discussed it, and the US was doing “warp speed” and Congress wasn’t going to allocate money for a new idea.
We were told that in general policymakers wanted an idea published / peer reviewed before they’d take the idea more seriously, so we submitted a paper. At this point, as a bonus, Preprints.org refused to put the preprint online. (No, really. And they wouldn’t explain.)
We submitted it as a paper to Vaccine on May 20th, and they sent it for review. We got it back mid-June, did revisions, and resubmitted in early July; then the journal changed its mind and said “your paper does not appear to conduct original research, thus it does not fit the criteria.” After we emailed to ask what they were doing, they relented and said we could cut the length in half and resubmit as an opinion piece.
We went elsewhere, to a newer, open-access journal with non-blinded review, and it was finally online in October, fully published: https://f1000research.com/articles/9-1154
I disagree with this decision, not because I think it was a bad post, but because it doesn’t seem like the type of post that leads people to a more nuanced or better view of any of the things discussed, much less a post that provided insight or better understanding of critical things in the broader world. It was enjoyable, but not what I’d like to see more of on Less Wrong.
(Note: I posted this response primarily because I saw that lots of others also disagreed with this, and think it’s worth having on the record why at least one of us did so.)
One of the negative consequences of our information policy, as we have learned, is the way it made some regular interactions with people outside of the relevant information circles more difficult than intended.
Is Leverage willing to grant a blanket exemption from the NDAs which people evidently signed, to rectify the potential ongoing harms of not having information available? If not, can you share the text of the NDAs?
I think you are not looking in the right places, as the groups of rationalists I know are doing incredibly well for themselves—tenure-track positions at major universities, promotions to senior positions in US government agencies, incredibly well paid jobs doing EA-aligned research in machine learning and AI, huge amounts of money being sent to the rationalist-sphere AI risk research agendas that people were routinely dismissing a few years ago, etc.
To evaluate this more dispassionately, however, I’d suggest looking at the people who posted high-karma posts in 2009, and seeing what those posters are doing now. I’ll try that here, though I don’t know what some of these people are doing now. They seem to be an overall high-achieving group. (But we don’t have a baseline.)
https://www.greaterwrong.com/archive/2009 - Page 1: I’m seeing Eliezer (he seems to have done well), Hal Finney (unfortunately deceased, but had he lived a bit longer he would have been a multi-multi-millionaire for being an early bitcoin holder / developer), Scott Alexander (I think his blog is doing well enough), Phil Goetz (?), Anna Salamon (helping run CFAR), “Liron” (?, but he’s now running https://relationshiphero.com/ and seems to have done decently as a serial entrepreneur), Wei Dai (a fairly big name in cryptocurrency), cousin_it (?), CarlShulman (doing a bunch of existential risk work with FHI and other organizations), Alicorn (now a writer and “Immortal bisexual polyamorous superbeing”), HughRistik (?), Orthonormal (still around, but ?), jimrandomh (James Babcock, ?), AllanCrossman (http://allancrossman.com/ - ?), and Psychohistorian (Eitan Pechenick, academia).
To attempt to make this point more legible:
Standard best practice in places like the military and intelligence organizations, where lives depend on secrecy being kept from outsiders (but not insiders), is to compartmentalize and maintain “need to know.” Similarly, in information security, the best practice is to give people access only to what they need, to granularize access to different services / data, and to differentiate read / write / delete access. Even in regular organizations, lots of information is need-to-know: HR complaints, future budgets, estimates of profitability of a publicly traded company before quarterly reports, and so on. This is normal, and even though it’s costly, those costs are needed.
This type of granular control isn’t intended to stop internal productivity; it is to limit the extent of failures of secrecy, and of attempts to exploit the system by leveraging non-public information, both of which are inevitable, since the costs of preventing failures grow very quickly as the risk of failure approaches zero. For all of these reasons, the ideal is to have trustworthy people who have low but non-zero probabilities of screwing up on secrecy. Then, you ask them not to share things that are not necessary for others’ work. You allow only limited exceptions and discretion where it is useful. The alternative, of “good trustworthy people [] get to have all the secrets versus bad untrustworthy people who don’t get any,” simply doesn’t work in practice.
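As a minimal sketch of what that granularity can look like in code (the users, resources, and permission names here are all made up for illustration):

```python
# Minimal sketch of need-to-know access control: permissions are granted
# per (user, resource) pair and split by operation, rather than giving
# "trusted" users blanket access. Names below are purely illustrative.

from enum import Flag, auto

class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    DELETE = auto()

# Each user gets only the access their work requires, per resource.
ACL: dict[tuple[str, str], Access] = {
    ("analyst", "quarterly_estimates"): Access.READ,
    ("cfo",     "quarterly_estimates"): Access.READ | Access.WRITE,
    ("hr_lead", "hr_complaints"):       Access.READ | Access.WRITE | Access.DELETE,
    # Note what's absent: the analyst has no entry at all for HR complaints.
}

def allowed(user: str, resource: str, needed: Access) -> bool:
    """Default-deny check: anything not explicitly granted is refused."""
    return needed in ACL.get((user, resource), Access.NONE)

print(allowed("analyst", "quarterly_estimates", Access.READ))   # True
print(allowed("analyst", "quarterly_estimates", Access.WRITE))  # False
print(allowed("analyst", "hr_complaints", Access.READ))         # False
```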