The clear answer to the question posed, “do the performances of GJP participants follow a power-law distribution, such that the best 2% are significantly better than the rest,” is yes—with a minor quibble, and a huge caveat. (Epistemic status: I’m very familiar with the literature, have personal experience as a superforecaster since the beginning, have had discussions with Dan Gardner and the people running the project, have had conversations with the heads of Good Judgment Inc, etc.)
The minor quibble, identified in other comments, is that there is unlikely to be a sharp cutoff at 2%: there isn’t a discontinuity, and “power law” is probably the wrong term. Aside from those “minor” issues, yes, there is a clear group of people who outperformed multiple years in a row, and this group was fairly consistent from year to year. Not only that, but the ordering within that group is far more stable than chance would predict. That clearly validates the claim that “superforecasters are a real thing.”
But the evidence that those people are better rests on a number of things, many of which aren’t what you would think. First, the biggest difference between top forecasters and the rest is frequency of updates and a corresponding willingness to change their minds as evidence comes in. People who invest time in trying to forecast well do better than those who don’t—to that extent, it’s a skill like most others. Second, success at forecasting is predicted by most of the things that predict success at almost everything else: intelligence, time spent, and looking for ways to improve. Some of the techniques that Good Judgment advocates for superforecasters came from participants who read Kahneman and Tversky, Tetlock, and related research, and tried to apply the ideas. The things that worked were adopted—but not everything helped. Other techniques were original to the participants—for instance, explicitly comparing your estimates for a question over different timeframes, to ensure they form a coherent and reasonable probability. (Will X happen in the next 4 months? If we changed that to one month, would the estimate be about a quarter as high? What about if it were a year? If my intuition gives about the same answer either way, I need to fix that.) Ideas like this are not natural ability; they are just intelligence applied to a problem people care about.
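To make the timeframe check concrete, here is a minimal sketch. It assumes a constant-hazard (memoryless) model, which is a simplification; real events rarely have uniform hazard rates, but even this crude version catches estimates that are wildly incoherent across horizons. The function name and numbers are mine, purely for illustration.

```python
def implied_probability(p_base: float, base_months: float, target_months: float) -> float:
    """Rescale P(event within base_months) to a different horizon,
    assuming a constant monthly hazard rate."""
    monthly_survival = (1.0 - p_base) ** (1.0 / base_months)
    return 1.0 - monthly_survival ** target_months

p_4mo = 0.40  # gut estimate: 40% chance X happens in the next 4 months
for horizon in (1, 4, 12):
    print(f"{horizon:>2} months: implied P = {implied_probability(p_4mo, 4, horizon):.0%}")
# -> 12%, 40%, 78%. If my intuitive answers for 1 and 12 months are roughly
#    the same as for 4 months, my estimates are incoherent and need fixing.
```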
Also, many of the poorer performers were people who didn’t continue forecasting, and their initial numbers got stale—had they stayed active, they presumably would have updated. The best performers, on the other hand, checked the news frequently, and updated. At times, we would change a forecast once the event had / had not happened, a couple of days before the question was closed, yielding a reasonably large “improvement” in our time-weighted score. This isn’t a function of being naturally better—it’s just the investment of time that helps. (This also explains a decent part of why weighting recency in aggregate scores is helpful—it removes stale forecasts.)
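To illustrate the mechanic with made-up numbers (using a generic time-averaged Brier score, not GJP’s exact rule): each forecast counts in proportion to how long it was the standing estimate, so updating near the close, once the outcome is effectively known, shaves a real amount off the final score, and those edges compound across many questions.

```python
def time_weighted_brier(forecasts, outcome, total_days):
    """forecasts: list of (day_made, probability); outcome: 1 or 0.
    Each forecast is scored for the days it was the standing estimate."""
    forecasts = sorted(forecasts)
    score = 0.0
    for i, (day, p) in enumerate(forecasts):
        end = forecasts[i + 1][0] if i + 1 < len(forecasts) else total_days
        score += (end - day) * (p - outcome) ** 2
    return score / (total_days - forecasts[0][0])

# A question open for 10 days that resolves "yes":
print(time_weighted_brier([(0, 0.6)], outcome=1, total_days=10))             # 0.16 (stale forecast)
print(time_weighted_brier([(0, 0.6), (8, 0.95)], outcome=1, total_days=10))  # ~0.129 (late update)
```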
So in short, I’m unconvinced that superforecasters are a “real” thing, except in the sense that most people don’t try, and people who do will do better, and improve over time. Given that, however, we absolutely should rely on superforecasters to make better predictions than the rest of us—as long as they continue doing the things that make them good forecasters.
One key limitation for vaccines is supply, as others have noted. That certainly doesn’t explain everything, but it does explain a lot.
This obstacle was, of course, completely foreseeable, and we proposed a simple way to deal with the problem, which we presented to policymakers and even posted on LessWrong, by the end of April. Thus begins our story.
Unfortunately, we couldn’t get UK policymakers on board when we discussed it, and the US was doing “warp speed” and Congress wasn’t going to allocate money for a new idea.
We were told that in general policymakers wanted an idea published / peer reviewed before they’d take the idea more seriously, so we submitted a paper. At this point, as a bonus, Preprints.org refused to put the preprint online. (No, really. And they wouldn’t explain.)
We submitted it as a paper to Vaccine on May 20th, and they sent it for review; we got it back in mid-June, did revisions and resubmitted in early July, and then the journal changed its mind and said “your paper does not appear to conduct original research, thus it does not fit the criteria.” After emailing to ask what they were doing, they relented and said we could cut the length in half and re-submit it as an opinion piece.
We went elsewhere, to a newer, open access, non-blinded review journal, and it was finally online in October, fully published: https://f1000research.com/articles/9-1154
I disagree with this decision, not because I think it was a bad post, but because it doesn’t seem like the type of post that leads people to a more nuanced or better view of any of the things discussed, much less a post that provided insight or better understanding of critical things in the broader world. It was enjoyable, but not what I’d like to see more of on Less Wrong.
(Note: I posted this response primarily because I saw that lots of others also disagreed with this, and think it’s worth having on the record why at least one of us did so.)
One of the negative consequences of our information policy, as we have learned, is the way it made some regular interactions with people outside of the relevant information circles more difficult than intended.
Is Leverage willing to grant a blanket exemption from the NDAs which people evidently signed, to rectify the potential ongoing harms of not having information available? If not, can you share the text of the NDAs?
I think you are not looking in the right places, as the groups of rationalists I know are doing incredibly well for themselves—tenure-track positions at major universities, promotions to senior positions in US government agencies, incredibly well paid jobs doing EA-aligned research in machine learning and AI, huge amounts of money being sent to the rationalist-sphere AI risk research agendas that people were routinely dismissing a few years ago, etc.
To evaluate this more dispassionately, however, I’d suggest looking at the people who posted high-karma posts in 2009, and seeing what those posters are doing now. I’ll try that here, though I don’t know what some of these people are doing now. They seem to be an overall high-achieving group. (But we don’t have a baseline.)
https://www.greaterwrong.com/archive/2009 - Page 1: I’m seeing Eliezer (he seems to have done well), Hal Finney (unfortunately deceased, but had he lived a bit longer he would have been a multi-multi-millionaire as an early bitcoin holder / developer), Scott Alexander (I think his blog is doing well enough), Phil Goetz (?), Anna Salamon (helping run CFAR), “Liron” (?, but he’s now running https://relationshiphero.com/ and seems to have done decently as a serial entrepreneur), Wei Dai (a fairly big name in cryptocurrency), cousin_it (?), CarlShulman (doing a bunch of existential risk work with FHI and other organizations), Alicorn (now a writer and “Immortal bisexual polyamorous superbeing”), HughRistik (?), Orthonormal (still around, but ?), jimrandomh (James Babcock, ?), AllanCrossman (http://allancrossman.com/ - ?), and Psychohistorian (Eitan Pechenick, academia).
There is a strategy that is almost mentioned here, but not pursued, that I think is near-optimal—explaining your reasoning as a norm. This is the norm I have experienced in the epistemic community around forecasting. (I am involved in both Good Judgment, where I was an original participant and have since resumed work, and Metaculus’s AI instance. Both are very similar in that regard.)
If such explanation is a norm, or even a possibility, the social credit for updated predictions will normally be apportioned based on the reasoning as much as the accuracy. And while individual Brier scores are useful, forecasters who have mediocre calibration but provide excellent public reasoning and evidence which others use are more valuable to an aggregate forecast than excellent forecasters who explain little or nothing.
If Bob wants social credit for his estimate in this type of community, he needs to publicly explain his model—at least in general. (This includes using intuition as an input—there are superforecasters toward whom I update based purely on their claims that a probability seems too low / high.) Similarly, if Bob wants credit for updating, he needs to explain his updated reasoning—including why he isn’t updating based on the evidence that prompted Alice’s estimate, which would usually have been specified, or based on Alice’s stated model and her estimate itself. If Bob said 75% initially, but now internally updates to think 50%, it will often be easier for him to justify the sudden change by pointing to an influential datapoint than to justify a smaller shift with an excuse.
But first, the supply isn’t as bounded as it appears—tuition has been going up both at state schools and at schools which expanded, and their costs have also risen. And many of those state schools are growing, and are relatively prestigious. Notice that UCLA and UC Berkeley are each in the top 25, with about 30k students each, but out-of-state tuition is still >$40k. And there is competition both within and between those schools—they could spend the money on research, which the professors want money for, or give the staff raises, instead of wasting the money on class sizes. So there is something left to explain—why are they wasting money in this particular way, instead of wasting it on things the people putatively in charge of the schools want?
I want to apologize, and make sure there is a clear record of what I think both on the object level, and about my comment, in retrospect. (For other mistakes I made, not related to this comment, see here.)
I handled this very poorly, and wasted a significant amount of people’s time. I still think that the claims in the post were materially misleading (and think some of the claims still are, after edits). The authors replaced the section saying not to listen to the CDC with a very different disclaimer, which now says: “Notably we’re not saying any of the things they do recommend are bad.” I think we should have a clear norm that potentially harmful claims need a much greater degree of caution than the post displayed. But calling for it to be removed was stupid.
Above and beyond my initial comment, critically, I screwed up by being pissed off and responding angrily below about what I saw as an uninformed and misleading post, and by continuing to reply to comments without due consideration of the people involved in both the original post and the comments. This was in part due to personal biases, and in part due to personal stress, which is not an excuse. This led to what can generously be described as a waste of valuable people’s time, at a particularly bad time. I have apologized to some of those involved already, but wanted to do so publicly here as well.
Reviewing the arguments
I initially said the post should have been removed. I also used the term “infohazard” in a way that was alarmist—my central claim was that it was damaging and misleading, not that it was an infohazard in the global catastrophic risk sense that people assumed.
Several counterarguments and responses to my claim that the post should be taken down were advanced; they follow. I originally responded poorly, so I want to review them here, along with my view on the strength of each.
1) I should not have been a jerk.
I was dismissive and annoyed about what seemed to me to be many obvious factual errors. My attitude was a mistake. It was also stupid for a number of reasons, and at the very least I should have contacted the authors directly and privately, and been less confrontational. Again, I apologize.
2) Telling people to check with others before posting, and threatening to remove posts which were not so checked, is censorship, which is harmful in other ways.
As I mentioned above, saying the post should be removed was stupid, but I maintain, as I did then, that when a person is unsure about whether saying something is a good idea, and it is consequential enough to matter, they should ask for some outside advice. I think this should be a basic norm, one that LessWrong and the rationality community should not just recommend but, where feasible, try to enforce. I do think that there was a reasonable sense of urgency in getting the message out in this case, and that excuses some level of failure to vet the information carefully.
3) We should encourage people to say true things even when harmful, or as one person said “I’d want people to err heavily on the side of sharing information even if it might be dangerous.”
This stops short of Nietzschean honesty, but I still don’t think it holds up well. First, as I said, I think the post was misleading, so the principle simply does not apply. But the discussion in the comments and in private pushed on this more, and I think it’s useful to clarify what I claimed. I agree that we should not withhold information which could be important because of a vague concern, and if this post were correct, it would fall under that umbrella. However, what the post seemed to me to be doing was collecting misleading statements to make it clearer that a bad organization is, in fact, bad—playing level 2 regardless of truth. That seems obviously unacceptable. I do not think lying is acceptable in pursuit of level 2 goals (in Zvi’s explanation of Simulacra), except in dire circumstances.
But the principle advocated here says to default to level 1 brutal / damaging honesty far more often than I think is advisable, not to lie. My initial impression was that the CDC was doing far better than it in fact was, and that the negative impacts were greatly under-appreciated.
I can understand why the balance of how much truth to tell when the effect is damaging is a critical question, and I think that LessWrong’s norms here are far better than those elsewhere. I agree that the bare minimum of not actively lying is insufficient, but as I said above, I disagree with others about how far to go in saying things that might be harmful because they are true.
4) We should not attempt to play political games by shielding bad organizations and ignoring or obscuring the truth in order to build trust incorrectly.
I think this is a claim that people should never play level 3. I endorse this. I agree that I was attempting to defend an institution that was doing poorly from claims that it was doing poorly, on the basis that a significant fraction of those claims were unfair. As I said above, this would be a defense. In retrospect, the organization was far worse than I thought at the time, as I realized far too late, and discussed more here. On the other hand, many of the claims were in fact misleading, and I don’t think that false attacks on bad things are OK either.
Announcements of progress tend to clump together before the major AI conferences.
There’s something deeply discouraging about being told “you’re an X% researcher, and if X>Y, then you should stay in alignment. Otherwise, do a different intervention.” No other effective/productive community does this. (Emphasis added.) I don’t know how to put this, but the vibes are deeply off.
I think this is common, actually.
We apply the logic of only taking top people to other areas. Take medicine. The cost of doing medicine badly is significant, so tons of filters exist. Don’t do well in organic chemistry? You can’t be a doctor. Low GPA? Nope. Can’t get a pretty good score on the MCAT? No again. Get into med school but can’t get an internship? Not gonna be able to practice.
It’s similar for many other high-stakes fields. The US military has a multi-decade-long weeding-out process to end up as a general. Most corporations effectively do the same. Academic research is brutal in similar ways. All of these systems are broken, but not because they have a filter, more because the filter works poorly, and moral mazes, etc.

Alignment work that’s not good can be costly, and can easily be very net negative. But it’s currently mostly happening outside of institutions with well-defined filters. So I agree that people should probably try to improve their skills if they want to help, but they should also self-filter to some extent.
An additional point worth noting is that there is tremendous social value in reducing coordination costs, but it’s nearly impossible to capture that value, so it’s very under-provided.
What does lowering coordination costs look like? Trade meetups, conferences, and similar events or venues that foster communication and coordination (like EA and LW meetups and forums), as well as trustworthy information sharing—which is costly to individuals and mostly benefits others. (Like GiveWell, which provides analysis that doesn’t benefit itself, so it is a largely trusted broker.)
I’d be very interested in thinking about what other general strategies could exist—these seem like great targets for world optimization.
I strongly disagree. There are many domains where we have knowledge with little or no ability to conduct RCTs—geology, evolutionary theory, astronomy, etc. The models work because we have strong Bayesian evidence for them—as I understood it, this was the point of a large section of the sequences, so I’m not going to try to re-litigate that debate here.
As someone who is involved in both Metaculus and the Good Judgment Project, I think it’s worth noting that Zvi’s criticism of Metaculus—that points are given just for participating, so that entering the community average gets you points—applies to Good Judgment Inc’s predictions by superforecasters in almost exactly the same way: the superforecasters are paid for a combination of participation and performance, so that guessing the forecast median earns them money. (GJI does have a payment system for superforecasters which is more complex than this, and which I probably am not allowed to talk about—but the central point remains true.)
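A toy illustration of the incentive (the scoring rule below is invented for the example; neither Metaculus nor GJI uses exactly this): if rewards are participation plus relative accuracy, copying the community median locks in the participation reward with zero accuracy risk.

```python
import random

def points(p, median, outcome, participation_bonus=1.0):
    """Participation bonus, plus positive points for beating the median's
    Brier score and negative points for doing worse (invented rule)."""
    return participation_bonus + (median - outcome) ** 2 - (p - outcome) ** 2

random.seed(0)
total = 0.0
for _ in range(1000):
    median = random.uniform(0.1, 0.9)
    outcome = 1 if random.random() < median else 0  # assume the median is calibrated
    total += points(median, median, outcome)        # strategy: just copy the median
print(total / 1000)  # exactly the participation bonus, earned risk-free
```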
There’s a large literature on bureaucracies, and it has a lot to say that is useful on the topic. Unfortunately, this post manages to ignore most of it. Even more unfortunately, I don’t have time to write a response in the near future.
For those looking for a more complete picture—one that at least acknowledges the fact that most bureaucracies are neither designed by individuals, nor controlled by them—I will strongly recommend James Q. Wilson’s work on the topic, much of which is captured in his book, “Bureaucracy.” I’ll also note that Niskanen’s work is an important alternative view, as is Simon’s earlier (admittedly harder to read, but very useful) work on Administrative Behavior.
Perrow’s work, “Organizational Analysis: A Sociological View” is more dated, and I wouldn’t otherwise recommend it, but it probably does the best job directly refuting the claims made here. In his first chapter, titled “Perspectives on Organizations,” he explains why it is unhelpful to view organizations just as a function of the people who make them up, or as a function of who leads them. When I have more time, I will hope to summarize those points as a response to this post.
Just want to note that I’m less happy with a LessWrong without Duncan. I very much value Duncan’s pushback against what I see as a slow decline in quality, and so I would prefer him to stay and continue doing what he’s doing. The fact that he’s being complained about makes sense, but it is mostly a function of him doing something valuable. There have been a few times where I was slapped down by Duncan, albeit in comments on his Facebook page, where it’s much clearer that his norms are operative, and I’ve been annoyed. But each of those times, despite the frustration, I found that I was being pushed in the right direction and corrected for something I was doing wrong.
I agree that it’s bad that his comments are often overly confrontational, but there’s no way to deliver constructive feedback that doesn’t involve a degree of confrontation, and I don’t see many others pushing to raise the sanity waterline. In a world where a dozen people were fighting the good fight, I’d be happy to ask him to take a break. But this isn’t that world, and it seems much better to actively promote a norm of people saying they don’t have energy or time to engage than telling Duncan (and maybe / hopefully others) not to push back when they see thinking and comments which are bad.
Even ignoring the above problem, I’m confused why it’s valuable to build up a “real tradition” among LW users, given that the wider unilateralist curse problem that our world faces can’t possibly be solved by LW users having such a tradition.
A few points.
First, I don’t think it’s clear that in the Rationalist / EA community, there is enough reinforcement of this, and I routinely see issues with people “going rogue” and unilaterally engaging in activities that others have warned them would be dangerous, net negative, etc.
Second, it’s valuable even as an exemplar; we should be able to say that there is such a community, and that they are capable of exercising at least this minimal level of restraint.
Third, I think it’s clear that in the next decade the number of people in the Rationalist-sphere who are in actual positions of (relatively significant) power will continue to grow, and we have already seen some such people emerge in government and in the world of NGOs. For AI in particular, there are many people who have significant influence over decisions that could significantly affect Humanity’s future. Their active (or even passive) participation in this seems likely to at least give them a better understanding of what is needed when they are faced with these choices.
Very much disagree—but this is as someone not in the middle of the Bay area, where the main part of this is happening. Still, I don’t think rationality works without some community.
First, I don’t think that the alternative communities that people engage with are epistemically healthy enough to allow people to do what they need to reinforce good norms for themselves.
Second, I don’t think that epistemic rationality is something a non-community can do a good job with, because there is far too little of the personal reinforcement and positive feedback that people need to stick with it if everyone is going it alone.
Wait, the goal here, at least, isn’t to produce truth, it is to disseminate it. Counter-arguments are great, but this isn’t about debating the question, it’s about communicating a conclusion well.
I don’t specifically know about mental health, but I do know specific stories about financial problems being treated as security concerns—and I don’t think I need to explain how incredibly horrific it is to have an employee say to their employer that they are in financial trouble, and be told that they lost their job and income because of it.
To attempt to make this point more legible:
Standard best practice in places like the military and intelligence organizations, where lives depend on secrecy being kept from outsiders—but not insiders—is to compartmentalize and maintain “need to know.” Similarly, in information security, the best practice is to give people access only to what they need, and to granularize access to different services / data, as well as differentiating read / write / delete access. Even in regular organizations, lots of information is need-to-know—HR complaints, future budgets, estimates of the profitability of a publicly traded company before quarterly reports, and so on. This is normal, and even though it’s costly, those costs are necessary.
This type of granular control isn’t intended to stop internal productivity; it is there to limit the extent of failures of secrecy, and attempts to exploit the system by leveraging non-public information, both of which are inevitable, since the cost of preventing failures grows very quickly as the risk of failure approaches zero. For all of these reasons, the ideal is to have trustworthy people who have low but non-zero probabilities of screwing up on secrecy. Then, you ask them not to share things that are not necessary for others’ work. You only allow limited exceptions and discretion where it is useful. The alternative, of “good trustworthy people [who] get to have all the secrets versus bad untrustworthy people who don’t get any,” simply doesn’t work in practice.
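As a sketch of what this looks like in practice (the names, data, and structure are invented for illustration, not any real organization’s system): access is granted per person, per compartment, with read / write / delete separated and everything else denied by default.

```python
from enum import Flag, auto

class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    DELETE = auto()

# Grants are per (person, compartment): trust is not all-or-nothing.
grants = {
    ("alice", "hr_complaints"): Access.READ | Access.WRITE,
    ("alice", "q3_budget"):     Access.READ,
    ("bob",   "q3_budget"):     Access.READ | Access.WRITE | Access.DELETE,
}

def allowed(person: str, compartment: str, needed: Access) -> bool:
    """Default-deny: even a trusted person has no access outside the
    compartments their work requires."""
    return needed in grants.get((person, compartment), Access.NONE)

assert allowed("alice", "q3_budget", Access.READ)
assert not allowed("alice", "q3_budget", Access.WRITE)
assert not allowed("bob", "hr_complaints", Access.READ)  # trusted, but no need to know
```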