I generally do only a quick skim of post titles and open threads (edit: maybe twice a month on average; I’ll try visiting more often). I used to check LW compulsively prior to 2013, but now I think both LW and I have changed a lot and diverged from each other. No hard feelings, though.
I rarely click link posts on LW. I seldom find them interesting, but I don’t mind them as long as other LWers like them.
I mostly check LW through a desktop browser. Back in 2011–2012, I used Wei Dai’s “Power Reader” script to read all comments. I also used to rely on Dbaupp’s “scroll to new comments” script after they posted it in 2011, but these days I use Bakkot’s “comment highlight” script. (Thanks to all three of you!)
I’ve been on Metaculus a lot over the past year. It’s a prediction website focusing on science and tech (the site’s been mentioned a few times on LW, and in fact that’s how I heard of it). It’s sort of like a gamified and moderated PredictionBook. (Edit: It’s also similar to GJ Open, but IMO, Metaculus has way better questions and scoring.) It’s a more-work-less-talk kind of website, so it’s definitely not a site for general discussions.
I’ve been meaning to write an introductory post about Metaculus… I’ll get to that sometime.
Given that one of LW’s past focuses was on biases, heuristics, and the Bayesian interpretation of probability, I think some of you might find it worthwhile and fun to get some real-world practice at adjusting subjective probabilities as you find evidence. Metaculus is all about that sort of stuff, so join us! (My username there is ‘v’. I recognize a few of you, especially WhySpace, over there.) The site itself is under continual improvement, and I know that the admins have high ambitions for it.
Edit: By the way, this is a great post and idea. Thanks!
Is there any information on how well-calibrated the community predictions are on Metaculus? I couldn’t find anything on the site. Also, if one wanted to get into it, could you describe what your process is?
Is there any information on how well-calibrated the community predictions are on Metaculus?
Great question! Yes. There was a post on the official Metaculus blog that addressed this, though that was back in Oct 2016. In the past, they’ve also sent subscribed users a few emails that looked at community calibration.
I actually did my own analysis of this around two months ago, in private communication. Let me just copy two of the plots I created and what I said there. You might want to ignore the plots and details, and just skip to the “brief summary” at the end.
(Questions on Metaculus go through an ‘open’ phase then a ‘closed’ phase; predictions can only be made and updated while the question is open. After a question closes, it gets resolved either positive or negative once the outcome is known. I based my analysis on the 71 questions that have been resolved as of 2 months ago; there are around 100 resolved questions now.)
First, here’s a plot for the 71 final median predictions. The elements of this plot:
Of all monotonic functions, the black line is the one that, when applied to this set of median predictions, performs the best (in mean score) under every proper scoring rule given the realized outcomes. This can be interpreted as a histogram with adaptive bin widths. So for instance, the figure shows that, binned together, predictions from 14% to 45% resolved positive around 0.11 of the time. This is also the maximum-likelihood monotonic function.
The confidence bands are for the null hypothesis that the 71 predictions are all perfectly calibrated and independent, so that we can sample the distribution of counterfactual outcomes simply by treating the outcome of each prediction with credence p as an independent coin flip with probability p of positive resolution. I sampled 80,000 sets of these 71 outcomes, and built the confidence bands by computing the corresponding maximum-likelihood monotonic function for each set. The inner band is pointwise 1 sigma, whereas the outer is familywise 2 sigma. So the corner of the black line that exceeds the outer band around predictions of 45% is a p < 0.05 event under perfect calibration, and it looks to me that predictions around 30% to 40% are miscalibrated (underconfident).
The two rows of tick marks below the x-axis show the 71 predictions, with the upper green row comprising positive resolutions, and the lower red row comprising negatives.
The dotted blue line is a rough estimate of the proportion of questions resolving positive along the range of predictions, based on kernel density estimates of the distributions of predictions giving positive and negative resolutions.
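For readers who want to try this kind of analysis, here is a minimal sketch of the two ingredients described above: a monotonic calibration fit and simulated bands under perfect calibration. It assumes the monotonic fit can be computed with the standard pool-adjacent-violators algorithm, and it only builds a pointwise band from the 16th and 84th percentiles (roughly 1 sigma); the familywise band is not attempted. The function names `pava` and `null_band`, the default parameters, and the example data are all hypothetical and are not the code or data behind the plots.

```python
import random

def pava(predictions, outcomes):
    """Non-decreasing fit of binary outcomes against predictions
    (pool-adjacent-violators). Returns {prediction: fitted proportion}."""
    # Pool tied predictions first so equal predictions get equal fitted values.
    pooled = {}
    for p, y in zip(predictions, outcomes):
        s, n = pooled.get(p, (0, 0))
        pooled[p] = (s + y, n + 1)
    blocks = []  # each block: [sum of outcomes, count, prediction values]
    for p in sorted(pooled):
        s, n = pooled[p]
        blocks.append([s, n, [p]])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s2, n2, ps2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += ps2
    return {p: s / n for s, n, ps in blocks for p in ps}

def null_band(predictions, n_sims=2000, lo=0.16, hi=0.84, seed=0):
    """Pointwise band for the fitted curve, assuming every prediction is
    perfectly calibrated and independent (each outcome is a coin flip)."""
    rng = random.Random(seed)
    distinct = sorted(set(predictions))
    sims = {p: [] for p in distinct}
    for _ in range(n_sims):
        fake = [1 if rng.random() < p else 0 for p in predictions]
        fit = pava(predictions, fake)
        for p in distinct:
            sims[p].append(fit[p])
    band = {}
    for p in distinct:
        vals = sorted(sims[p])
        band[p] = (vals[int(lo * (n_sims - 1))], vals[int(hi * (n_sims - 1))])
    return band

if __name__ == "__main__":
    # Made-up data for illustration only; not the Metaculus predictions.
    preds = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80, 0.90, 0.95]
    outs = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
    fit = pava(preds, outs)
    band = null_band(preds, n_sims=500)
    for p in sorted(fit):
        print(p, round(fit[p], 2), band[p])
```

Each flat stretch of the fitted curve plays the role of one adaptive-width bin: its height is the share of positive resolutions among the predictions pooled into it.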
Now, a plot of all 3723 final predictions on the 71 questions.
The black line is again the monotonic function that minimizes mean proper score, but with the 1% and 99% predictions removed because—as I expected—they were especially miscalibrated (overconfident) compared to nearby predictions.
The two black dots indicate the proportion of questions resolving positive for 1% and 99% predictions (around 0.4 and 0.8).
I don’t have any bands indicating dispersion here because these predictions are a correlated mess that I can’t deal with. But for predictions below 20%, the deviation from the diagonal looks large enough that I think it shows miscalibration (overconfidence).
Along the x-axis I’ve plotted kernel density estimates of the predictions resolving positive (green, solid line) and negative (red, dotted line). Kernel densities were computed under log-odds with Gaussian kernels, then converted back to probabilities in [0, 1].
The blue dotted line is again a rough estimate of the proportion resolving positive, using these two density estimates.
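The kernel density step can be sketched roughly as follows. This assumes the "rough estimate" combines the two densities by Bayes' rule, weighted by how many predictions resolved each way; that formula, the bandwidth of 0.5, and the names `kde_logodds` and `positive_rate` are assumptions made for illustration, since the exact recipe isn't stated above.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def kde_logodds(samples, bandwidth):
    """Gaussian KDE fitted in log-odds space, returned as a density on (0, 1)."""
    zs = [logit(p) for p in samples]
    norm = 1.0 / (len(zs) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(p):
        z = logit(p)
        dens_z = norm * sum(math.exp(-0.5 * ((z - zi) / bandwidth) ** 2) for zi in zs)
        # Change of variables back to probability: dz/dp = 1 / (p * (1 - p)).
        return dens_z / (p * (1.0 - p))

    return density

def positive_rate(pos_preds, neg_preds, bandwidth=0.5):
    """Estimated share of positive resolutions as a function of the prediction,
    from the two class densities weighted by their sample counts (assumed)."""
    f_pos = kde_logodds(pos_preds, bandwidth)
    f_neg = kde_logodds(neg_preds, bandwidth)
    n_pos, n_neg = len(pos_preds), len(neg_preds)

    def rate(p):
        a = n_pos * f_pos(p)
        b = n_neg * f_neg(p)
        return a / (a + b)

    return rate

if __name__ == "__main__":
    # Made-up predictions for illustration only; not the Metaculus data.
    positives = [0.55, 0.70, 0.80, 0.90, 0.95]
    negatives = [0.05, 0.10, 0.20, 0.30, 0.60]
    rate = positive_rate(positives, negatives)
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p, round(rate(p), 3))
```

Working in log-odds keeps the kernels from spilling outside [0, 1] and gives extreme predictions such as 1% and 99% more room. The Jacobian factor cancels in the ratio, so it only matters when plotting the densities themselves.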
Brief summary:
Median predictions around 30% to 40% occur less often than claimed.
User predictions below around 20% occur more often than claimed.
User predictions at 1% and 99% are obviously overconfident.
Other than these, calibration seems okay everywhere else; at least, they aren’t obviously off.
I’m very surprised that user predictions look fairly accurate around 90% and 95% (resolving positive around 0.85 and 0.90 of the time). I expected strong overconfidence like that shown by the predictions below 20%.
Also, if one wanted to get into it, could you describe what your process is?
Is there anything in particular that you want to hear about? Or would you rather have a general description of 1) how I’d suggest starting out on Metaculus, and/or 2) how I approach making and updating predictions on the site, and/or 3) something else?
(The FAQ is handy for questions about the site. It’s linked to by the ‘help’ button at the bottom of every page.)
Polled.