You don’t care, but if the goal is to motivate better communal predictions, giving people the incentive to do more predicting seems to make far more sense than having it normed to sum to zero, which would mean that in expectation you only gain points when you outperform the community.
This seems to me to be very non-obvious. Do we want more low-quality low-effort predictions, or fewer high-quality high-effort predictions? Do we want people to go for the exact correct probability as they see it, or give a shove in the direction they feel strongly about? Do we want people to go around making the actual community prediction to bank free points? Who will free points motivate versus demotivate? What about the question of who to trust, and whether others would update their models based on the predictions of those who are doing well? Etc.
If I had the time, a post on the subject would be interesting to write. Curious if there are writings detailing how it works and the reasoning behind it, or if you’d like to talk about it in a video call or LW meetup, or both.
The scoring system incentivizes predicting your true credence (gory details here).
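To illustrate the claim in one toy calculation: a "proper" scoring rule is one where your expected score is optimized exactly when you report your true credence. The sketch below uses the Brier score rather than Metaculus's actual log-based point formula (an assumption for simplicity, not their implementation):

```python
# Toy demonstration that the Brier score is a proper scoring rule:
# if the event truly occurs with probability q, then reporting q
# minimizes your expected Brier score (lower is better).

def expected_brier(report, true_prob):
    """Expected Brier score for reporting `report` when the event
    actually occurs with probability `true_prob`."""
    # Score is (report - outcome)^2, averaged over both outcomes.
    return true_prob * (report - 1) ** 2 + (1 - true_prob) * report ** 2

true_q = 0.7
reports = [i / 100 for i in range(101)]  # candidate reports 0.00 .. 1.00
best = min(reports, key=lambda r: expected_brier(r, true_q))
print(best)  # 0.7 -- honesty is the optimal report
```

Shading your report toward an extreme (say 0.9 because you "feel strongly") strictly worsens your expected score under a proper rule, which is the point of the parent comment.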
I think Metaculus rewarding participation is one of the reasons it has participation. Metaculus can discriminate good predictors from bad predictors because it has their track record (I agree this is not the same as discriminating good/bad predictions). This info is incorporated in the Metaculus prediction, which is hidden by default, but you can unlock it with on-site fake currency.
PredictionBook also had participation while being public about people’s Brier scores. I think the main reason Metaculus has more activity is that it has good curated questions.
There’s also no reason to only have a single public metric. Being able to achieve something like Superforecaster status on the Good Judgement Project would be valuable to motivate some people.
There was a LessWrong post about this a while back that I can’t find right now, and I wrote a Twitter thread on a related topic. I’m not involved with the reasoning behind the structure for GJP or Metaculus, so for both it’s an outside perspective. However, I was recently told there is a significant amount of ongoing internal Metaculus discussion about the scoring rule, which I now think isn’t nearly as bad as it seemed. (But even if there is a better solution, changing the rule now would have really weird impacts on the motivation of current users, which is critical to the overall forecast accuracy, and I’m not sure it’s worthwhile for them.)
Given all of that, I’d be happy to chat, or even do a meetup on incentives for metrics and related issues generally, but I’m not sure I have time to put together my thoughts more clearly in the next month. But I’d guess Ozzie Gooen has even more to usefully say on the topic. (Thinking about it, I’d be really interested in being on or watching a panel discussion of the topic—which would probably make an interesting event.)
Having a meetup on this seems interesting. Will PM people.
https://www.lesswrong.com/posts/tyNrj2wwHSnb4tiMk/incentive-problems-with-current-forecasting-competitions ?
So one should interpret the points as a measure of how useful you’ve been to the overall predictions in the platform, and not how good you should be expected to be on a specific question, right?
Not really. Overall usefulness is really about something like covariance with the overall prediction—are you contributing different ideas and models? That would be very hard to measure, while making the points incentive-compatible is not nearly as hard to do.
And how well an individual predictor will do, based on historical evidence, is found by comparing their Brier score to the Metaculus prediction’s on the same set of questions. This is information which users can see on their own page. But it’s not a useful figure unless you’re asking about relative performance, which, as an outsider interpreting predictions, you shouldn’t care about—because you want the aggregated prediction.
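The comparison described above can be sketched in a few lines. All of the resolution and prediction numbers below are made up for illustration; this is not Metaculus's actual data or formula, just the "same questions, compare Brier scores" idea:

```python
# Toy version of the relative-performance check: score a user and the
# community prediction on the same set of resolved binary questions.

def brier(preds, outcomes):
    """Mean Brier score over binary questions (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

outcomes        = [1, 0, 1, 1, 0]              # how each question resolved
user_preds      = [0.8, 0.3, 0.7, 0.9, 0.2]    # one user's predictions
community_preds = [0.6, 0.4, 0.6, 0.7, 0.4]    # community prediction, same questions

user_score = brier(user_preds, outcomes)           # 0.054
community_score = brier(community_preds, outcomes) # 0.146
print(user_score < community_score)  # True: this user beat the community here
```

The key caveat from the comment applies: restricting both scores to the *same* question set is what makes the comparison meaningful, since Brier scores across different (easier or harder) questions aren't comparable.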