You don’t care, but if the goal is to motivate better communal predictions, giving people the incentive to do more predicting seems to make far more sense than having it normed to sum to zero, which would mean that in expectation you only gain points when you outperform the community.
This seems to me to be very non-obvious. Do we want more low-quality low-effort predictions, or fewer high-quality high-effort predictions? Do we want people to go for the exact correct probability as they see it, or give a shove in the direction they feel strongly about? Do we want people to go around making the actual community prediction to bank free points? Who will free points motivate versus demotivate? What about the question of who to trust, and whether others would update their models based on the predictions of those who are doing well? Etc.
If I had the time, a post on the subject would be interesting to write. Curious if there are writings detailing how it works and the reasoning behind it, or if you’d like to talk about it in a video call or LW meetup, or both.
The scoring system incentivizes predicting your true credence (gory details here).
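To illustrate the claim in one toy calculation: a "proper" scoring rule is one where your expected score is optimized exactly when you report your true credence. The sketch below uses the Brier score rather than Metaculus's actual log-based point formula (an assumption for simplicity, not their implementation):

```python
# Toy demonstration that the Brier score is a proper scoring rule:
# if the event truly occurs with probability q, then reporting q
# minimizes your expected Brier score (lower is better).

def expected_brier(report, true_prob):
    """Expected Brier score for reporting `report` when the event
    actually occurs with probability `true_prob`."""
    # Score is (report - outcome)^2, averaged over both outcomes.
    return true_prob * (report - 1) ** 2 + (1 - true_prob) * report ** 2

true_q = 0.7
reports = [i / 100 for i in range(101)]  # candidate reports 0.00 .. 1.00
best = min(reports, key=lambda r: expected_brier(r, true_q))
print(best)  # 0.7 -- honesty is the optimal report
```

Shading your report toward an extreme (say 0.9 because you "feel strongly") strictly worsens your expected score under a proper rule, which is the point of the parent comment.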
I think Metaculus rewarding participation is one of the reasons it has participation. Metaculus can discriminate good predictors from bad predictors because it has their track record (I agree this is not the same as discriminating good/bad predictions). This info is incorporated in the Metaculus prediction, which is hidden by default, but you can unlock it with on-site fake currency.
PredictionBook also had participation while being public about people’s Brier scores. I think the main reason Metaculus has more activity is that it has good curated questions.
There’s also no reason to only have a single public metric. Being able to achieve something like Superforecaster status on the Good Judgement Project would be valuable to motivate some people.
There was a LessWrong post about this a while back that I can’t find right now, and I wrote a Twitter thread on a related topic. I’m not involved with the reasoning behind the structure for GJP or Metaculus, so for both it’s an outside perspective. However, I was recently told there is a significant amount of ongoing internal Metaculus discussion about the scoring rule, which I now think isn’t nearly as bad as it seemed. (But even if there is a better solution, changing the rule now would have really weird impacts on the motivation of current users, which is critical to the overall forecast accuracy, and I’m not sure it’s worthwhile for them.)
Given all of that, I’d be happy to chat, or even do a meetup on incentives for metrics and related issues generally, but I’m not sure I have time to put together my thoughts more clearly in the next month. But I’d guess Ozzie Gooen has even more to usefully say on the topic. (Thinking about it, I’d be really interested in being on or watching a panel discussion of the topic—which would probably make an interesting event.)
Having a meetup on this seems interesting. Will PM people.
https://www.lesswrong.com/posts/tyNrj2wwHSnb4tiMk/incentive-problems-with-current-forecasting-competitions ?
So one should interpret the points as a measure of how useful you’ve been to the overall predictions in the platform, and not how good you should be expected to be on a specific question, right?
Not really. Overall usefulness is really about something like covariance with the overall prediction—are you contributing different ideas and models? That would be very hard to measure, while making the points incentive-compatible is not nearly as hard to do.
And how well an individual predictor will do, based on historical evidence, is found by comparing their Brier score to the Metaculus prediction’s on the same set of questions. This is information which users can see on their own page. But it’s not a useful figure unless you’re asking about relative performance, which, as an outsider interpreting predictions, you shouldn’t care about—because you want the aggregated prediction.
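The comparison described above can be sketched in a few lines. All of the resolution and prediction numbers below are made up for illustration; this is not Metaculus's actual data or formula, just the "same questions, compare Brier scores" idea:

```python
# Toy version of the relative-performance check: score a user and the
# community prediction on the same set of resolved binary questions.

def brier(preds, outcomes):
    """Mean Brier score over binary questions (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)

outcomes        = [1, 0, 1, 1, 0]              # how each question resolved
user_preds      = [0.8, 0.3, 0.7, 0.9, 0.2]    # one user's predictions
community_preds = [0.6, 0.4, 0.6, 0.7, 0.4]    # community prediction, same questions

user_score = brier(user_preds, outcomes)           # 0.054
community_score = brier(community_preds, outcomes) # 0.146
print(user_score < community_score)  # True: this user beat the community here
```

The key caveat from the comment applies: restricting both scores to the *same* question set is what makes the comparison meaningful, since Brier scores across different (easier or harder) questions aren't comparable.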