knite comments on Grading myself on SSC’s 2020 predictions

knite 2 Mar 2021 22:16 UTC
1 point
Is the rule supposed to be symmetric around 50%? I used ln(p) - ln(.5) because Scott wrote:

“I scored these using a logarithmic scoring rule, adjusted so that guessing 50-50 always gave zero points.”

However, this doesn’t square with his second statement:

“Getting everything maximally right gives a score of about 14; guessing 50-50 for everything gives a score of 0, getting everything maximally wrong gives a score of negative infinity.”

Do you know what the correct scoring rule is?
- Bucky 3 Mar 2021 8:33 UTC
  2 points
  It looks like you’re using the correct formula, but with a mistake about what the “p” in the formula means, so your scores on questions where the result was “false” are incorrect.
  I think you used ln(probability put on “true”)-ln(.5) and then multiplied the result by −1 if the actual answer was false?
  The formulation Scott used was ln(probability put on the correct answer)-ln(.5)
  So for q3 for example the calculation shouldn’t be
  $(\ln(0.1) - \ln(0.5)) \times (-1) = 1.61$
  but should be
  $\ln(1 - 0.1) - \ln(0.5) = 0.59$
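To make the difference concrete, here is a minimal Python sketch of the two calculations. The function names `log_score` and `mirrored_score` are made up for illustration, and `mirrored_score` is only a guess at the mistake described in this comment, not anyone's actual spreadsheet. It assumes q3 was a 10% prediction on "true" that resolved false, as in the example.

```python
import math

def log_score(p_true: float, outcome: bool) -> float:
    """Log score relative to a 50-50 guess: ln(p on the correct answer) - ln(0.5)."""
    # Probability that was placed on whichever answer actually happened.
    p_correct = p_true if outcome else 1.0 - p_true
    if p_correct <= 0.0:
        # Putting 0% on what happened is "maximally wrong".
        return float("-inf")
    return math.log(p_correct) - math.log(0.5)

def mirrored_score(p_true: float, outcome: bool) -> float:
    """Suspected mistake: always score the probability on 'true', flip the sign if false."""
    raw = math.log(p_true) - math.log(0.5)
    return raw if outcome else -raw

# q3: 10% on "true", actual result false.
print(round(log_score(0.1, False), 2))       # 0.59
print(round(mirrored_score(0.1, False), 2))  # 1.61
```

This also shows why the rule is not symmetric around 50%: the best possible score on a single question is ln(1) - ln(0.5) = ln(2) ≈ 0.69, while the worst is negative infinity. A maximum of about 14 is what roughly 20 questions at ln(2) each would give.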
  - gjm 3 Mar 2021 13:54 UTC
    4 points
    That looks right to me. If so, and if I’ve done the calculations right, the actual score should be (not +3.34 but) −1.89, just a little bit better than Bucky’s score according to Scott. (Except that #18, whether Scott went back to working in the office, seems to be missing; perhaps you didn’t bother predicting on that one because it seemed too Scott-specific? So comparison against others who did predict that one will be misleading unless you remove it from their score. Scott, Zvi and Bucky all lost quite a few points on #18.)
    - Bucky 3 Mar 2021 15:31 UTC
      2 points
      Yeah, I didn’t actually answer q18 either (possibly knite used my list as a basis?) for exactly that reason. Scott just put me in as the same as him for that question for the purposes of making an apples-to-apples comparison, which seemed fine. No idea what I would have put if I had answered!