The ideal thing is to judge Bob as if he were making the same prediction every day until he makes a new one, and log-score all of them when the event is revealed. (That is, if Bob says 75% on January 1st and 60% on February 1st, and then on March 1st the event is revealed to have happened, Bob’s score equals 31*log(.75) + 28*log(.6).) Then Bob’s best strategy is to update his prediction to his actual current estimate as often as possible; past predictions are sunk costs.
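A minimal sketch of that scoring rule, assuming predictions arrive as (date, probability) pairs and using the natural log; the function name and input format are illustrative, not from any standard library:

```python
from datetime import date
from math import log

def time_weighted_log_score(predictions, resolution_date, outcome):
    """Treat each stated probability as repeated daily until replaced,
    then log-score every day once the event resolves.

    predictions: list of (date, p) pairs sorted by date, where p is the
                 stated probability that the event happens.
    resolution_date: date the event is revealed.
    outcome: True if the event happened, False otherwise.
    """
    total = 0.0
    extended = predictions[1:] + [(resolution_date, None)]
    for (start, p), (end, _) in zip(predictions, extended):
        days = (end - start).days          # days this prediction was "live"
        prob_of_actual = p if outcome else 1 - p
        total += days * log(prob_of_actual)
    return total

# Bob's example: 75% on Jan 1, 60% on Feb 1, event revealed true on Mar 1.
score = time_weighted_log_score(
    [(date(2023, 1, 1), 0.75), (date(2023, 2, 1), 0.60)],
    date(2023, 3, 1),
    outcome=True,
)
# score == 31*log(0.75) + 28*log(0.60)
```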
The real-world version is remembering to dock people more for bad predictions the longer they persisted in them. But of course this is hard.
538 did do this with their self-evaluation, which is a good way to try to establish a norm in the domain of model-driven reporting.
Yes, that seems right, if it can be used as the sole criterion and be properly normalized for the time frames and questions involved. There are big second-level Goodhart traps lying in wait if people start to care about this metric.