How can we quantify “difference from the expectations of others” and include it in the score?
You’re getting this from the “refinement” part of the calibration/refinement decomposition of the Brier score. Over time, your score will end up much better than others’ if you have better refinement (e.g. from “inside information”, or from a superior methodology), even if everyone is identically (perfectly) calibrated.
This is the difference between a weather forecast derived from a climate model (e.g. assigning 68% probability to the proposition that today’s temperature in your city is within one standard deviation of its average October temperature) and one derived from looking out the window.
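A minimal sketch of the decomposition, assuming binary outcomes and grouping forecasts by their exact value (no binning), in which case the Brier score splits exactly into a calibration term and a refinement term. Lower is better for the Brier score. The function name `brier_decomposition` and the 100-day toy data are illustrative assumptions, not taken from the thread.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (brier, calibration, refinement) for binary outcomes (0 or 1).

    Forecasts are grouped by their exact value, so
    brier == calibration + refinement up to float rounding.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(forecasts)
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

    # Collect the outcomes observed for each distinct forecast value.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    calibration = 0.0
    refinement = 0.0
    for f, obs in groups.items():
        n_k = len(obs)
        freq = sum(obs) / n_k  # observed frequency when this forecast was issued
        calibration += n_k * (f - freq) ** 2 / n  # forecast vs. reality gap
        refinement += n_k * freq * (1 - freq) / n  # leftover uncertainty
    return brier, calibration, refinement


# Toy data (hypothetical): the event happened on 68 of 100 days.
outcomes = [1] * 68 + [0] * 32
climate = [0.68] * 100                   # same forecast every day
window = [float(o) for o in outcomes]    # forecaster who can see the answer

print(brier_decomposition(climate, outcomes))
print(brier_decomposition(window, outcomes))
```

Both forecasters are perfectly calibrated (calibration term 0), but the climatology forecaster keeps a refinement term of 0.68 × 0.32 ≈ 0.2176, while the window forecaster’s refinement, and hence Brier score, is 0.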
ETA: What you say about my using an assumption is not correct. I’ve only been making the forecast well-specified, so that the way you said you allocated your probability mass would give us a proper loss function, and I simplified the calculation by using a uniform distribution for the remaining 90%. You can compute the loss function for any allocation of probability among outcomes that you care to name; the math might become more complicated, is all. I’m not making any assumptions about the probability distribution of the actual events, and neither does the math. It’s quite general.
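To illustrate what “well-specified” means here, a hedged sketch: put 10% on “impact here” and spread the remaining 90% uniformly over some number of other places, then score with the multi-outcome Brier score (any proper scoring rule would do). The number of other places (`n_other=9`) and all function names are hypothetical choices for illustration; the binary scoring of the same 10% forecast is shown alongside for comparison.

```python
def spread_forecast(p_here, n_other):
    """Hypothetical allocation: p_here on 'impact here', the rest of the
    probability mass split uniformly over n_other other places."""
    if n_other < 1:
        raise ValueError("need at least one other place")
    rest = (1.0 - p_here) / n_other
    return [p_here] + [rest] * n_other


def multiclass_brier(probs, outcome_index):
    """Brier score over all outcomes: squared error against the one-hot result."""
    return sum((p - (1.0 if j == outcome_index else 0.0)) ** 2
               for j, p in enumerate(probs))


def binary_brier(p_event, happened):
    """Brier score for a single yes/no event."""
    return (p_event - (1.0 if happened else 0.0)) ** 2


forecast = spread_forecast(0.1, 9)      # ten places, roughly 0.1 each
print(multiclass_brier(forecast, 0))    # impact at index 0 ("here")
print(binary_brier(0.1, True), binary_brier(0.1, False))
```

With a uniform spread over nine other places, the multi-outcome score is about 0.9 wherever the impact lands; the binary version gives 0.81 if the impact is here and 0.01 if not. A different allocation just changes the vector passed to `multiclass_brier`.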
I can still make 100000 lottery predictions and get a good score. I am looking for a system that cannot be tricked in that way.
OK: for each prediction, you can subtract the average score from your score. That should work. Assuming that all the other predictions are rational too, the expected difference on the lottery predictions is 0.
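A minimal sketch of the “subtract the average score” idea, assuming the Brier score per question and taking the average over all forecasters including yourself (excluding yourself would be an equally reasonable choice). Names and data are hypothetical. Because lower Brier is better, the sketch reports average minus mine, so positive totals mean better than the crowd.

```python
def brier(p, outcome):
    """Brier score of one probability for a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def relative_skill(questions, me):
    """Sum over questions of (average Brier of all forecasters) - (my Brier).

    questions: list of (forecasts, outcome) pairs, where forecasts maps a
    forecaster's name to their probability and outcome is 0 or 1.
    """
    total = 0.0
    for forecasts, outcome in questions:
        scores = [brier(p, outcome) for p in forecasts.values()]
        average = sum(scores) / len(scores)
        total += average - brier(forecasts[me], outcome)
    return total


# 100000 lottery tickets: everyone says "almost surely loses", and it does.
lottery = [({"me": 1e-7, "a": 1e-7, "b": 1e-7}, 0)] * 100000
# One question where "me" has better information than the others.
informed = [({"me": 0.9, "a": 0.5, "b": 0.5}, 1)]

print(relative_skill(lottery, "me"))    # essentially zero
print(relative_skill(informed, "me"))   # about 0.16
```

Each lottery prediction has a nearly perfect raw Brier score, but since everyone makes the same prediction, the relative skill gained is essentially zero. Credit only accrues where your forecast beats the others’.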
I’ve only been making the forecast well-specified
I think “impact here (10% confidence), no impact at that place (90% confidence)” is quite specific. It is a binary event.