Because 8.86×10⁻³⁰ > 2.76×10⁻³⁰, person A is also a slightly better predictor than person B.

Wait, I got confused by the function you used to assign the calibration score.
It worked in that case, but it will yield higher values for those who make more ‘correct’ predictions, not for those who are better calibrated.
For example, person A predicts 100 things with 60% confidence and 61 of them turn out to occur, while person D predicts 100 things with 60% confidence and 60 of them turn out to occur.
Person D is better calibrated but gets a lower score than person A, ~5.9e-30 vs ~8.86e-30 (and person E, who made 100 predictions with 60% confidence that all turned out to be true, would score ~6.53e-23).
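For anyone who wants to recompute these scores, here is a minimal sketch. It assumes the score is simply the Bernoulli likelihood of the observed outcomes, p^hits × (1 − p)^misses, which reproduces the ~8.86e-30 figure for person A. The function names and the "gap" measure are illustrative and not taken from the post.

```python
def likelihood_score(p, n_true, n_total):
    """Likelihood of the observed outcomes when every prediction used confidence p.

    Assumption: this is the scoring function under discussion; it matches
    the ~8.86e-30 quoted for 61 hits out of 100 at 60% confidence.
    """
    n_false = n_total - n_true
    return p ** n_true * (1 - p) ** n_false


def calibration_gap(p, n_true, n_total):
    """Hypothetical calibration measure: distance between hit rate and stated confidence."""
    return abs(n_true / n_total - p)


# Hits out of 100 predictions, all made at 60% confidence.
people = {"A": 61, "D": 60, "E": 100}

for name, hits in people.items():
    score = likelihood_score(0.6, hits, 100)
    gap = calibration_gap(0.6, hits, 100)
    print(f"person {name}: score = {score:.3g}, calibration gap = {gap:.2f}")
```

The score rises with every additional hit, while the gap is smallest for person D, so the two quantities rank the three people differently.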

I have tried to add a paragraph about this, because I think it’s a good point and it’s unlikely that you were the only one who got confused. Next weekend I will finish part 2, where I make a model that can track calibration independently of prediction. In that model the 60% forecaster with 61/100 will have a better posterior for the calibration parameter than the 60% forecaster with 100/100, though the likelihood of the 100/100 will of course still be highest.

I’m looking forward to reading it, because I think one of the current bottlenecks limiting how many predictions I make is that I cannot easily compare how I’m doing week after week, and I have been looking for a model that helps me check how I’m doing across several predictions.

You may be disappointed: unless you make 40+ predictions per week, it will be hard to compare weekly drift. The Bernoulli distribution has a much higher variance than the normal distribution, so the uncertainty estimate of the calibration is correspondingly wide (high uncertainty of data → high uncertainty of regression parameters). My post 3 will be a hierarchical model, which may suit your needs better, but it may be a month before I get around to making that model.
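To get a rough feel for how wide that uncertainty is, here is a small sketch using the textbook standard error of an observed hit rate, sqrt(p(1 − p)/n). This is a simple frequentist approximation chosen for illustration, not the Bayesian model planned for part 2, and the function name is made up.

```python
import math


def hit_rate_standard_error(p, n):
    """Standard error of the observed hit rate after n predictions at true rate p.

    Assumption: predictions are independent Bernoulli trials with the same p.
    """
    return math.sqrt(p * (1 - p) / n)


# Compare a few weekly prediction counts at 60% confidence.
for n in (10, 40, 100):
    se = hit_rate_standard_error(0.6, n)
    print(f"n = {n:3d}: hit rate known to within roughly ±{2 * se:.2f}")
```

Because the error shrinks only with the square root of n, going from 10 to 40 predictions per week halves the width rather than quartering it.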

If there are many people like you, then we may try to make a hackish model that down-weights older predictions, as they are less predictive of your current calibration than newer predictions. But I will have to think long and hard to make that into a full Bayesian model, so I am making no promises.
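One way such a hackish down-weighting could look is an exponentially decayed hit rate. This is purely a sketch: the exponential form, the half-life parameter, and the function name are assumptions for illustration, and it is not a Bayesian model.

```python
def decayed_hit_rate(outcomes, ages_in_weeks, half_life_weeks=8.0):
    """Weighted hit rate in which a prediction's weight halves every half_life_weeks.

    outcomes: 1 if the prediction came true, 0 otherwise.
    ages_in_weeks: how long ago each prediction was resolved.
    Assumption: exponential decay with a hand-picked half-life.
    """
    if len(outcomes) != len(ages_in_weeks):
        raise ValueError("outcomes and ages_in_weeks must have the same length")
    if not outcomes:
        raise ValueError("need at least one prediction")
    weights = [0.5 ** (age / half_life_weeks) for age in ages_in_weeks]
    weighted_hits = sum(w * o for w, o in zip(weights, outcomes))
    return weighted_hits / sum(weights)


# A recent hit counts twice as much as a miss from one half-life ago.
print(decayed_hit_rate([1, 0], [0, 8], half_life_weeks=8.0))
```

Comparing the decayed hit rate with the stated confidence gives a drift indicator that favours recent predictions, at the cost of an arbitrary half-life.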

You are absolutely right: any framework that punishes you for being right would be bad. My point is that increasing your calibration helps a surprising amount and is much more achievable than “just git good”, which is what improving prediction requires.

I will try to put your point into the draft when I am off work, thanks.
