So I was trying to adjust for longer-term questions being easier by doing the following:

1. For each question, calculate the average Brier score over the available predictions.
2. For each prediction, calculate the accuracy score as its Brier score minus the question's average Brier score.
3. Correlate the accuracy score with range (see the sketch below).

While doing that, I thought I might as well also run the correlation between accuracy score and log range. But then some of the ranges turned out to be negative, which shouldn't be the case.
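For concreteness, here is a minimal sketch of that adjustment in Python, assuming the predictions live in a pandas DataFrame; the column names (`question_id`, `brier`, `range_days`) and the toy numbers are illustrative assumptions, not my actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per prediction, with the question it belongs
# to, its Brier score, and its range (time from forecast to resolution)
# in days. The -5.0 stands in for a retroactively resolved question.
df = pd.DataFrame({
    "question_id": [1, 1, 1, 2, 2, 2],
    "brier": [0.10, 0.25, 0.40, 0.05, 0.30, 0.20],
    "range_days": [3.0, 40.0, 365.0, 10.0, 120.0, -5.0],
})

# Step 1: average Brier score per question (a rough difficulty proxy).
question_mean = df.groupby("question_id")["brier"].transform("mean")

# Step 2: accuracy score = Brier score minus the question's average.
df["accuracy_score"] = df["brier"] - question_mean

# Step 3: correlate the adjusted score with range, and with log range.
# Non-positive ranges have no logarithm, so they must be filtered out.
positive = df[df["range_days"] > 0]
print(np.corrcoef(df["accuracy_score"], df["range_days"])[0, 1])
print(np.corrcoef(positive["accuracy_score"],
                  np.log(positive["range_days"]))[0, 1])
```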
Anyways, if I adjust for question difficulty, results are as you would expect; accuracy is worse the further removed the forecast is from the resolution.
Why do some forecasts have negative ranges?
On Metaculus: I assume that these are forecasts on questions that resolved retroactively. Examples:
Will Iran execute or be targeted in a national military attack between 6 June 2019 and 5 October 2019?
https://www.metaculus.com/questions/3756/will-ea-global-san-francisco-be-cancelled-or-rescheduled-due-to-covid-19/
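To make the mechanism concrete, here is a minimal sketch (with made-up dates) of how a retroactive resolution yields a negative range, assuming range is computed as the resolution datetime minus the forecast datetime:

```python
from datetime import datetime

# Hypothetical dates, for illustration only: a forecast made on
# 2020-03-15 on a question whose resolution was backdated to 2020-03-10,
# i.e. the question resolved before the forecast was made.
forecast_time = datetime(2020, 3, 15)
resolution_time = datetime(2020, 3, 10)

# Range = time from forecast to resolution; retroactive resolution
# makes it negative, and log(range) is then undefined.
range_days = (resolution_time - forecast_time).days
print(range_days)  # -5
```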
For PredictionBook: The datetime of resolution seems to be the datetime of the first attempted resolution, not the last. Example: Total deaths due to coronavirus in the Netherlands will go over >5000 by the end of April.
I think I might change the PredictionBook data fetching script to output the datetime of the last resolution.
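A minimal sketch of that change, assuming the script ends up with each prediction's resolution attempts as a list of timestamped records; the field name `created_at` and the example timestamps are assumptions for illustration, not necessarily the actual PredictionBook API fields:

```python
from datetime import datetime

def last_resolution_datetime(judgements):
    """Return the datetime of the last resolution attempt, not the first."""
    times = [datetime.fromisoformat(j["created_at"]) for j in judgements]
    return max(times)

# Hypothetical judgement records for one prediction.
judgements = [
    {"created_at": "2020-04-12T09:00:00"},  # first (premature) resolution
    {"created_at": "2020-05-01T18:30:00"},  # final resolution
]
print(last_resolution_datetime(judgements))  # 2020-05-01 18:30:00
```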