Another interesting thing you can do is to calculate an accuracy score (a forecast's Brier score minus the average Brier score for its question), which adjusts for question difficulty. You gesture at this in your “Accuracy between questions” section.
If you do this, forecasts made further from the resolution time do worse, on both PredictionBook and Metaculus (the correlation is statistically significant at p<0.001, but very small in magnitude). Code in R:
datapre <- read.csv("pb2.csv") ## or met2.csv
data <- datapre[datapre$range > 0, ] ## keep forecasts made before resolution
data$brier <- (data$result - data$probability)^2 ## Brier score per forecast
accuracyscores <- c() ## Lower is better, much like the Brier score.
ranges <- c()
for (id in unique(data$id)) {
  predictions4question <- (data$id == id)
  briers4question <- data$brier[predictions4question]
  ## Demean within the question: this adjusts for question difficulty.
  accuracyscores4question <- briers4question - mean(briers4question)
  ranges4question <- data$range[predictions4question]
  accuracyscores <- c(accuracyscores, accuracyscores4question)
  ranges <- c(ranges, ranges4question)
}
summary(lm(accuracyscores ~ ranges)) ## regress accuracy on time to resolution
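Incidentally, the per-question loop can be collapsed into a single step with base R's ave(), which applies a function within each group. A minimal sketch, assuming the same data frame and column names (id, brier, range) as above:
## Demean Brier scores within each question in one step.
data$accuracy <- ave(data$brier, data$id, FUN = function(x) x - mean(x))
summary(lm(accuracy ~ range, data = data)) ## same regression as the loop version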