Okay, I finally had some time to look at your feedback.
The problem is, as you said, my attempt to bucket predictions together by range. This discards information and makes my analysis much more complicated than it needs to be.
I thought that bucketing was a good idea because I was not sure how meaningful a Brier score computed on only a single forecast–outcome pair is (I didn't have a very clear idea of why that should be a problem, and didn't question that intuition).
Let's say I have my datasets f_i (predictions), o_i (outcomes) and r_i (ranges), for i ∈ 1..n.
Your analysis then calculates cor((o_i − f_i)², r_i). I introduced partition boundaries p_j (j ∈ 1..m+1, with p_1 = 1 and p_{m+1} = n + 1) and calculated cor((brier(o_{p_j}..o_{p_{j+1}−1}, f_{p_j}..f_{p_{j+1}−1}) | ∀j ∈ 1..m), (avg(r_{p_j}..r_{p_{j+1}−1}) | ∀j ∈ 1..m)).
This throws away information: in the extreme case m = 1 (so p_1 = 1 and p_2 = n + 1), one gets a single Brier score (over all forecasts & outcomes) paired with the average of all ranges, i.e. a single data point, from which no meaningful correlation can be computed. (I haven't proven that coarser partitions lose information monotonically, but it seems intuitively true to me.)
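To make the contrast concrete, here is a minimal sketch of the two analyses side by side. The data here is hypothetical synthetic data (the original code and dataset aren't shown), and the bucket boundaries are an arbitrary even split, but the structure mirrors the formulas above: the per-item analysis keeps all n points, while bucketing collapses them to m points.

```python
import numpy as np

# Hypothetical synthetic data standing in for the real dataset.
rng = np.random.default_rng(0)
n = 200
f = rng.uniform(0, 1, n)                      # forecasts f_i
o = (rng.uniform(0, 1, n) < f).astype(float)  # outcomes o_i (Bernoulli(f_i))
r = rng.integers(1, 20, n).astype(float)      # ranges r_i (e.g. years out)

# Sort by range so that contiguous index buckets correspond to range buckets.
order = np.argsort(r)
f, o, r = f[order], o[order], r[order]

# Per-item analysis: cor((o_i - f_i)^2, r_i), using all n data points.
per_item = np.corrcoef((o - f) ** 2, r)[0, 1]

# Bucketed analysis: boundaries p_1 < ... < p_{m+1}; each bucket contributes
# one (Brier score, average range) pair, leaving only m points to correlate.
def bucketed_cor(f, o, r, p):
    briers = [np.mean((o[lo:hi] - f[lo:hi]) ** 2) for lo, hi in zip(p[:-1], p[1:])]
    avg_r = [np.mean(r[lo:hi]) for lo, hi in zip(p[:-1], p[1:])]
    return np.corrcoef(briers, avg_r)[0, 1]

p = np.linspace(0, n, 11).astype(int)  # m = 10 buckets of 20 items each
bucketed = bucketed_cor(f, o, r, p)
```

With m = 1 the bucketed version would produce a single (Brier, average range) pair, for which `np.corrcoef` returns NaN, matching the degenerate case described above.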
If I repeat your analysis, I get the results you got.
Basically, I believe my text lacks internal validity, but still has construct validity.
Starting from here, I will probably rewrite large parts of the text (and the code, maybe even in a more understandable language) and apply your analysis, removing the bucketing of the data.
Cool. Once you rewrite that, and if you do so before the end of the year, I’d encourage you to resubmit it to this contest.
In particular, the reason I'm excited about this kind of work is that it gives us at least some information about how accurate long-term predictions can be. Some previous work on this has been done, e.g., rating Kurzweil's predictions from the 90s, but overall we have very little information about this kind of thing. And yet we are interested in seeing how good we can be at making predictions n years out, and potentially making decisions based on that.