Okay, I finally had some time to look at your feedback.
The problem is, as you said, my attempt to bucket predictions together by range. This discards information and makes my analysis much more complicated than it needs to be.
I thought that bucketing was a good idea because I was not sure how meaningful a Brier score computed on only a single forecast–outcome pair is (I didn't have a very clear idea of why that should be a problem, and didn't question that intuition).
Let's say I have my datasets f_i (predictions), o_i (outcomes) and r_i (ranges), for i ∈ 1..n.
Your analysis then calculates cor((o_i − f_i)², r_i). I introduced partition boundaries p_j (j ∈ 1..m+1, with p_1 = 1 and p_{m+1} = n + 1) and calculated cor((brier(o_{p_j}..o_{p_{j+1}−1}, f_{p_j}..f_{p_{j+1}−1}) | ∀j ∈ 1..m), (avg(r_{p_j}..r_{p_{j+1}−1}) | ∀j ∈ 1..m)).
This throws away information: in the extreme case m = 1 (so p_1 = 1 and p_2 = n + 1), one gets a single Brier score (over all forecasts & outcomes) paired with the average of all ranges, i.e. a single data point, from which no meaningful correlation can be computed. (I haven't proven that coarser partitions lose information monotonically, but it seems intuitively true to me.)
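To make the contrast concrete, here is a minimal sketch of the two analyses side by side. The data here is hypothetical synthetic data (the original code and dataset aren't shown), and the bucket boundaries are an arbitrary even split, but the structure mirrors the formulas above: the per-item analysis keeps all n points, while bucketing collapses them to m points.

```python
import numpy as np

# Hypothetical synthetic data standing in for the real dataset.
rng = np.random.default_rng(0)
n = 200
f = rng.uniform(0, 1, n)                      # forecasts f_i
o = (rng.uniform(0, 1, n) < f).astype(float)  # outcomes o_i (Bernoulli(f_i))
r = rng.integers(1, 20, n).astype(float)      # ranges r_i (e.g. years out)

# Sort by range so that contiguous index buckets correspond to range buckets.
order = np.argsort(r)
f, o, r = f[order], o[order], r[order]

# Per-item analysis: cor((o_i - f_i)^2, r_i), using all n data points.
per_item = np.corrcoef((o - f) ** 2, r)[0, 1]

# Bucketed analysis: boundaries p_1 < ... < p_{m+1}; each bucket contributes
# one (Brier score, average range) pair, leaving only m points to correlate.
def bucketed_cor(f, o, r, p):
    briers = [np.mean((o[lo:hi] - f[lo:hi]) ** 2) for lo, hi in zip(p[:-1], p[1:])]
    avg_r = [np.mean(r[lo:hi]) for lo, hi in zip(p[:-1], p[1:])]
    return np.corrcoef(briers, avg_r)[0, 1]

p = np.linspace(0, n, 11).astype(int)  # m = 10 buckets of 20 items each
bucketed = bucketed_cor(f, o, r, p)
```

With m = 1 the bucketed version would produce a single (Brier, average range) pair, for which `np.corrcoef` returns NaN, matching the degenerate case described above.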
If I repeat your analysis, I get the results you got.
Basically, I believe my text lacks internal validity, but still has construct validity.
Starting from here, I will probably rewrite large parts of the text (and the code, maybe even in a more understandable language) and apply your analysis, removing the bucketing of the data.
Cool. Once you rewrite that, and if you do so before the end of the year, I’d encourage you to resubmit it to this contest.
In particular, the reason I'm excited about this kind of work is that it gives us at least some information about how accurate long-term predictions can be. Some previous work on this has been done, e.g., rating Kurzweil's predictions from the 90s, but overall we have very little information about this kind of thing. And yet we are interested in seeing how good we can be at making predictions n years out, and potentially making decisions based on that.