[Question] Is there an equivalent of the CDF for grading predictions?

When I see people grading their predictions, it’s always by: (a) bucketing their predictions by probability (into a “46-55%” bucket, a “56-75%” bucket, …), and then (b) plotting each bucket’s nominal probability vs empirical frequency-of-correctness. See e.g. Scott Alexander here.

This seems… fine… but the bucketing step has a certain inelegance to it: just as you can build many different-looking histograms from the same dataset, you can build many different-looking calibration curves from the same predictions, depending on a semi-arbitrary choice of bucketing algorithm. Also, bucketing datapoints together and then aggregating over each bucket destroys information.
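To make the dependence on bucketing concrete, here is a minimal Python sketch of the bucket-then-aggregate procedure. The function name `calibration_curve`, the bucket edges, and the eight predictions are all made up for illustration, and plotting each bucket at its mean stated probability is one assumption among several reasonable ones (the bucket midpoint would be another).

```python
from bisect import bisect_right

def calibration_curve(predictions, edges):
    """Bucket predictions and aggregate each bucket.

    predictions: list of (stated_probability, came_true) pairs.
    edges: ascending interior bucket boundaries, e.g. [0.65, 0.85].
    Returns one (mean stated probability, empirical frequency, count)
    tuple per non-empty bucket.
    """
    buckets = [[] for _ in range(len(edges) + 1)]
    for p, outcome in predictions:
        # A probability equal to an edge goes into the higher bucket.
        buckets[bisect_right(edges, p)].append((p, outcome))

    curve = []
    for bucket in buckets:
        if not bucket:
            continue
        n = len(bucket)
        mean_p = sum(p for p, _ in bucket) / n
        freq = sum(1 for _, came_true in bucket if came_true) / n
        curve.append((mean_p, freq, n))
    return curve

# Hypothetical predictions, invented purely for this example.
predictions = [
    (0.55, True), (0.60, False), (0.65, True), (0.70, True),
    (0.75, False), (0.80, True), (0.90, True), (0.95, True),
]

coarse = calibration_curve(predictions, [0.75])      # two buckets
fine = calibration_curve(predictions, [0.65, 0.85])  # three buckets

for name, curve in (("coarse", coarse), ("fine", fine)):
    print(name, [(round(p, 3), round(f, 3), n) for p, f, n in curve])
```

With these made-up numbers, the two-bucket curve is flat (both buckets come true 75% of the time), while the three-bucket curve rises from 50% to 75% to 100%. Same predictions, different-looking curves, and in both cases the individual stated probabilities inside each bucket are gone from the output.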

For histograms, there’s an information-preserving, zero-degree-of-freedom alternative: the CDF. The CDF isn’t perfect, but it at least has a different set of problems from histograms.
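For comparison, here is a minimal sketch of an empirical CDF in plain Python. The function name `empirical_cdf` and the sample values are hypothetical. The point is only that the function takes the data and nothing else: there are no bin edges or bin counts to choose.

```python
def empirical_cdf(samples):
    """Return the step points (x, F(x)) of the empirical CDF,
    where F(x) is the fraction of samples less than or equal to x."""
    ordered = sorted(samples)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        if points and points[-1][0] == x:
            # Repeated value: keep a single step at x with the larger fraction.
            points[-1] = (x, i / n)
        else:
            points.append((x, i / n))
    return points

# Made-up sample, for illustration only.
samples = [3.0, 1.0, 2.0, 2.0]
ecdf_points = empirical_cdf(samples)
print(ecdf_points)
```

Every distinct sample value shows up as a step, and the height of each step encodes how many times that value occurred, so the original dataset can be reconstructed (up to ordering) from the output.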

Is there any similar tool for grading predictions?
