I wrote a post on a related topic that may or may not prove useful to you: Calibration for continuous quantities. You could extend the histogram method described therein with a score based on a frequentist test of model fit such as the Kolmogorov-Smirnov test.
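To make that extension concrete, here is a minimal sketch (my own construction, not taken from the linked post): push each realized outcome through its forecast CDF to get probability integral transform (PIT) values, which should be Uniform(0,1) under perfect calibration, then score the deviation with a Kolmogorov-Smirnov test. The overconfident-forecaster setup is purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: the forecaster predicts N(mu_i, 1) for each outcome,
# but the true noise has sigma = 1.5, so the forecasts are overconfident.
n = 500
mu = rng.normal(size=n)
outcomes = mu + rng.normal(scale=1.5, size=n)

# Probability integral transform: evaluate each forecast CDF at its outcome.
pit = stats.norm.cdf(outcomes, loc=mu, scale=1.0)

# Under calibration the PIT values are Uniform(0,1); the KS statistic
# measures the worst-case departure from that uniform CDF.
ks = stats.kstest(pit, "uniform")
print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3g}")
```

The KS statistic then serves as the calibration score: near zero for a well-calibrated forecaster, larger as the PIT histogram departs from flat.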
It’s my position that calibration is fundamentally a frequentist quantity—barring events of epistemic probability zero, a Bayesian agent could only ever consider itself unlucky, not poorly calibrated, no matter how wrong it was.
Wouldn’t an observed mismatch between assigned probabilities and observed frequencies count as Bayesian evidence towards miscalibration?
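The question can be made concrete with a toy model (my own sketch, with made-up numbers): put a prior over the forecaster's true hit rate and update on the observed outcomes. A mismatch between the assigned probability and the realized frequency does move the posterior, in that sense counting as evidence.

```python
from scipy import stats

assigned = 0.8     # probability the forecaster stated for each event
hits, n = 55, 100  # hypothetical record: 55 hits out of 100 such forecasts

# Beta(1, 1) prior over the true hit rate q, updated on the outcomes.
posterior = stats.beta(1 + hits, 1 + n - hits)

# Posterior probability that q lies within 0.05 of the assigned value;
# a small value is evidence towards miscalibration.
near = posterior.cdf(assigned + 0.05) - posterior.cdf(assigned - 0.05)
print(f"P(|q - {assigned}| < 0.05 | data) = {near:.4f}")
```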