It is also true that the discontinuity is always present somewhere around 0.5, regardless of which method you use.
No, you don’t always have a discontinuity. You have to throw out predictions at exactly 0.5, but this could be a consequence of a treatment that is continuous as a function of p. You could simply weight predictions so that those close to 0.5 count less. I don’t know if that is reasonable for your approach, but similar things are forced upon us. For example, if you want to know whether you are overconfident at 0.5+ε, you need on the order of 1/ε² predictions. Calibration is not just impossible to discern at 0.5; it is also difficult to discern near 0.5.
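A minimal sketch of the weighting idea, in Python. The weight |2p − 1| and the names `weight` and `weighted_overconfidence` are hypothetical choices for illustration, not something proposed in the thread. The point is only that a weight which shrinks continuously to zero at 0.5 makes predictions at exactly 0.5 drop out with no special case, so the whole treatment stays continuous in p.

```python
from typing import Sequence


def weight(p: float) -> float:
    """Hypothetical continuous weight: 0 at p = 0.5, rising linearly to 1 at p = 0 or p = 1."""
    return abs(2.0 * p - 1.0)


def weighted_overconfidence(probs: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Weighted mean of (stated confidence - hit) after folding every prediction to p >= 0.5.

    Positive means overconfident, negative means underconfident.
    Predictions at exactly 0.5 get weight 0, so they drop out without a special case.
    """
    total_w = 0.0
    total = 0.0
    for p, happened in zip(probs, outcomes):
        # Fold: predicting X with probability p is the same as predicting not-X with 1 - p.
        if p < 0.5:
            p, happened = 1.0 - p, not happened
        w = weight(p)
        total_w += w
        total += w * (p - (1.0 if happened else 0.0))
    if total_w == 0.0:
        raise ValueError("all predictions were at 0.5; nothing to score")
    return total / total_w


# Example: the 0.5 prediction contributes nothing, whatever its outcome.
score = weighted_overconfidence([0.5, 0.9, 0.9, 0.2], [True, True, False, False])
```

With this particular weight, a 0.9 prediction counts four times as much as a 0.6 prediction; any other weight that vanishes continuously at 0.5 would illustrate the same point.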
Yes, thank you. I was speaking about the narrower set of options we were considering.
I don’t currently have an elegant idea for how to do the weighting, but I suspect that, to fit in nicely, it would most likely be done by subtraction rather than multiplication.