I think the claim that good calibration is always achievable, while good discrimination is hard, is wrong.
I have long felt that something is off in the claim that “no matter what the coin sequence is, you can always just say 50% and be well calibrated, since that is the maximum-entropy distribution”. It seemed to me that if the coin is landing heads 19⁄20 of the time and you say 50⁄50, that shouldn’t count as correct and well calibrated, merely worse at discrimination than some other probability distributions. This may be just a rationalization of the human instinct to see patterns and so to treat 19⁄20 as “inherently non-random”. Still, when I tried to articulate my intuition, I got this:
(I don’t remember the exact sequence well, but...) Consider the distribution you would expect from a 50⁄50 probability: it is approximately normal (strictly speaking, binomial), with most outcomes having a near 1:1 ratio of heads to tails. 19 out of 20 is not like this. One definition of good calibration is that the things you predict with 50% probability should happen in 50% of cases. But if you take the results of those 20 flips as the cases, you will see that it is not 10 to 10 or even close to that; it is 19 to 1.
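To put a number on "19 of 20 is not like this", here is a minimal sketch that computes how likely a fair coin is to give at least 19 heads in 20 flips, and how far the observed frequency sits from the stated 50%. The function name `prob_at_least` and the variable names are my own illustration, not anything from the original argument.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n independent flips with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, heads = 20, 19

# Chance that a genuinely fair coin produces 19 or 20 heads: 21 / 2**20.
tail = prob_at_least(heads, n)

# Gap between the stated probability (0.5) and the observed frequency (0.95).
gap = abs(heads / n - 0.5)

print(f"P(>= {heads} heads in {n} fair flips) = {tail:.2e}")
print(f"calibration gap on this sample = {gap:.2f}")
```

The tail probability comes out at about 2 in 100,000, so a 19 to 1 sample is strong evidence that the 50% forecast does not match the frequencies it is being scored against.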
So a 50⁄50 prediction has terrible calibration if the coin tends to land heads 19 times out of 20. It only works if you have some kind of coin that lands HTHTHT. But then we still have essentially the same problem: if we sample differently again, switching from the cases inside the sequence to the sample of every odd-numbered flip and the sample of every even-numbered flip, we will notice that the 50⁄50 prediction is terribly calibrated, because you predict 50% for cases where the actual frequency is 0% or 100%.
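The HTHTHT case can be checked directly. This is a small sketch, with a helper name (`observed_frequency`) that I made up for illustration: the overall frequency of heads matches the 50% forecast, but splitting the same flips into odd and even positions gives frequencies of 100% and 0%.

```python
def observed_frequency(flips):
    """Fraction of flips in the given string that came up heads ('H')."""
    return sum(1 for f in flips if f == "H") / len(flips)

sequence = "HT" * 10  # 20 flips of a coin that strictly alternates

overall = observed_frequency(sequence)              # whole sequence
odd_positions = observed_frequency(sequence[0::2])  # flips 1, 3, 5, ...
even_positions = observed_frequency(sequence[1::2]) # flips 2, 4, 6, ...

print(overall, odd_positions, even_positions)
```

Which grouping counts as "the cases" is a choice, and the 50% forecast only looks calibrated under one of them.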
According to Grok, this contains zero new math. And still, I watched a video of Yudkowsky from 2009 in which he made that claim, and then in a video from (?2024) he said the same. And in general I have never seen any refutation of the idea on LessWrong.
What I gather from all this is that the standard idea of calibration probably isn’t anything special. It is roughly equivalent to one specific kind of check on a market price for patterns and possible predictable arbitrage. In general, every time you don’t have a perfect prediction, somebody can pump money out of you by finding some pattern in your predictions that you didn’t account for.
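As a rough sketch of the money-pump reading, suppose the forecaster is willing to take bets at odds implied by their quoted probability. The betting setup, the function `bettor_profit`, and its parameters are assumptions of mine for illustration, not part of the original argument. A bettor who has spotted the pattern then profits both on the 19⁄20 coin and on the alternating coin.

```python
def bettor_profit(flips, bets, quoted_p_heads=0.5, stake=1.0):
    """Total profit for a bettor who stakes `stake` on `bets[i]` for each flip,
    at fair odds implied by the forecaster's quoted probability of heads."""
    profit = 0.0
    for outcome, side in zip(flips, bets):
        # Probability the forecaster assigns to the side the bettor picked.
        p_side = quoted_p_heads if side == "H" else 1.0 - quoted_p_heads
        if outcome == side:
            profit += stake * (1.0 - p_side) / p_side  # fair-odds payout
        else:
            profit -= stake
    return profit

# Coin that lands heads 19 times out of 20; the bettor always backs heads.
biased = "H" * 19 + "T"
profit_biased = bettor_profit(biased, "H" * 20)

# Alternating coin; the bettor backs heads on odd flips and tails on even flips.
alternating = "HT" * 10
profit_alternating = bettor_profit(alternating, alternating)

print(profit_biased, profit_alternating)
```

In both cases the forecaster's 50% quote loses money to someone tracking a pattern the forecast ignores, whether or not that pattern shows up in the overall frequency.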
And to check for the absence of any predictable discrepancies in your probability patterns, you need to spend some computation. That is true for standard calibration, for calibration among the cases inside a sequence, for calibration among samples of every second flip, and for any other pattern, even one that isn’t usually considered calibration.
In general, there may be some sense in saying that calibration is something that can be achieved by running tests for discrepancies in your predictions, while discrimination is something you need empirical data for. But it still needs infinite computation. Perfect calibration is achievable only with Solomonoff induction priors, followed by Solomonoff induction updates after every bit of data.
All of this also means that there is no cheat code for being humbly uncertain yet well calibrated. The only way to have no discrepancies between your probabilities and reality is to predict accurately.
That also reminds me of my old idea that maybe every mental construct should match something in the physical world, including probabilities. If so, they aren’t measures of your personal uncertainty; they are imperfect predictions of the frequencies of outcomes across all the agents in the whole universe who have the same sensory input as you. You are objectively in some concrete universe, and so your predictions are wrong and exploitable. Though there is a way to see it as coordination among all the agents having the same sensory input as you, to get the best result for the sum of them. They don’t “update that their anthropic prediction was wrong”, but acknowledge that there is no better way to make predictions across all the agents: if you made one agent’s prediction better, some other agent with the same sensory input would get a worse result.