It seems like maybe there should be an archive page for past rounds.
If I roll a 20-sided die until I roll a 1, the expected number of times I will need to roll the die is 20. Also, according to my current expectations, immediately before I roll the 1, I expect myself to expect to have to roll 20 more times. My future self will say it will take 20 more times in expectation, when in fact it will only take 1 more time. I can predict this in advance, but I can’t do anything about it.
I think everyone should spend enough time thinking about this to see why there is nothing wrong with this picture. This is what uncertainty looks like, and it had to be this way.
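For anyone who wants to check the die claim numerically, here is a minimal sketch. The function names are my own, and the simulation uses a fixed seed only to make it repeatable. The exact value comes from solving E = 1 + (19/20) E. Since the die has no memory, the same equation holds at every point before the 1 shows up, which is why the forecast stays at 20 right up until the last roll.

```python
import random
from fractions import Fraction


def expected_rolls(sides: int = 20) -> Fraction:
    """Solve E = 1 + (1 - p) * E for E, where p = 1/sides is the chance of a 1."""
    p = Fraction(1, sides)
    return 1 / p


def simulate_rolls(sides: int = 20, trials: int = 50_000, seed: int = 0) -> float:
    """Average number of rolls needed to see a 1, estimated by simulation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rolls = 1
        # Keep rolling until the die shows a 1.
        while rng.randint(1, sides) != 1:
            rolls += 1
        total += rolls
    return total / trials


if __name__ == "__main__":
    print("exact:", expected_rolls(20))
    print("simulated:", simulate_rolls(20))
```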
To me, the best content in the old Less Wrong does not fit on any of your islands. It was developing theoretical rationality.
It would probably go on or near the AI safety island, but I feel like that does not fairly represent its generality.
EDIT: I originally said you can do this for multiple choice questions, which is wrong. It only works for questions with two answers.
(In a comment, to keep top level post short.)
One cute way to do calibration for probabilities is to construct a spinner. If you have a true/false question, you can construct a spinner which is divided up according to your probability that each answer is the correct answer.
If you were to then spin the spinner once, and win if it comes up on the correct answer, this would not incentivize constructing the spinner to represent your true beliefs. The best strategy is to put all the mass on the most likely answer.
However, if you spin the spinner twice, and win if either spin lands on the correct answer, you are actually incentivized to make the spinner match your true probabilities!
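Here is a small sketch of why the two-spin version works, with names I made up for illustration. If your belief in "true" is q and the spinner gives "true" a share s, the two-spin win probability is q(1 - (1 - s)^2) + (1 - q)(1 - s^2), whose derivative in s is 2(q - s), so it peaks at s = q. With one spin the win probability is linear in s, so the best move is an extreme. The code just confirms this with a grid search.

```python
def win_probability(belief: float, spinner: float, spins: int) -> float:
    """Chance that at least one of `spins` spins lands on the correct answer.

    belief:  your probability that the answer is "true"
    spinner: fraction of the spinner assigned to "true"
    """
    miss_if_true = (1 - spinner) ** spins   # every spin lands on "false"
    miss_if_false = spinner ** spins        # every spin lands on "true"
    return belief * (1 - miss_if_true) + (1 - belief) * (1 - miss_if_false)


def best_spinner(belief: float, spins: int, steps: int = 1000) -> float:
    """Grid-search the spinner share for "true" that maximizes the win chance."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda s: win_probability(belief, s, spins))


if __name__ == "__main__":
    for q in (0.6, 0.7, 0.9):
        print(q, best_spinner(q, spins=1), best_spinner(q, spins=2))
```

With a belief of 0.7, the one-spin optimum puts the whole spinner on "true", while the two-spin optimum sits at 0.7.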
One reason this game is nice is that it does not require having a correctly specified utility function that you are trying to maximize in expectation. There are only two states, win and lose, and as long as winning is preferred to losing, you should construct your spinner with your true probabilities.
Unfortunately this doesn't work for confidence intervals, since they seem to require a score that is not bounded below.
I think you want to reward output rather than output that would not have otherwise happened.
This is similar to the fact that if you want to train calibration, you have to optimize your log score and just observe your lack of calibration as an opportunity to increase your log score.
Note: It seems easy to conflate theory with epistemic and practice with instrumental. I think this is a bad combination of buckets, and when I say theory here, I do not exclude theoretical instrumental rationality.