Response to: When (Not) To Use Probabilities
“It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.” —E. T. Jaynes
The uncertainty that comes from vague (non-math) language is no different from the uncertainty that comes from “randomizing” something (after all, probability is in the mind). The principle still holds: if you can put in the extra thought, you should be able to come up with a better way of doing things.
In some cases you can’t afford to spend the time, or it isn’t worth the thought. But when deciding things like whether to run the LHC or whether to sign up for cryonics, there’s time, and it’s sorta a big deal, so it pays to do it right.
If you’re asked “how likely is X?”, you can answer “very unlikely” or “0.127%”. The latter may give the impression that the probability is known more precisely than it is, but the former is too vague; both strategies do poorly on the log score.
If you are unsure what probability to state, state this with… another probability distribution.
“My probability distribution over probabilities is an exponential with a mean of 0.127%” isn’t vague, isn’t overconfident (at the first meta level), and gives you numbers to actually bet on.
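As a rough sketch of where the number to bet on comes from, the snippet below numerically integrates that exponential metaprobability over [0, 1] and takes its mean (the quantity the next paragraph argues you should bet). It assumes the exponential is simply cut off at 1; the mass it puts above 1 is e^(-1/0.00127), which is negligible. The helper names `exp_density` and `integrate` are hypothetical, chosen for illustration.

```python
import math

MEAN = 0.00127  # the 0.127% from the example above

def exp_density(p: float, mean: float = MEAN) -> float:
    """Exponential density with the given mean, evaluated at p >= 0."""
    rate = 1.0 / mean
    return rate * math.exp(-rate * p)  # underflows harmlessly to 0.0 for p near 1

def integrate(f, a: float = 0.0, b: float = 1.0, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Probability mass the metaprobability puts inside [0, 1] (should be ~1).
mass = integrate(exp_density)
# Mean of p under the metaprobability, renormalized to [0, 1]: the number to bet on.
bet = integrate(lambda p: exp_density(p) * p) / mass
print(f"mass in [0, 1]: {mass:.6f}, bet at {bet:.4%}")
```

The truncation at 1 changes nothing visible here; it would only matter for a metaprobability that put real weight near or above 1.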
The expectation value of the metaprobability distribution (the integral from 0 to 1 of Pmeta*p*dp) is equal to the probability you should give when trying to maximize your expected log score.
To see this, write out the expected log score: the integral from 0 to 1 of Pmeta*(p*log(q) + (1-p)*log(1-q))*dp. Split this into two integrals and pull out the terms that don’t depend on p, namely log(q) and log(1-q). Since Pmeta is normalized, the first remaining integral is just the expectation value of p, mean(p), and the second is 1 - mean(p), so the formula is now that of the ordinary log score with p replaced by mean(p). We already know that the log score is maximized when q = p, so in this case we set q = mean(p).
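The same argument written out as equations. This assumes, as the paragraph above does, that Pmeta is a normalized density on [0, 1]; the symbols S(q) for the expected log score and \bar{p} for mean(p) are shorthand introduced here.

```latex
\begin{aligned}
S(q) &= \int_0^1 P_{\text{meta}}(p)\,\bigl[p\log q + (1-p)\log(1-q)\bigr]\,dp \\
     &= \log q \int_0^1 P_{\text{meta}}(p)\,p\,dp \;+\; \log(1-q)\int_0^1 P_{\text{meta}}(p)\,(1-p)\,dp \\
     &= \bar{p}\,\log q + (1-\bar{p})\,\log(1-q),
     \qquad \bar{p} = \int_0^1 P_{\text{meta}}(p)\,p\,dp, \\
\frac{dS}{dq} &= \frac{\bar{p}}{q} - \frac{1-\bar{p}}{1-q} = 0
\quad\Longrightarrow\quad q = \bar{p}.
\end{aligned}
```

The second integral reduces to 1 - \bar{p} because Pmeta integrates to 1, and since S is concave in q, the stationary point q = \bar{p} is the maximum.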
This is a very useful result when dealing with extremes where we are not well calibrated. Instead of punting and saying “err… prolly aint gonna happen”, put a probability distribution on your probability and take the mean. For example, if you think X is true but can’t tell whether you’re 99% sure or 99.999% sure, and you give those two equal weight, you’ve got to bet at ~99.5%, their average.
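A small numerical check of that example, as a sketch under the stated assumption of equal weight on “99% sure” and “99.999% sure”. Rather than using the closed-form answer, it searches directly for the q that maximizes the expected log score and compares it with the mean of the metaprobability. The function names are hypothetical, and ternary search is used only because the expected log score is concave in q.

```python
import math

def expected_log_score(q, meta):
    """Expected log score of stating probability q for X, when uncertainty about
    the 'true' probability p is the discrete distribution `meta`: a list of
    (p, weight) pairs whose weights sum to 1."""
    return sum(w * (p * math.log(q) + (1 - p) * math.log(1 - q)) for p, w in meta)

def best_bet(meta, lo=1e-12, hi=1 - 1e-12, iters=200):
    """Ternary search for the q in (lo, hi) maximizing expected_log_score.
    Valid because the score is concave in q (a weighted sum of logs)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if expected_log_score(m1, meta) < expected_log_score(m2, meta):
            lo = m1  # the maximum cannot lie in [lo, m1)
        else:
            hi = m2  # the maximum cannot lie in (m2, hi]
    return (lo + hi) / 2

# Equal weight on "99% sure" and "99.999% sure" (the equal weighting is an assumption).
meta = [(0.99, 0.5), (0.99999, 0.5)]
mean_p = sum(p * w for p, w in meta)
best = best_bet(meta)
print(f"mean of metaprobability: {mean_p:.6f}, best bet found: {best:.6f}")
```

The search lands on the mean, 0.994995, rather than on either of the two candidate confidence levels.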
This is still no guarantee that you’ll be right 99.5% of the time (by assumption we’re not calibrated!), but you can’t do any better given your metaprobability distribution.
You’re not saying “99.5% of the time I’m this confident, I’m right”. You’re just saying “I expect my log score to be maximized if I bet on 99.5%”. The former implies the latter, but the latter does not (necessarily) imply the former.
This method is much more informative than “almost sure”, and gives you numbers to act on when it comes time to “shut up and multiply”. Your first set of numbers may not have “come from numbers”, but the ones you quote now do, which is an improvement. In theory this could be taken up a few more levels of meta, but one level is probably enough.
Note: Anna Salamon’s comment makes this same point.