I had exactly this argument with Critch several years ago. He was very strongly on the side of reporting all of the digits you have. I disagreed with him at the time but now I think he’s right. As outside view support, I hear that Tetlock’s superforecasters do noticeably worse if you round their probabilities off to the nearest multiple of 5 or 10, but I don’t remember where I heard this and haven’t read Superforecasting myself to corroborate.
As an inside view argument, rounding is extremely sensitive to how you choose to parameterize probabilities. Here are four options: you could choose to think in terms of probabilities, log probabilities, odds, or log odds. In each of these “coordinate systems” rounding has very different results. So mathematically it’s not a very principled thing to do to a probability.
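To make that concrete, here is a quick sketch (my own illustration, not anything from the thread) of how rounding the same 98% estimate plays out in three of these coordinate systems:

```python
import math

def round_to(x, step):
    """Round x to the nearest multiple of step."""
    return round(x / step) * step

p = 0.98  # a 98% probability

# Probability space, nearest 5%: rounds to 1.0 -- certainty.
# Everything that distinguished 98% from 99.9% is destroyed.
p_prob = round_to(p, 0.05)

# Odds space (odds = p / (1 - p) = 49), nearest multiple of 5:
# rounds to 50, which maps back to ~98.0% -- almost nothing lost.
odds = p / (1 - p)
p_odds = round_to(odds, 5) / (1 + round_to(odds, 5))

# Log-odds space, nearest 0.5 nat: log(49) = 3.89 rounds to 4.0,
# which maps back to ~98.2%.
log_odds_rounded = round_to(math.log(odds), 0.5)
p_logodds = 1 / (1 + math.exp(-log_odds_rounded))
```

Same estimate, three "rounded" answers: 100%, ~98.0%, ~98.2%. Which information you throw away depends entirely on the parameterization you happened to round in.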
The thing I usually do, when asked to elicit a probability, is report a probability (usually 2 sig figs) and then also a subjective sense of how easy it would be to shift that probability by giving me more evidence / allowing me more time to think. I also sometimes straight up refuse to report a probability. The thing I generally prefer to do is to share my models instead of sharing my probabilities.
I think the thought experiment is dramatically underspecified. Who are Albert and Betty reporting probabilities to, and what will those probabilities be used for?
Scott mentioned that fact about superforecasters in his review; from what I remember the book doesn’t add much detail beyond Scott’s summary.
One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, or so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious, Tetlock found that when you rounded them off to the nearest 5 (or 10, or whatever) their accuracy actually decreased significantly. That 2% is actually doing good work.
Perhaps the principled way is to try representing your probability to the same number of significant figures as a probability, as a log probability, as odds, and as log odds, and then present whichever option happens to fall closest to your true estimate :p
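Taking the joke semi-seriously, a toy implementation (my own sketch; the function names are made up) might round to 2 significant figures in each coordinate system, map each result back to a probability, and report whichever lands closest to the original estimate:

```python
import math

def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

def closest_rounding(p, sig=2):
    """Round p to `sig` sig figs as a probability, a log probability,
    odds, and log odds; return the (name, rounded-back probability)
    pair that lands closest to the original p."""
    odds = p / (1 - p)
    rounded_odds = round_sig(odds, sig)
    rounded_log_odds = round_sig(math.log(odds), sig)
    candidates = {
        "probability": round_sig(p, sig),
        "log probability": math.exp(round_sig(math.log(p), sig)),
        "odds": rounded_odds / (1 + rounded_odds),
        "log odds": 1 / (1 + math.exp(-rounded_log_odds)),
    }
    return min(candidates.items(), key=lambda kv: abs(kv[1] - p))
```

For an already-round probability like 0.25 the probability representation wins outright; for something awkward like 0.335 one of the other coordinate systems usually round-trips closer.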
I think I’m most interested in the last question I posed: as a conversational default when I’m not interested in diving into models and computations, should I share all the digits or as many as my confidence allows?
I think you should share 2 digits.

I think you should share more digits. I sometimes say 33.5%, and experience it as meaningfully different from 34% or 33%.

This is obviously exacerbated toward the ends of the probability spectrum: there is a massive difference between 99% and 99.5% (in odds terms, 99:1 versus 199:1, roughly a factor of two), and it seems very important to feel comfortable distinguishing between them.

That’s fair.

When I say 2 digits I mean 2 sig figs, so e.g. 0.05% is one sig fig. I think if you’re reporting a probability near 99%, it makes sense to report 1 minus that probability, to 2 (or 3, or more if you have them) sig figs.

The thing I usually do, when asked to elicit a probability, is report a probability (usually 2 sig figs) and then also a subjective sense of how easy it would be to shift that probability by giving me more evidence / allowing me more time to think.

What is the correct technical way to summarise the latter quantity (ease of shifting), in an idealised setting?

Uh, I dunno, something like: I currently have a belief about the distribution of evidence I expect to encounter in the future. From there I can compute a probability distribution over what my posterior beliefs will be after updating on that evidence, and then compute some summary statistic of that distribution that measures how spread out it is. An easy setting in which this can be made completely formal is repeatedly flipping a coin of unknown bias.
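In the coin-flipping setting this can be written out exactly (a sketch under one standard choice of prior, a Beta distribution on the coin's bias; the function names are my own). With current belief Beta(a, b), the number of heads in n future flips is beta-binomial, the posterior mean after seeing k heads is (a + k) / (a + b + n), and the standard deviation of that posterior mean is one natural "ease of shifting" summary:

```python
import math

def _log_beta(x, y):
    """log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def betabinom_pmf(k, n, a, b):
    """P(k heads in n future flips) under a Beta(a, b) prior on the bias."""
    return math.comb(n, k) * math.exp(_log_beta(a + k, b + n - k) - _log_beta(a, b))

def shiftability(a, b, n):
    """Standard deviation of the posterior mean after n more flips,
    starting from a Beta(a, b) belief about the coin's bias."""
    current = a / (a + b)  # today's probability estimate
    var = 0.0
    for k in range(n + 1):
        posterior_mean = (a + k) / (a + b + n)
        var += betabinom_pmf(k, n, a, b) * (posterior_mean - current) ** 2
    return math.sqrt(var)
```

Beta(1, 1) and Beta(100, 100) both report "50%", but after 10 more flips the first belief has a posterior-mean spread of about 0.26 while the second has about 0.008: the same probability, with very different ease of shifting.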