It’s worth noting that most of the strong evidence here is in locating the hypothesis.
That doesn’t apply to the juggling example—but that’s not so much evidence. “I can juggle” might take you from 1:100 to 10:1. Still quite a bit, but 10 bits isn’t 24.
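For concreteness, here is the arithmetic behind those figures as a small sketch (the 1:100, 10:1, and 24-bit numbers are the ones above; the rest is just unit conversion):

```python
import math

def bits(odds_before: float, odds_after: float) -> float:
    """Evidence, in bits, needed to move from one odds ratio to another."""
    return math.log2(odds_after / odds_before)

# "I can juggle": roughly 1:100 prior odds to 10:1 posterior odds.
juggling_bits = bits(1 / 100, 10 / 1)      # log2(1000), about 10 bits
print(f"juggling claim: {juggling_bits:.1f} bits")

# For comparison, 24 bits corresponds to an odds shift of 2**24, i.e. ~1.7e7.
print(f"24 bits is an odds factor of {2**24:,}")
```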
I think this relates to Donald’s point on the asymmetry between getting from exponentially small to likely (commonplace) vs getting from likely to exponentially sure (rare). Locating a hypothesis can get you the first, but not the second.
It’s even hard to get back to exponentially small chance of x once it seems plausible (this amounts to becoming exponentially sure of ¬x). E.g., if I say “My name is Mortimer Q. Snodgrass… Only kidding, it’s actually Joe Collman”, what are the odds that my name is Mortimer Q. Snodgrass? 1% perhaps, but it’s nowhere near as low as the initial prior.
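A rough sense of the gap, assuming (purely for illustration) a prior of about 1 in 10 million that a stranger's name is specifically "Mortimer Q. Snodgrass":

```python
import math

# Illustrative numbers only: assume the prior that a stranger's name is
# specifically "Mortimer Q. Snodgrass" is about 1 in 10 million.
prior = 1e-7                  # roughly 23 bits against
posterior_after_joke = 0.01   # the "1% perhaps" above

def bits_against(p: float) -> float:
    """Log-odds against a hypothesis with probability p, in bits."""
    return math.log2((1 - p) / p)

print(f"initial prior:  {bits_against(prior):.1f} bits against")
print(f"after the joke: {bits_against(posterior_after_joke):.1f} bits against")
# The retraction leaves you ~6.6 bits against, still ~17 bits short of the
# ~23-bit prior you started from.
```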
The only simple way to get all the way back is to lose/throw-away the hypothesis-locating information—which you can’t do via a Bayesian update. I think that’s what makes privileging the hypothesis such a costly error: in general you can’t cleanly update your way back (if your evidence, memory and computation were perfectly reliable, you could—but they’re not). The way to get back is to find the original error and throw it out.
How difficult is it to get into the top 1% of traders? To be 50% sure you’re in the top 1%, you only need 200:1 evidence. This seemingly large odds ratio might be easy to get.

I don’t think your examples say much about this. They’re all of the form [trusted-in-context source] communicates [unlikely result]. They don’t seem to show a reason to expect strong evidence may be easy to get when this pattern doesn’t hold. (I suppose they do say that you should check for the pattern—and probably it is useful to occasionally be told “There may be low-hanging fruit. Look for it!”)
Great comment, though I disagree with this line:
The only simple way to get all the way back is to lose/throw-away the hypothesis-locating information—which you can’t do via a Bayesian update.

You can definitely do this via a Bayesian update. This is exactly the “explaining away” phenomenon in causal DAGs/Bayes nets: I notice the sidewalk is wet, infer that rain is likely, but then notice the sprinkler ran, so the (Bayesian) sprinkler-update sends my chance-of-rain back down to roughly its original value.
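A minimal numerical sketch of that explaining-away pattern, with made-up probabilities for rain, sprinkler, and wet sidewalk:

```python
from itertools import product

# Made-up probabilities, purely to show the explaining-away pattern.
P_RAIN = 0.1
P_SPRINKLER = 0.3

def p_wet(rain: bool, sprinkler: bool) -> float:
    if rain and sprinkler:
        return 0.99
    if rain or sprinkler:
        return 0.9
    return 0.01

def p_rain_given(wet: bool, sprinkler_known=None) -> float:
    """P(rain | wet [, sprinkler]) by brute-force enumeration of the joint."""
    num = den = 0.0
    for rain, sprinkler in product([True, False], repeat=2):
        if sprinkler_known is not None and sprinkler != sprinkler_known:
            continue
        p = ((P_RAIN if rain else 1 - P_RAIN)
             * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
             * (p_wet(rain, sprinkler) if wet else 1 - p_wet(rain, sprinkler)))
        den += p
        if rain:
            num += p
    return num / den

print(f"P(rain)                    = {P_RAIN:.3f}")
print(f"P(rain | wet)              = {p_rain_given(True):.3f}")        # ~0.27
print(f"P(rain | wet, sprinkler=1) = {p_rain_given(True, True):.3f}")  # ~0.11, back near the prior
```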
Sure, but what I mean is that this is hard to do for hypothesis-location, since post-update you still have the hypothesis-locating information, and there’s some chance that your “explaining away” was itself incorrect (or your memory is bad, you have bugs in your code...).
For an extreme case, take Donald’s example, where the initial prior would be 8,000,000 bits against.
Locating the hypothesis there gives you ~8,000,000 bits of evidence. The amount you get in an “explaining away” process is bounded by your confidence in the new evidence. How sure are you that you correctly observed and interpreted the “explaining away” evidence? Maybe you’re 20 bits sure; perhaps 40 bits sure. You’re not 8,000,000 bits sure.
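One toy way to see the bound: suppose that with probability eps the observation process simply failed (bug, misreading, hoax), in which case the observation is uninformative, and otherwise it rules the hypothesis out completely. Then the most evidence it can give is log2(1/eps) bits. The eps values below are only examples:

```python
import math

def max_bits_from_observation(p_channel_error: float) -> float:
    """
    Toy model: with probability (1 - eps) the observation process worked and
    the observation rules the hypothesis out entirely; with probability eps
    it failed and the observation is uninformative. The likelihood ratio you
    can extract is then at most 1/eps, i.e. log2(1/eps) bits.
    """
    return math.log2(1 / p_channel_error)

for eps in [2**-20, 2**-40]:
    print(f"channel error prob 2^{int(math.log2(eps))}: "
          f"at most {max_bits_from_observation(eps):.0f} bits of evidence")
# Nowhere near 8,000,000 bits unless you are astronomically sure the
# observation and its interpretation cannot themselves be mistaken.
```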
Then let’s say you’ve updated down quite a few times, but not yet close to the initial prior value. For the next update, how sure are you that the stored value that you’ll be using as your new prior is correct? If you’re human, perhaps you misremembered; if a computer system, perhaps there’s a bug...
Below a certain point, the new probability you arrive at will be dominated by contributions from weird bugs, misrememberings etc.
This remains true until/unless you lose the information describing the hypothesis itself.
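A toy illustration of that floor, with an assumed per-step chance that the stored value itself has been corrupted (both numbers are made up):

```python
# Toy model of the "is my stored prior even right?" worry: your credence,
# given the number in memory, is a mixture of (the stored value is intact)
# and (it was corrupted by a bug or misremembering). Numbers are made up.
p_corrupted = 1e-6        # assumed chance the stored value is wrong
p_if_corrupted = 0.5      # assumed credence in the hypothesis given corruption

def effective_credence(p_stored: float) -> float:
    return ((1 - p_corrupted) * p_stored
            + p_corrupted * p_if_corrupted)

for p_stored in [1e-3, 1e-6, 1e-9, 1e-300]:
    print(f"stored {p_stored:.0e} -> effective {effective_credence(p_stored):.2e}")
# Below roughly p_corrupted * p_if_corrupted (~5e-7 here) the effective
# credence is dominated by the corruption term, so further updates cannot
# bring it meaningfully closer to an exponentially small prior.
```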
I’m not clear how much this is a practical problem—I agree you can update the odds of a hypothesis down to no-longer-worthy-of-consideration. In general, I don’t think you can get back to the original prior without making invalid assumptions (e.g. zero probability of a bug/hack/hoax...), or losing the information that picks out the hypothesis.