• I think I’ve followed the basic argument here? Let me try a couple examples, first a toy problem and then a more realistic one.

Example 1: Dice. A person rolls some fair 20-sided dice and then tells you the highest number that appeared on any of the dice. They either rolled 1 die (and told you the number on it), or 5 dice (and told you the highest of the 5 numbers), or 6 dice (and told you the highest of the 6 numbers).

For some reason you care a lot about whether there were exactly 5 dice, so you could break this down into two hypotheses:

H1: They rolled 5 dice
H2: They rolled 1 or 6 dice

Let’s say they roll and tell you that the highest number rolled was 20. This favors 5 dice over 1 die, and to a lesser degree it favors 6 dice over 5 dice. So if you started with equal (1/3) probabilities on the 3 possibilities, you’ll update in favor of H1. Someone who also started with a 1/3 chance on H1, but who thought that 1 die was more likely than 6 dice, would update even more in favor of H1. And someone whose prior was that 6 dice was more likely than 1 die would update less in favor of H1, or even in the other direction if it was lopsided enough.
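To check the dice numbers, here is a minimal Python sketch. It assumes the chance that the highest of n fair 20-sided dice is 20 equals 1 - (19/20)^n, and the specific non-equal priors are just illustrative picks of mine (each keeps H1 at 1/3), not anything from the post.

```python
from fractions import Fraction

def p_max_is_20(n_dice, sides=20):
    """P(highest of n fair dice equals `sides`) = 1 - ((sides-1)/sides)^n."""
    return 1 - Fraction(sides - 1, sides) ** n_dice

def posterior_h1(prior):
    """prior maps number of dice -> prior probability; H1 is 'exactly 5 dice'."""
    joint = {n: p * p_max_is_20(n) for n, p in prior.items()}
    return joint[5] / sum(joint.values())

third = Fraction(1, 3)
# Illustrative priors (assumptions): all put 1/3 on H1, but split the
# remaining 2/3 differently between 1 die and 6 dice.
priors = {
    "equal": {1: third, 5: third, 6: third},
    "1 die favored": {1: Fraction(1, 2), 5: third, 6: Fraction(1, 6)},
    "6 dice favored": {1: Fraction(1, 6), 5: third, 6: Fraction(1, 2)},
    "6 dice heavily favored": {1: Fraction(1, 60), 5: third, 6: Fraction(13, 20)},
}

for name, prior in priors.items():
    print(f"{name}: P(H1 | max = 20) = {float(posterior_h1(prior)):.3f}")
```

With these particular priors the posterior on H1 comes out at roughly 0.42, 0.52, 0.35, and 0.30 respectively, so the same 20 moves H1 up by different amounts, and down in the most lopsided case.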

Relatedly, if you repeated this experiment many times and got lots of 20s, that would eventually become evidence against H1. If the 100th roll is 20, then that favors 6 dice over 5, and by that point the possibility of there being only 1 die is negligible (if the first 99 rolls were large enough) so it basically doesn’t matter that the 20 also favors 5 dice over 1. This seems like another angle on the same phenomenon, since your posterior after 99 rolls is your prior for the 100th roll (and the evidence from the first 99 rolls has made it lopsided enough so that the 20 counts as evidence against H1).
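Here is a rough sketch of that sequential version, under a simplifying assumption of mine: every report in the streak is a 20 (the comment only requires the earlier rolls to be "large enough"). It starts from equal priors and feeds each posterior back in as the next prior.

```python
from fractions import Fraction

def p_max_is_20(n_dice, sides=20):
    """P(highest of n fair dice equals `sides`)."""
    return 1 - Fraction(sides - 1, sides) ** n_dice

def update_on_20(belief):
    """One Bayesian update on the report 'highest roll was 20'."""
    joint = {n: p * p_max_is_20(n) for n, p in belief.items()}
    total = sum(joint.values())
    return {n: w / total for n, w in joint.items()}

# Equal priors over 1, 5, or 6 dice; H1 is 'exactly 5 dice'.
belief = {n: Fraction(1, 3) for n in (1, 5, 6)}
h1_history = [belief[5]]

# Simplifying assumption: ten reports in a row, all equal to 20.
for _ in range(10):
    belief = update_on_20(belief)
    h1_history.append(belief[5])

for k, p in enumerate(h1_history):
    print(f"after {k} twenties: P(H1) = {float(p):.3f}")
```

In this setup the first 20 raises P(H1), and every later 20 lowers it, because once the 1-die possibility is nearly ruled out the only live comparison is 5 dice versus 6.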

Example 2: College choice. A high school freshman hopes & expects to attend Harvard for college in a few years. One observer thinks that’s unlikely, because Harvard admissions is very selective even for very good students. Another observer thinks that’s unlikely because the student is into STEM and will probably wind up going to a more technical university like MIT; the student hasn’t thought much yet about choosing a college, and Harvard is probably just serving as a default stand-in for a really good school.

The two observers might give the same p(Harvard), but for very different reasons. And because their models are so different, they could even update in opposite directions on the same new data. For instance, perhaps the student does really well on a math contest, and the first observer updates in favor of the student attending Harvard (that’s an impressive accomplishment, maybe they will make it past the admissions filter) while the second observer updates a bit against the student attending Harvard (yep, they’re a STEM person).

You could fit this into the “three outcomes” framing of this post, if you split “not attending Harvard” into “being rejected by Harvard” and “choosing not to attend Harvard”.

• I think your first example could be even simpler. Imagine you have a coin that’s either fair, all-heads, or all-tails. If your prior is “fair or all-heads with probability 1/2 each”, then seeing heads is evidence against “fair”. But if your prior is “fair or all-tails with probability 1/2 each”, then seeing heads is evidence for “fair”. Even though “fair” started as 1/2 in both cases. So the moral of the story is that there’s no such thing as evidence for or against a hypothesis, only evidence that favors one hypothesis over another.
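The coin version is small enough to verify directly. A minimal sketch, where the hypothesis names and the helper `posterior_after_heads` are my own labels:

```python
from fractions import Fraction

# Probability of heads under each hypothesis about the coin.
P_HEADS = {
    "fair": Fraction(1, 2),
    "all-heads": Fraction(1),
    "all-tails": Fraction(0),
}

def posterior_after_heads(prior):
    """Bayesian update of `prior` (hypothesis -> probability) on seeing heads."""
    joint = {h: p * P_HEADS[h] for h, p in prior.items()}
    total = sum(joint.values())
    return {h: w / total for h, w in joint.items()}

half = Fraction(1, 2)
vs_heads = posterior_after_heads({"fair": half, "all-heads": half})
vs_tails = posterior_after_heads({"fair": half, "all-tails": half})

print(vs_heads["fair"])  # 1/3: heads counts against "fair"
print(vs_tails["fair"])  # 1: heads confirms "fair"
```

Same prior of 1/2 on "fair", same observation, opposite updates. The only thing that changed is which alternative "fair" was competing against.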

• That’s a great explanation. Evidence may also be compatible or incompatible with a hypothesis. For instance, if I get a die (without the dots on the sides that indicate 1-6), and I instead label* it:

Red, 4, Life, X-Wing, Int, path through a tree

Then finding out that I rolled a 4, without knowing which die I used, is compatible with the regular-die hypothesis, but any of the other rolls is not.

*(likely using symbols, for space reasons)

• This seems related to philosophy of science stuff, where updating is about pitting hypotheses against each other. In order to do that you have to locate the leading alternative hypotheses. It doesn’t work well to just pit a hypothesis against “everything else” (it’s hard to say what p(E|not-H) is, and it can change as you collect more data). You need to find data that distinguishes your hypothesis from leading alternatives. An experiment that favors Newtonian mechanics over Aristotelian mechanics won’t favor Newtonian mechanics over general relativity.