To those responding with the (obvious) insight that a distribution of probabilities collapses into a probability:
This is true, but there’s more to it. Consider, for example, the following two experiments:
Alice flips a coin to see if it comes up Heads or Tails.
Bob carves a rough wooden disk, colors both sides, and flips it to see if it comes up Red or Blue.
To a very good first approximation, Pr[Alice flips Heads]=1/2. For lack of better information, Pr[Bob flips Red]=1/2 as well. In many cases, you should treat these probabilities identically. For example, if choosing between “$2 if Alice flips Heads” and “$3 if Alice flips Tails”, you should pick the second option; you should do the same for Bob.
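For concreteness, here is the expected-value arithmetic behind that choice (a quick sketch, assuming you value dollars linearly):

```python
# Expected value of each offer, assuming Pr[Heads] = Pr[Red] = 1/2
# and linear value for dollars.
p = 0.5
ev_option_1 = 2 * p        # "$2 if Heads (or Red)"  -> $1.00
ev_option_2 = 3 * (1 - p)  # "$3 if Tails (or Blue)" -> $1.50
print(ev_option_1, ev_option_2)  # 1.0 1.5 -- take the second option in both cases
```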
There are two (related) ways in which we can meaningfully say we’re more uncertain about Bob’s flip than Alice’s.
Repeated trials. If Alice flips her coin 100 times, there’s only about a 0.0016% chance that she gets Heads more than 70 times. If Bob flips his disk 100 times, the chance of seeing Red more than 70 times is much higher: 30⁄101, for instance, if we start with a uniform distribution for Pr[Bob flips Red]. Here, it’s meaningful to say that there’s a true (frequentist) value of Pr[Bob flips Red], that the frequency of Red over repeated trials will converge to that value, and that we’re not at all certain it will converge to 1⁄2.
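Here is a sketch of where those numbers come from, using scipy (the 30⁄101 figure falls out of the beta-binomial with a uniform prior, under which every count of Reds from 0 to 100 is equally likely):

```python
from scipy.stats import binom, betabinom

# Alice: a fair coin, so the count of Heads in 100 flips is Binomial(100, 0.5).
p_alice = 1 - binom.cdf(70, 100, 0.5)      # Pr[more than 70 Heads] ~ 1.6e-5, i.e. ~0.0016%

# Bob: uniform prior on p = Pr[Red], i.e. Beta(1, 1), so the count of Reds is
# Beta-Binomial(100, 1, 1) -- every count from 0 to 100 is equally likely.
p_bob = 1 - betabinom.cdf(70, 100, 1, 1)   # Pr[more than 70 Reds] = 30/101 ~ 0.297

print(p_alice, p_bob)
```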
Updating. If you see Alice flip her coin 5 times and it comes up Heads every time, then you most likely say “Huh, that’s odd” and continue to expect Pr[Alice flips Heads] to be about 1⁄2 for the next flip. If you see Bob flip his disk 5 times and it comes up Red every time, then you begin to suspect that it’s fairly likely to continue coming up Red.
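One way to make that asymmetry concrete (a minimal sketch, assuming a uniform Beta(1, 1) prior on Bob’s p and treating Alice’s coin as known to be fair):

```python
from scipy.stats import beta

# Alice: we treat Pr[Heads] as essentially known to be 1/2, so five Heads in a
# row barely moves our prediction for the next flip.
pr_alice_next_heads = 0.5

# Bob: uniform Beta(1, 1) prior on p = Pr[Red]. Seeing 5 Reds and 0 Blues gives
# a Beta(1 + 5, 1 + 0) posterior; its mean is the predictive probability of Red.
posterior = beta(1 + 5, 1 + 0)
pr_bob_next_red = posterior.mean()   # (1 + 5) / (1 + 5 + 1 + 0) = 6/7 ~ 0.857

print(pr_alice_next_heads, pr_bob_next_red)
```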
The second of these is much more important: there are few things in life for which we can take repeated independent samples, but many things in life for which we expect to learn additional information. Unfortunately, it is also much more complicated.
We can’t, you see, just stick with a probability distribution on an underlying parameter p = Pr[Bob flips Red] and update this probability distribution with new information. That helps a lot, but it doesn’t help with everything. For instance, if we already updated on “Chad saw the disk come up Red, like, ten times in a row” we won’t update very much on “Dana was there and she saw it, too.” The only complete description of our uncertainty is a list of all the evidence we have collected.
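Here is a rough sketch of the failure mode, using the Chad/Dana example with hypothetical numbers: if all we keep is a Beta distribution over p and we mechanically feed Dana’s report in as ten fresh flips, we double-count evidence we have already used.

```python
from scipy.stats import beta

# Start from a uniform Beta(1, 1) prior on p = Pr[Bob flips Red].
prior_a, prior_b = 1, 1

# Chad reports seeing 10 Reds in a row: a conjugate update gives Beta(11, 1).
after_chad = beta(prior_a + 10, prior_b)

# Dana saw the *same* ten flips. If we only kept the distribution over p and
# fed her report in as ten new observations, we'd get Beta(21, 1), as if we
# had seen 20 independent Reds -- far too confident.
double_counted = beta(prior_a + 20, prior_b)

# The right answer is (almost) no further update, because her report is nearly
# redundant given Chad's. Knowing that requires remembering *which* evidence we
# already conditioned on, not just the current distribution over p.
print(after_chad.mean(), double_counted.mean())   # ~0.917 vs ~0.955
```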
(Of course, if we expect to see a bunch of independent trials and update based on those, we can do that with Beta distributions and such easily. But, as I mentioned, that doesn’t always happen.)