I think the missing step is that you’re updating […] more than you’re updating […] and […]. If we use actual numbers, let’s say the Bayesian comes in with […], […], and […]. The update based on observing Mercury should be to remove NMP from the running and renormalize, dividing the remaining probabilities by their sum. So your new probabilities are […], […], and […].
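For concreteness, here is a minimal sketch of the remove-and-renormalize step in Python. The priors (0.90, 0.05, 0.05) and the label `UG` for the third hypothesis are assumptions picked only so that the two survivors start equal and `O` ends at 0.5; the likelihood values are illustrative as well.

```python
def bayes_update(priors, likelihoods):
    """Return posteriors: prior * likelihood, renormalized to sum to 1."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical priors (assumed numbers, not taken from the thread).
priors = {"NMP": 0.90, "UG": 0.05, "O": 0.05}

# Mercury's orbit treated as impossible under NMP, equally likely otherwise.
hard = bayes_update(priors, {"NMP": 0.0, "UG": 1.0, "O": 1.0})

# Same update with a tiny nonzero likelihood instead of an exact 0.
soft = bayes_update(priors, {"NMP": 1e-9, "UG": 1.0, "O": 1.0})

print(hard)
print(soft)
```

With a likelihood of exactly 0, NMP drops out and the other two are divided by their sum. With a very small likelihood, NMP keeps a negligible sliver and the others land just under 0.5.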
“When new evidence comes in that falsifies NMP, P(O) jumps up to 0.5.”
Okay I see, thanks.
Is that because we only have […] and […] left, and they both started equal? Did you have a reason for setting them equal?
Also, I guess NMP is removed because its likelihood for the Mercury observation is 0, right? (Though IRL I understand it’s Bayesian convention to choose a very small value rather than 0.)
Also, do you know how things change after GR is published (assuming no new data in the meantime)?
I don’t have a reason for setting them equal, no. The prior probabilities could be arbitrarily split between the remaining options.
Yes, that’s correct. If we were to keep experimenting and observing, we would find some data that would have essentially 0 likelihood of showing up under NMP.
That last question is trickier. If there’s no new data either way, but GR predicts reality better than most hypotheses in UG, you can split UG out into GR and UG_other, conserving the sum so that P(GR) + P(UG_other) = P(UG). (Granted, if there are other hypotheses within UG that line up with reality, then you should split those out as well.)
Then you can compare which specific predictions GR makes that UG_other does not. Once you perform experiments and get data that is extremely unlikely under […] but likely under […], then you rule out […] and are left with […] and […]. Any hypotheses in […] that are inconsistent with that new data also get ruled out, effectively increasing the probability assigned to […].
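The split-then-update idea can be sketched as follows. Everything numeric here is hypothetical: the starting probabilities, the 20% share handed to GR, and the likelihoods of the new data. The labels `UG` and `O` for the two surviving groups are also assumptions made for illustration.

```python
def split_hypothesis(probs, parent, shares):
    """Replace `parent` with sub-hypotheses whose probabilities sum to the parent's."""
    if abs(sum(shares.values()) - 1.0) > 1e-12:
        raise ValueError("shares must sum to 1")
    result = {h: p for h, p in probs.items() if h != parent}
    for child, share in shares.items():
        result[child] = probs[parent] * share
    return result

def bayes_update(probs, likelihoods):
    """Multiply each probability by its likelihood, then renormalize."""
    unnormalized = {h: probs[h] * likelihoods[h] for h in probs}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Assumed state after NMP has been ruled out.
probs = {"UG": 0.5, "O": 0.5}

# Assumed split: GR takes 20% of UG's mass, the rest stays in UG_other.
split = split_hypothesis(probs, "UG", {"GR": 0.2, "UG_other": 0.8})

# Hypothetical new data: likely under GR, unlikely under the rest.
posterior = bayes_update(split, {"GR": 0.9, "UG_other": 0.1, "O": 0.1})

print(split)
print(posterior)
```

The split itself changes nothing about the total; only the later data moves probability toward GR. How large GR's initial share should be is left open here, since the sketch simply takes it as an input.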
“If we were to keep experimenting and observing, we would find some data that would have essentially 0 likelihood showing up under NMP.”
I’m not sure this is true in general. Sometimes we only figure out what data to look for to disprove theory 1 after someone comes up with theory 2. So before that, a Bayesian might be gathering lots of data and updating theory 1’s prior towards P=1, but they’d be misled, with no way to know it. But also, where does theory 2 come from? Either it was something the Bayesian could have thought up themselves (in which case why were they updating in the wrong direction?), or Bayesianism is incomplete and theory 2 was generated in a non-Bayesian way (and in this case it’s hard to see Bayesianism as anything but inferior to this other method; at the very least the sum of both methods is definitely better than Bayesianism alone).
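“(Granted, if there are other hypotheses within UG that line up with reality, then you should split those out as well.)”
There are an infinite number of those, though. Putting that aside, there are other mathematical constructions distinct from GR that are in […], but we don’t know what they are yet (just like GR prior to publication).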
And it seems like in order to split sensibly we’d need to know the distribution of probabilities over these unknown hypotheses in the hypothesis space. If we used a rule like always split in half, then our result is dependent on which order we learn the theories in (which seems illogical).
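The order dependence is easy to see with a toy version of the “always split in half” rule. The pool name `UG_other`, the starting mass of 1.0, and the theory names are placeholders for illustration.

```python
def split_in_half(probs, pool, new_name):
    """Move half of the pool's probability mass to a newly named theory."""
    result = dict(probs)
    half = result[pool] / 2
    result[pool] = half
    result[new_name] = half
    return result

def learn_in_order(order, pool="UG_other", pool_mass=1.0):
    """Apply the halving rule once per theory, in the order they are learned."""
    probs = {pool: pool_mass}
    for theory in order:
        probs = split_in_half(probs, pool, theory)
    return probs

a_first = learn_in_order(["theory_A", "theory_B"])
b_first = learn_in_order(["theory_B", "theory_A"])

print(a_first)  # theory_A gets 0.5, theory_B gets 0.25
print(b_first)  # theory_B gets 0.5, theory_A gets 0.25
```

Same two theories, same (absent) evidence, different probabilities depending only on which one was heard of first.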
“Then you can compare which specific predictions GR makes that UG_other does not.”
But UG_other makes all possible predictions (since it’s a group of many hypotheses). Or close to “all”, at least.
Oh, and I think I just saw an issue with your original reasoning:
“When new evidence comes in that falsifies NMP, P(O) jumps up to 0.5.”
The evidence that falsified NMP also falsified some hypotheses of O (but not GR). So P(O) would not be 0.5.
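A rough sketch of that point, treating a grouped hypothesis as a mixture: if the evidence is impossible under some members and fits the others, the group’s likelihood is the share of its mass that survives. The priors, the label `UG`, and the 40% of O assumed to be falsified are all made-up numbers.

```python
def update_with_partial_falsification(priors, surviving_fraction):
    """Scale each prior by the share of its mass that survives, then renormalize."""
    unnormalized = {h: priors[h] * surviving_fraction[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical priors (assumed, not from the thread).
priors = {"NMP": 0.90, "UG": 0.05, "O": 0.05}

# NMP is falsified outright; assume 40% of O's mass is falsified too.
posterior = update_with_partial_falsification(
    priors, {"NMP": 0.0, "UG": 1.0, "O": 0.6}
)

print(posterior)
```

Under these assumed numbers O ends up below 0.5 because part of it was ruled out by the same observation.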
Sorry for the delayed reply. My comments are being held for review.