Remember the first comment way back in the thread? Psy-Kosh? I’m pretty much with him.
We assume that both hypotheses are equally precise—that they have equally pointed likelihood functions in the vicinity of the data so far.
If you know what’s inside the boxes, and it’s directly comparable via Occam’s Razor, then Occam’s Razor should probably take over.
The main caveat on this point is that counting symbols in an equation doesn’t always get you the true prior probability of something. If there’s a dispute about which Razor or prior to use, the scientist’s ability to predict the next ten experiments from the first ten suggests that his version of Occam’s Razor / prior probability is unusually good.
For example, it might be that each experiment only gives you 4 bits of data, and when you write out the first scientist’s hypothesis in symbols, it comes out to 60 bits’ worth of causal network, or something like that. But it was the first hypothesis the scientist thought of, after seeing only 10 experiments, or 40 bits’ worth of data—and what’s more, it worked. Which suggests that the first scientist has a higher effective prior for that hypothesis than the 60-bit Occam measurement of “counting symbols” would have you believe. Direct Occam counting only gives you an upper bound on the probability, not a lower bound.
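The arithmetic here can be sketched in a few lines of Python. The 4-bit experiments and the 60-bit hypothesis are the hypothetical numbers from the example above, and `occam_upper_bound` is just a name I’m giving to the usual 2^(-k) description-length bound:

```python
def occam_upper_bound(description_bits: int) -> float:
    """Symbol-counting only bounds the prior from above:
    a hypothesis needing k bits gets prior at most 2**-k."""
    return 2.0 ** -description_bits

# Hypothetical numbers from the example:
hypothesis_bits = 60       # the scientist's theory, written out in symbols
data_bits = 10 * 4         # 10 experiments at 4 bits each

# The symbol-counting prior says at most 2**-60 for the hypothesis...
prior_bound = occam_upper_bound(hypothesis_bits)

# ...but guessing right on 40 bits of data, on the first try, multiplies
# the posterior odds by 2**40 relative to chance -- evidence that the
# scientist's *effective* prior was higher than the raw 60-bit count implies.
evidence_factor = 2.0 ** data_bits

print(prior_bound)       # 2**-60: an upper bound, not the true prior
print(evidence_factor)   # 2**40
```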
If you don’t know what’s inside the boxes or you don’t have a good Occam prior for it, the first theory wins because the second black box is presumed to have used more of the data.
The main circumstance under which the second theory wins outright is if you can look inside the boxes and the second theory is strictly simpler—that is, it captures all the successes so far while containing strictly fewer elements—not just a shorter description, but a description that is a strict subset of the first. Then we just say that the first theory had a dangling part that needs snipping off, which there was never a reason to hypothesize in the first place.
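The strict-subset condition is sharper than “shorter description,” and a toy sketch makes the difference concrete. Here theories are modeled as sets of causal elements (my simplification, not anything from the original scenario), and we assume both theories already capture all the successes so far:

```python
def second_theory_wins(first_elements: frozenset, second_elements: frozenset) -> bool:
    """Outright win for the second theory: every element it uses already
    appears in the first theory, and it uses strictly fewer of them.
    (Assumes both theories fit the data so far equally well.)"""
    return second_elements < first_elements  # proper-subset test

# Hypothetical causal elements; "D" is the dangling part.
first_theory = frozenset({"A", "B", "C", "D"})
second_theory = frozenset({"A", "B", "C"})

print(second_theory_wins(first_theory, second_theory))  # True: snip off "D"

# A merely *shorter* theory that introduces a new element is not a
# strict subset, so it does not win outright on this criterion:
rival_theory = frozenset({"A", "E"})
print(second_theory_wins(first_theory, rival_theory))   # False
```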