I think this is similar to the 2010 post A Proof of Occam’s Razor? …which spawned 139 comments. I didn’t read them all. But here’s one point that came up a couple times:
Let N be a ridiculously, comically large finite number like N=3↑↑↑↑↑3. Take the N simplest possible hypotheses. This is a finite set, so we can split up probability weight such that simpler hypotheses are less likely, within this set.
For example, rank-order these N hypotheses by decreasing complexity, and assign probability 2^(-n) to the n'th on the list. That leaves 2^(-N) leftover probability weight for all the other infinitely many hypotheses outside that set, and you can distribute that however you like; N is so big that it doesn't matter.
Now, simpler hypotheses are less likely, until we go past the first N hypotheses. But N is so ridiculously large that that’s never gonna happen.
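To make the construction concrete, here's a toy sketch (my own illustration, not from the post or the comments), with "complexity" standing in as string length and a tiny N so it actually runs:

```python
# Toy illustration of the "anti-Occam within the first N hypotheses" prior.

def anti_occam_prior(hypotheses, N):
    """Rank the N simplest hypotheses by decreasing complexity and give the
    n-th one probability 2^(-n), so that within this set the simpler a
    hypothesis is, the less weight it gets. Returns the prior plus the
    2^(-N) of leftover mass for everything outside the set."""
    simplest = sorted(hypotheses, key=len)[:N]         # the N simplest
    ranked = sorted(simplest, key=len, reverse=True)   # most complex first
    prior = {h: 2.0 ** -(n + 1) for n, h in enumerate(ranked)}
    leftover = 2.0 ** -N                               # for all other hypotheses
    return prior, leftover

# With N=4 instead of 3↑↑↑↑↑3:
programs = ["0", "01", "011", "0110", "01101", "011010"]
prior, leftover = anti_occam_prior(programs, N=4)
print(prior)     # "0110" gets 1/2, ..., the simplest kept program "0" gets 1/16
print(leftover)  # 2^-4 = 0.0625 to spread over everything else however you like
```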
I want to say something like: “The bigger N is, the bigger a computer needs to be in order to implement that prior; and given that your brain is the size that it is, it can’t possibly be setting N=3↑↑↑↑↑3.”
Now, this isn’t strictly correct, since the Solomonoff prior is uncomputable regardless of the computer’s size, etc., but is there some kernel of truth there? Like, is there some efficient way of approximating the Solomonoff prior that becomes less efficient the larger N gets?
There are practical anti-Occam calculations. Start with a uniform distribution over bitstrings, and every time you find a short program that produces one of them, turn that bitstring's probability down.
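A rough sketch of what that procedure might look like (assumptions mine: fixed-length bitstrings, "programs" that are just short seeds repeated out to the target length, and an arbitrary down-weighting rule):

```python
# Start uniform over all bitstrings of length L, then turn down the
# probability of any bitstring that a short program produces.
import itertools

L = 8
strings = ["".join(bits) for bits in itertools.product("01", repeat=L)]
weights = {s: 1.0 for s in strings}              # uniform over all 2^L strings

def run_program(seed, length):
    """Stand-in for real program search: a 'program' is a short seed
    repeated out to the target length."""
    return (seed * length)[:length]

for k in range(1, 4):                            # programs of length 1..3
    for seed in ("".join(bits) for bits in itertools.product("01", repeat=k)):
        produced = run_program(seed, L)
        weights[produced] *= 2.0 ** -(L - k)     # shorter program, bigger penalty

Z = sum(weights.values())
probs = {s: w / Z for s, w in weights.items()}
print(probs["01010101"] < probs["01101001"])     # True: the compressible string got turned down
```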