“Specifically, going between two universal machines cannot increase the hypothesis length any more than the length of the compiler from one machine to the other. This length is fixed, independent of the hypothesis, so the more data you use, the less this difference matters.”
This doesn’t completely resolve my concern here, as there are infinitely many possible Turing machines. If you pick one and I’m free to pick any other, is there a bound on the length of the compiler? If not, then I don’t see how the compiler length placing a bound on any specific change in Turing machine makes the problem of which machine to use irrelevant.
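For reference, the bound in the quoted passage is usually stated as the invariance theorem. A sketch in standard notation follows; the symbols K_U, K_V, M_U, M_V and c_{UV} are labels I am assuming here, not notation taken from the post:

```latex
% U, V: universal Turing machines; x: any finite string.
% K_U(x): length of the shortest U-program that outputs x.
% c_{UV}: length of a U-program that interprets V-programs (the "compiler").
K_U(x) \le K_V(x) + c_{UV} \quad \text{for all } x

% Corresponding statement for the priors M_U, M_V induced by the two machines:
M_U(x) \ge 2^{-c_{UV}} \, M_V(x) \quad \text{for all } x
```

The constant depends on the pair (U, V) but not on x. As far as I can tell, with U fixed and V ranging over all universal machines, c_{UV} has no upper bound: a V can be built that gives a very short program to a string that is arbitrarily complex for U. So the theorem bounds the cost of any one change of machine, not the cost over all possible changes.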
To be clear: I am aware that starting with different machines, the process of updating on shared observations will eventually lead us to similar distributions even if we started with wildly different priors. My concern is that if “wildly different” is unbounded then “eventually” might also be unbounded even for a fixed value of “similar.” If this does indeed happen, then it’s not clear to me how SI does anything more useful than “Pick your favorite normalized distribution without any 0s or 1s and then update via Bayes.”
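To make the "eventually might be unbounded" worry concrete, here is a toy sketch, not Solomonoff induction itself. It assumes only two hypotheses and an all-ones data stream: a "good" hypothesis that predicts 1 with certainty and a "bad" one that predicts a fair coin. The function name `observations_until_confident` and the parameter `penalty_bits` (standing in for the compiler length) are hypothetical names chosen for illustration.

```python
from fractions import Fraction


def observations_until_confident(penalty_bits: int,
                                 threshold: Fraction = Fraction(99, 100)) -> int:
    """Count observations needed before the posterior on the good hypothesis
    reaches `threshold`, when the prior odds (good : bad) are 2**-penalty_bits.

    Toy assumption: every observation is a 1. The good hypothesis assigns it
    probability 1, the bad hypothesis assigns it probability 1/2, so each
    observation doubles the odds in favour of the good hypothesis.
    """
    odds = Fraction(1, 2 ** penalty_bits)  # prior odds, good : bad
    n = 0
    while odds / (1 + odds) < threshold:   # posterior probability of good
        odds *= 2                          # likelihood ratio 1 / (1/2) per bit
        n += 1
    return n


# The number of observations grows linearly with the prior penalty.
for c in (0, 10, 100, 1000):
    print(c, observations_until_confident(c))
```

In this toy setting the wait is the penalty plus a fixed number of observations, so for any fixed threshold the time to agreement grows without bound as the penalty does. Whether that carries over to a real choice between universal machines is exactly the open question in this comment.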
Edit: Also thanks for the intro. It’s a lot more accessible than anything else I’ve encountered on the topic.
This seems to be a red-herring issue. There are clear differences in description complexity of Turing machines so the issue seems merely to require a closure argument of some sort in order to decide which is simplest:
Decide on the Turing machine that has the shortest program that simulates that Turing machine while running on that Turing machine.
Can anyone answer these concerns?