Hmm, I think I see what you mean.
Yes, there’s no reason for Solomonoff induction to be well-calibrated in the end, but once we obtain information that most of the universes starting with 0 do not work, that is data against which most of the hypotheses starting with 0 will fail. At that point, brute-force Solomonoff induction becomes obviously inefficient, and we should switch to the heuristic of testing almost only hypotheses starting with 1.
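Concretely, that update can be sketched in a few lines. This is a toy model, not real Solomonoff induction: the “programs” here are just short bitstrings carrying the standard 2^-length prior weight, and the “observation” is simply that every 0-prefixed hypothesis has been ruled out.

```python
from fractions import Fraction

# Toy universe of "programs": all bitstrings of length 1-3, each with
# the usual Solomonoff-style prior weight 2^-length (unnormalized).
programs = [f"{n:0{l}b}" for l in (1, 2, 3) for n in range(2 ** l)]
prior = {p: Fraction(1, 2 ** len(p)) for p in programs}

# Observation: hypotheses starting with "0" are ruled out by the data.
consistent = {p: w for p, w in prior.items() if p.startswith("1")}

# Bayesian update: renormalize the surviving weights.
total = sum(consistent.values())
posterior = {p: w / total for p, w in consistent.items()}

# All posterior mass now sits on programs starting with "1", so a search
# heuristic that only tests 1-prefixed programs loses nothing.
assert sum(posterior.values()) == 1
assert all(p.startswith("1") for p in posterior)
```

The renormalization is exactly why the heuristic is free: conditioning on the data already moved every bit of probability mass onto the 1-prefixed programs.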
In fact, we’re already doing this: we know for a fact that we live in the subset of universes where the acceleration between two particles is neither constant nor independent of distance. So we know that the simpler hypothesis, where gravitational attraction is “0.02c/year times the total mass of the objects”, is not more likely than the one where gravitational attraction also depends on distance, angular momentum, and other factors, despite the former being much less complex than the latter (or so we presume).
There are still murky depths and open questions, such as (IIRC) how to calculate how “long” (in the Kolmogorov-complexity sense) the instructions are.
Because suppose we build two universal Turing machines with different instruction sets.
We run Solomonoff induction on the first machine, and it turns out that 01110101011110101010101111011 is the simplest possible program that outputs “110”; by analyzing the language and structure of the machine, we learn that this corresponds to the hypothesis “2*3”, with the output being “6”. Meanwhile, on the second machine, 1111110 will also output “110”, and by analyzing it we find that this corresponds to the hypothesis “6”, with the output being “6”.
On the first machine, to encode the hypothesis “6”, we must write 101010101111110110101111111110000000111111110000110, which is much more complex than the earlier “2*3” hypothesis, while on the second machine the “2*3” hypothesis is written as 1010111010101111, which is much longer than the “6” hypothesis.
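Since the bitstrings above are invented anyway, the two machines can be modeled as plain lookup tables from program to output; this isn’t a real universal Turing machine, but it’s enough to show the shortest description of the same output diverging between machines:

```python
# Two hypothetical machines, modeled as lookup tables from program
# bitstrings to outputs (the bitstrings are the made-up ones above).
machine_1 = {
    "01110101011110101010101111011": "110",  # encodes "2*3"
    "101010101111110110101111111110000000111111110000110": "110",  # encodes "6"
}
machine_2 = {
    "1111110": "110",           # encodes "6"
    "1010111010101111": "110",  # encodes "2*3"
}

def shortest_program(machine: dict[str, str], target: str) -> str:
    """Return the shortest program on this machine producing `target`."""
    candidates = [p for p, out in machine.items() if out == target]
    return min(candidates, key=len)

# The "simplest" program for the same output differs between machines:
print(len(shortest_program(machine_1, "110")))  # 29 bits (the "2*3" program)
print(len(shortest_program(machine_2, "110")))  # 7 bits (the "6" program)
```

The invariance theorem only bounds this disagreement by a machine-dependent constant; it doesn’t say which machine’s ranking of “2*3” versus “6” is the right one.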
Which hypothesis, between “2*3” and “6”, is simpler and less complex, based on what we observe from these two different machines? Which one is right? AFAIK, this is still completely unresolved.
If we’re considering hypotheses across all mathematically possible universes then why not consider hypotheses across all mathematically possible languages/machines as well?
What weight will we assign to the individual languages/machines? Their complexity… according to what? Perhaps we could build a matrix saying how complex machine A is when simulated by machine B, and then take that matrix’s principal eigenvector as the weights?
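One way to cash out the matrix idea, as a sketch only: every simulation cost below is invented for illustration, and turning lengths into weights via 2^-length is just one plausible choice. The principal eigenvector of the resulting positive matrix (found by power iteration) then gives each machine a self-consistent weight.

```python
# Hypothetical cross-simulation costs: sim_cost[a][b] is the (invented)
# length of the shortest program by which machine b simulates machine a.
sim_cost = [
    [0, 2, 5],
    [3, 0, 4],
    [6, 5, 0],
]
n = len(sim_cost)

# Convert lengths to weights via 2^-length, giving a positive matrix.
W = [[2.0 ** -sim_cost[a][b] for b in range(n)] for a in range(n)]

# Power iteration for the principal (Perron) eigenvector of W.
v = [1.0 / n] * n
for _ in range(200):
    v = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
    s = sum(v)
    v = [x / s for x in v]  # renormalize so the weights sum to 1

print([round(x, 4) for x in v])  # relative weight of each machine
```

Since the matrix is strictly positive, Perron–Frobenius guarantees a unique positive eigenvector, so the weights are at least well-defined; whether they are the *right* prior over machines is exactly the open question.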
Must stop… before head explodes...
This is my intuition as well, though I think it has to be restricted to Turing-complete systems. I was under the impression that there was already some active research in this direction, but I’ve never taken the time to look into it too deeply.