This one bother me as well. Turing machine basis for complexity favours certain types of theories over others. For example, how many bits of turing machine does it take to predict, say, maxwells equations, which involve continuous fields?
(warning: original “research” following)
One temptation is to sweep this under the rug by saying that the complexity required to talk about continuous math is just a constant-sized interpreter that you put on the turing machine, but this can’t get around the fact that some thoeries won’t need that.
(BTW, is continuous math even exactly turing-computable?)
If we try to generalize and say “turing machines suck, we’ll use some other language”, it becomes clear that all langauges will have some bias. English has a bias for agenty mind-stuff, turing machines have a bias for discrete cellular automata, some imaginable basis language will have a bias for simple continuous mathematics.
I think the basis that is closest to correct would be one based on a hypothetical real-number continuous-memory hypercomputer (for obvious reasons). This of course implies that I have learned this thing, which implies some deeper form of induction over forms of induction, that would make a language that had only one operator that was just “physics” take appropriate penalization. And then thinking about that idea reveals that inducting over langauges and inducting over theories in languages is suspiciously similar.
That tempts me to say that they should not be seperate, but then I still think the general utility of continuous math should not have to be proven for every theory that uses it, which implies some sharing of complexity somehow. But that seems like nonsense, SI already handles that, I think (by each thoery being a completely defactored package deal). Now I think it would be good to find a new SI that can handle factoring (for algorithmic reasons), but maybe that is a step down the “how do we actually do induction” road, which is not what SI is about.
Thinking over this all again, it seems starting with any form of induction that sortof works will eventually come to the right answer. The hypothetical perfect form of induction that would be the best a learning machine could start with is just the actual answer. Again there’s that duality (is that the right word) between theories and induction priors.
This looks to me alot like some kind of iterative improvement algorithm, where you have to start somewhere, and where you start is chosen from the same domain as your answer (think newton’s method, or gradient descent). It seems like even starting with some human language (like english), you would eventually learn (as we have done) that building a math-interpreter is the thing to do.
Ok, that is the end of my confusion dump.
My unproven, badly specified idea that has something to do with induction:
Ok, background: All turing-complete languages can build interpreters for all others. This implies some traversable languagespace (which is a subset of programspace) where the distance between two points is the length of the interpreter to get from one language to the other. We will also want to be able to go backwards, so that we can go from any point in programspace to any other (by backing up to a good language, and then going forward to new program). Going backwards means building this point from some point in languagespace, rather than building some point in programspace from this point in languagespace) Points in programspace are defined over behavior, so the same program will be reachable from multiple languages.
(the lisp-like area gets a lot of traffic thru it. (“any sufficiently complex program contains … half of common lisp”))
Ok, so now we have a programspace with distances measured in bits. We could just put a probability distribution over the subset of programspace that makes predictions. Solomonof induction does this, with initial improbability defined by distance from turing-machine-langauge.
I think there might be some way to use this programspace idea for induction other than just being another way to look at SI programs. I’m imagining probability or something flowing around in this space updating our concept of simplicity so that when we see that newtonian mechanics, GR, and quantum mechanics are all closely reachable from some mathy point in languagespace, we then upgrade the probability of math-reachable thoeries, without even talking about what they predict. I’m also imagining if we build an AI to do induction, we say “here are some interesting points in programspace (newton, maxwell, SR, GR, quantum) that have served us well” and letting it figure out a distribution over programspace instead of “start from turing-machine-language”.
I have more ideas and I fear I am not communicating this very well.
I think this interpreter-traversable programspace is a good concept to investigate for trying to make SI more objective. Even if we just stick with something like SI, it seems like an interesting concept.
The thing that you’re describing is a lot like finding the stationary distribution of a Markov chain. Not all Markov chains have stationary distributions and I can’t tell whether this one would, though if it does have one it would be unique. It’s an interesting idea.
I should also note that it is not necessary to do anything like this to preform induction over multiple theories. Instead, we can just require a program that outputs known theories in addition to the new theory. We can ask them to be output in any form, such as predictions or source code, since converting source code to predictions is O(1) in program size. Then, if the different theories have more in common, the program computing them all will be shorter than one that must seperately specify elements of very different theories. A slightly different implementation of the same general principle is Schmidhuber’s Optimal Ordered Problem Solver.
This one bother me as well. Turing machine basis for complexity favours certain types of theories over others. For example, how many bits of turing machine does it take to predict, say, maxwells equations, which involve continuous fields?
(warning: original “research” following)
One temptation is to sweep this under the rug by saying that the complexity required to talk about continuous math is just a constant-sized interpreter that you put on the turing machine, but this can’t get around the fact that some thoeries won’t need that.
(BTW, is continuous math even exactly turing-computable?)
If we try to generalize and say “turing machines suck, we’ll use some other language”, it becomes clear that all langauges will have some bias. English has a bias for agenty mind-stuff, turing machines have a bias for discrete cellular automata, some imaginable basis language will have a bias for simple continuous mathematics.
I think the basis that is closest to correct would be one based on a hypothetical real-number continuous-memory hypercomputer (for obvious reasons). This of course implies that I have learned this thing, which implies some deeper form of induction over forms of induction, that would make a language that had only one operator that was just “physics” take appropriate penalization. And then thinking about that idea reveals that inducting over langauges and inducting over theories in languages is suspiciously similar.
That tempts me to say that they should not be seperate, but then I still think the general utility of continuous math should not have to be proven for every theory that uses it, which implies some sharing of complexity somehow. But that seems like nonsense, SI already handles that, I think (by each thoery being a completely defactored package deal). Now I think it would be good to find a new SI that can handle factoring (for algorithmic reasons), but maybe that is a step down the “how do we actually do induction” road, which is not what SI is about.
Thinking over this all again, it seems starting with any form of induction that sortof works will eventually come to the right answer. The hypothetical perfect form of induction that would be the best a learning machine could start with is just the actual answer. Again there’s that duality (is that the right word) between theories and induction priors.
This looks to me alot like some kind of iterative improvement algorithm, where you have to start somewhere, and where you start is chosen from the same domain as your answer (think newton’s method, or gradient descent). It seems like even starting with some human language (like english), you would eventually learn (as we have done) that building a math-interpreter is the thing to do.
Ok, that is the end of my confusion dump.
My unproven, badly specified idea that has something to do with induction:
Ok, background: All turing-complete languages can build interpreters for all others. This implies some traversable languagespace (which is a subset of programspace) where the distance between two points is the length of the interpreter to get from one language to the other. We will also want to be able to go backwards, so that we can go from any point in programspace to any other (by backing up to a good language, and then going forward to new program). Going backwards means building this point from some point in languagespace, rather than building some point in programspace from this point in languagespace) Points in programspace are defined over behavior, so the same program will be reachable from multiple languages.
(the lisp-like area gets a lot of traffic thru it. (“any sufficiently complex program contains … half of common lisp”))
Ok, so now we have a programspace with distances measured in bits. We could just put a probability distribution over the subset of programspace that makes predictions. Solomonof induction does this, with initial improbability defined by distance from turing-machine-langauge.
I think there might be some way to use this programspace idea for induction other than just being another way to look at SI programs. I’m imagining probability or something flowing around in this space updating our concept of simplicity so that when we see that newtonian mechanics, GR, and quantum mechanics are all closely reachable from some mathy point in languagespace, we then upgrade the probability of math-reachable thoeries, without even talking about what they predict. I’m also imagining if we build an AI to do induction, we say “here are some interesting points in programspace (newton, maxwell, SR, GR, quantum) that have served us well” and letting it figure out a distribution over programspace instead of “start from turing-machine-language”.
I have more ideas and I fear I am not communicating this very well.
I think this interpreter-traversable programspace is a good concept to investigate for trying to make SI more objective. Even if we just stick with something like SI, it seems like an interesting concept.
yeah...
The thing that you’re describing is a lot like finding the stationary distribution of a Markov chain. Not all Markov chains have stationary distributions and I can’t tell whether this one would, though if it does have one it would be unique. It’s an interesting idea.
I should also note that it is not necessary to do anything like this to preform induction over multiple theories. Instead, we can just require a program that outputs known theories in addition to the new theory. We can ask them to be output in any form, such as predictions or source code, since converting source code to predictions is O(1) in program size. Then, if the different theories have more in common, the program computing them all will be shorter than one that must seperately specify elements of very different theories. A slightly different implementation of the same general principle is Schmidhuber’s Optimal Ordered Problem Solver.
Some continuous math is commutable, some is not. There are uncomputable real numbers. Computable number.