This is a subtle point. The no-free-lunch (NFL) theorem does prohibit any algorithm from doing well over all possible worlds. But Solomonoff induction does well on any world that has any kind of computable regularity.
Okay, fair point. But by smearing its optimality over the entire Platonic space of computable functions, Solomonoff induction ends up significantly worse than algorithms tuned for this world’s function. And not surprisingly, AIXI has very little practical application.
If there is no computable regularity, then no prior can do well. In fact, the Solomonoff prior does just as well asymptotically as any computable prior.
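The standard dominance argument behind that asymptotic claim, as a sketch (here M is the universal mixture and w_q is the weight it assigns to a computable prior q):

```latex
M(x) \;\ge\; w_q\, q(x)
\quad\Longrightarrow\quad
-\log_2 M(x) \;\le\; -\log_2 q(x) + \log_2 \tfrac{1}{w_q}.
```

The regret term, roughly the description length of q, is a constant independent of how much data arrives, which is all the asymptotic claim says.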
And that’s unhelpful when, as is likely, you don’t hit that asymptote until the heat death of the universe.
The reason AIXI doesn’t work in practice is that the “efficient approximations” aren’t really efficient, or aren’t good approximations.
My point is that the most efficient approximations can’t be efficient in any absolute sense. In order to make AIXI useful, you have to feed it information about which functions it can safely skip over—in other words, feed it intelligence of type 2), the information about its environment that you already gained through other methods. Which just shows that those kinds of intelligence are not the same.
Actually Solomonoff induction is insanely fast. Its generality lies not just in learning everything at least as well as anything else, but in typically learning everything from indirect observations “almost as fast as directly” (not really, but...). The “only problem” is that Solomonoff induction is not an algorithm, so claims about its “speed” are for all practical purposes meaningless.
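To see concretely what a computable cut-down of such a mixture looks like, here is a toy Bayesian mixture in the Solomonoff spirit (everything here is an illustrative assumption, nowhere near the real construction: the hypothesis class is just “the data is some bit-pattern repeated forever”, weighted by 2^-length). Even this trivial class already forces an enumeration that grows exponentially with the description length allowed:

```python
# Toy computable stand-in for a Solomonoff-style mixture (sketch only).
# Hypotheses: "the data is pattern p repeated forever", for every bit
# pattern p up to MAX_LEN bits, with prior weight 2**-len(p).
# Note the class has 2**(MAX_LEN+1) - 2 hypotheses: enumeration cost
# is exponential in the description length we allow.
from itertools import product

MAX_LEN = 8

def hypotheses():
    for n in range(1, MAX_LEN + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def predict_next(data: str) -> float:
    """Posterior probability that the next bit is '1' under the mixture."""
    num = den = 0.0
    for p in hypotheses():
        # Keep only patterns whose repetition reproduces the observations.
        if all(data[i] == p[i % len(p)] for i in range(len(data))):
            w = 2.0 ** -len(p)  # shorter patterns get more prior weight
            den += w
            num += w * (p[len(data) % len(p)] == "1")
    return num / den

# The short pattern "10" dominates the posterior, so '1' is predicted next.
print(predict_next("101010"))
```

Real Solomonoff induction replaces this pattern class with all programs for a universal machine, which is exactly where computability is lost.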
The analogue would be theorem proving. No one claims that knowing the axioms of math gets you to every theorem “very fast”, because the problem of finding a proof or disproof of an arbitrary proposition is also undecidable.
A “solution” might be that only proofs matter, while theorems (as formulas) are in general meaningless in themselves, only useful as commentary on proofs.
Nevertheless, the original point stands: no one says “I’ve discovered math! Now I can learn the answer to any math problem very fast.” In contrast, you are saying that because we have Solomonoff induction, we can infer distributions “very fast”.
To be more precise, we can specify the denotation of distributions very close to the real deal from very little data. This technical sense doesn’t allow the analogy with theorem-proving, which is about algorithms, not denotation.
When someone says “very fast, but uncomputable”, what I hear is “dragon in garage, but invisible”.
Generalize that to a good chunk of classical math.
But the analogy is in terms of the “fast but uncomputable” oxymoron.
Let’s do another thought experiment. Say that humanity has finally resolved to send colonists to nearby star systems. The first group is getting ready to head out to Alpha Centauri.
The plan is that after the colonists arrive and set up their initial civilization, they will assemble a data archive of size T about the new world and send it back to Earth for review. Now, it is expensive to send data across light-years, so obviously they want to minimize the number of bits they have to send. So the question is: what data format do the two parties agree on at the moment of parting?
If T is small, then it makes sense to think this issue over quite a bit. What should we expect the data to look like? Will it be images, audio, health reports? If we can figure something out about what the data will look like in advance (i.e., choose a good prior), then we can develop a good data format and get short codes.
But if T is large (terabytes), then there’s no point in doing that. When the Alpha Centauri people build their data archive, they spend some time analyzing it and figuring out ways to compress it. Finally they find a really good compression format (= prior). Of course, Earth doesn’t know the format—but that doesn’t matter, since the specification for the format can just be prepended to the transmission.
I think this thought experiment is nice because it reveals the pointlessness of a lot of philosophical debates about Solomonoff, Bayes, etc. Of course the colonists have to choose a prior before the moment of parting, and of course if they choose a good prior they will get short codes. And the Solomonoff distribution may not be perfect in some metaphysical sense, but it’s obviously the right prior to choose in the large-T regime. Better world-specific formats exist, but their benefit is small compared to T.
The choice that they will prepend a description (and the format of the description) is a choice of prior.
Well, the thought experiment doesn’t accomplish that. Solomonoff induction is not necessarily optimal (and most probably isn’t) in your scenario, even and especially for large T. The time it takes for any computable Occamian approximation of Solomonoff induction to find the optimal encoding is superexponential in the length of the raw source data. So the fact that it will eventually reach a superior or near-superior encoding is little consolation when Alpha Centauri and Sol will have long burned out before it converges on a solution.
The inferiority of Solomonoff (Occamian) induction, of iterating up from the shortest generating algorithms until the data is matched, is not some metaphysical or philosophical issue; rather, it deals directly with the real-world time constraints that arise in practical situations.
My point is, any practical attempt to incorporate Solomonoff induction must also make use of knowledge of the data’s regularity that was found some other way, making it questionable whether Solomonoff induction captures everything we mean by “intelligence”. This incompleteness also raises the question of which this-world-specific methods we actually did use to reach our current state of knowledge, the methods that make Bayesian inference actually effective.
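The prepend-the-format argument from the thought experiment can be sketched numerically (a sketch under two illustrative assumptions: zlib stands in for the colonists’ discovered compression format, and the format specification is modeled as a fixed-size blob):

```python
# Two-part code sketch: total transmission = decoder spec + compressed data.
# As the archive size T grows, the fixed spec overhead becomes negligible.
import zlib

SPEC_SIZE = 4096  # assumed size of the prepended format specification, bytes

def transmission_size(archive: bytes) -> int:
    """Bytes sent: fixed decoder specification plus compressed payload."""
    return SPEC_SIZE + len(zlib.compress(archive, 9))

small = b"temperature log: " + b"21.3C " * 100        # tiny archive
large = b"temperature log: " + b"21.3C " * 1_000_000  # "large T" archive

for data in (small, large):
    total = transmission_size(data)
    # The fraction of the transmission spent on the spec shrinks with T.
    print(f"raw={len(data)} sent={total} spec_fraction={SPEC_SIZE / total:.3f}")
```

The point being illustrated: the cost of agreeing on a format in advance is a constant, so in the large-T regime the choice of prior only matters up to that constant.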