One pattern I have noticed: those who think the No Free Lunch theorems are interesting and important are usually the people who talk the most nonsense about them. The first thing people need to learn about those theorems is how useless and inapplicable to most of the real world they are.
So, you’re disagreeing that an algorithm that is optimal, on average, over a set of randomly-selected computable environments, will perform worse in any specific environment than an algorithm optimized specifically for that environment?
Because if not, that’s all I need to make my point, no matter what subtlety of NFL I’m missing. (Actually, I can probably make it with an even weaker premise, and could have gone without NFL altogether, but it grants some insight on the issue I’m trying to illuminate.)
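To make that weaker premise concrete, here is a toy sketch (the bitstring environments, the majority-vote baseline, and the scoring rule are all invented for illustration; nothing here is a standard construction):

```python
import random

random.seed(0)

# Toy setup: an "environment" is a random bitstring, and the task is to
# predict each bit from the bits seen so far.  The general-purpose rule
# (majority vote) does about as well as anything on average over random
# environments; a predictor tuned to one specific environment is perfect there.

def general_guess(seen):
    # Predict the majority bit seen so far; guess 0 on a tie or at the start.
    return 1 if sum(seen) * 2 > len(seen) else 0

def score(env, guesser):
    return sum(guesser(env[:i]) == env[i] for i in range(len(env)))

envs = [[random.randint(0, 1) for _ in range(8)] for _ in range(1000)]
target = envs[0]
specialized = lambda seen: target[len(seen)]  # tuned to `target` alone

print("general, averaged over environments:",
      sum(score(e, general_guess) for e in envs) / len(envs))   # about 4 of 8
print("general, on the target environment:", score(target, general_guess))
print("specialized, on the target environment:", score(target, specialized))  # 8 of 8
```

The tuned predictor beats the general one on its home environment, which is all the premise claims.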
The NFL deals with a space of all possible problems—while the universe typically presents embedded agents with a subset of those problems that are produced by short programs or small mechanisms. So: the NFL theorems rarely apply to the real world. In the real world, there are useful general-purpose compression algorithms.
The NFL deals with a space of all possible problems—while the universe typically presents embedded agents with a subset of those problems that are produced by short programs or small mechanisms.
Okay. I stated the NFL-free version of the premise I need. If you agree with that, this point is moot.
In the real world, there are useful general-purpose compression algorithms.
Now I know I’m definitely not using NFL, because I agree with this and it’s consistent with the point in my initial post.
Yes, there are useful general-purpose compressors, but only because researchers recognized regularities that appear across all types of files; such regularities must exist, since raw data is rarely purely random. The point is that they identified those regularities before writing the compressor, which then exploits them by (basically) reserving shorter codes for the kinds of data consistent with them.
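To spell out the “reserving shorter codes” step: it is ordinary entropy coding over a distribution you already believe in. A minimal Huffman-style sketch (the symbol frequencies are invented for illustration):

```python
import heapq

# A symbol distribution known *in advance* -- the imported regularity.
# These frequencies are made up for illustration.
freqs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

# Heap entries are (weight, tiebreak, tree); a tree is a symbol string or a
# (left, right) pair.  Repeatedly merge the two lightest trees.
heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    w1, _, t1 = heapq.heappop(heap)
    w2, _, t2 = heapq.heappop(heap)
    heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))
    counter += 1

def codebook(tree, prefix=""):
    # Walk the tree, assigning 0 to left branches and 1 to right branches.
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    return {**codebook(left, prefix + "0"), **codebook(right, prefix + "1")}

print(codebook(heap[0][2]))  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

The crucial part is that `freqs` goes in before the compressor ever sees a file; the resulting code lengths are just a restatement of that prior distribution.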
Likewise, people have identified regularities specific to video files: each frame is very similar to the last. And regularities specific to picture files: each row or column is very similar to its neighbors.
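A minimal sketch of exploiting that neighbor-similarity regularity (the pixel row is made up, and real codecs are far more elaborate):

```python
# Delta encoding: store differences between neighbors instead of raw values.

def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

row = [200, 201, 201, 203, 204, 204, 205]  # slowly varying pixel intensities
deltas = delta_encode(row)                 # [200, 1, 0, 2, 1, 0, 1]
assert delta_decode(deltas) == row

# The deltas cluster near zero, so an entropy coder tuned to small values
# compresses them far better than it would the raw intensities.
```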
But what they did not do was write a program with an unbiased, Occamian prior that combed through various files and told them what regularities existed; finding the shortest compression of a file is uncomputable. Rather, they imported prior knowledge of the distribution of data in certain types of files, gained through some other method (type 2 intelligence in my convention), and tailored the compression algorithm to that distribution.
No “universal, all purpose” algorithm found that knowledge.
I should probably give you some proper feedback, as well as caustic comments.
The intelligence subdivision looks useful and interesting—though innate intelligence is usually referred to as being ‘instinctual’.
However, I was less impressed with the idea that the concept of intelligence lies somewhere between a category error and a fallacy of compression.
Okay, thanks for the proper feedback :-)

And I may be leaning more toward the “fallacy of compression” side, I’ll grant that. But I don’t see how you’d disagree with it, since you find the subdivision I outlined to have some potential. If people are unknowingly shifting between two very different meanings of intelligence, that certainly is a fallacy of compression.
Another point: I’m not sure your description of AIXI is particularly great. AIXI works where Solomonoff induction works. Solomonoff induction works pretty well in this world. It might not be perfect—due to reference machine issues—but it is pretty good. AIXI would work very badly in worlds where Solomonoff induction was a misleading guide to its sense data. Its performance in this world doesn’t suffer through trying to deal with those worlds—since in those worlds it would be screwed.
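For reference, the prior at issue, for a fixed universal prefix machine U (this is the standard formulation, give or take details of presentation):

```latex
% Solomonoff's universal prior over finite strings x: a sum over the
% programs p whose output begins with x, weighted by program length.
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
% Switching to a different universal reference machine changes M(x) by at
% most a multiplicative constant: the "reference machine issue" above.
```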
Well, actually, you’re highlighting the issue I raised in my first post: computable approximations of Solomonoff induction work pretty well … when fed useful priors! But those priors encode a lot of implicit knowledge about the world, which is what lets them skip over an exponentially large number of shorter hypotheses by the time you apply them to any specific problem.
AIXI (and its computable approximations), starting from a purely Occamian prior, is stuck iterating through vast numbers of generating functions before it gets to the right one, which takes infeasibly long. To speed it up you have to feed it knowledge you gained elsewhere (and, of course, find a way to represent that knowledge). But at that point, your prior includes a lot more than a penalty for length!
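A sketch of what that iteration looks like (the bitstring “program space” and its XOR interpreter are invented toys; real approximations enumerate actual programs, which is worse): a shortest-first search wades through every shorter hypothesis before reaching the right one, and each extra bit of program length doubles the work.

```python
from itertools import product

def run(program_bits, x):
    # Invented toy semantics: treat the program as a mask XOR'd onto the input.
    return [b ^ m for b, m in zip(x, program_bits)]

def occam_search(x, y, max_len=20):
    """Shortest-first search for a 'program' mapping x to y."""
    tried = 0
    for n in range(1, max_len + 1):
        for bits in product([0, 1], repeat=n):  # 2^n candidates at length n
            tried += 1
            if len(bits) == len(x) and run(bits, x) == y:
                return bits, tried
    return None, tried

x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
mask, tried = occam_search(x, y)
print(mask, "found after", tried, "candidates")  # several thousand for 12 bits
```

Feeding in knowledge gained elsewhere (say, “the answer is a mask the same length as the input”) collapses the search instantly, but then the prior is doing far more than penalizing length.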