I don’t think we can resolve this debate, but let me try to clarify the differences in our positions (which may be confusing to nonspecialists, since we both advocate compression).
Hutter/Legg/Tyler/etc. (algorithmic approach): Compression is the best measure of understanding. Therefore, to achieve general intelligence, we should search for general purpose compressors. It is not interesting to build specialized compressors. To achieve compression in spite of the No Free Lunch (NFL) theorem, one must exploit empirical structure in the data, but the only empirical fact we require is that the world is computable. Because the compressors are general purpose, to demonstrate success it is sufficient to show that they work well on simple benchmark problems. There is no need to study the structure of specific datasets. To achieve good text compression, one simply finds a general purpose compressor and applies it to text. The problem is entirely one of mathematics and algorithm design.
Burfoot (empirical approach): Compression is the best measure of understanding. However, general purpose compressors are far out of reach at this stage. Instead, one should develop specialized compressors that target specific data types (text, images, speech, music, etc.). To achieve good compression in spite of the NFL theorem, one must study the empirical structure of the respective datasets and build that knowledge into the compressors. To compress text well, one should study grammar, parsing, word morphology, and related topics in linguistics. To demonstrate success, it is sufficient to show that a new compressor achieves a better compression rate on a standard benchmark. We should expect a good compressor to fail when applied to a data type for which it was not designed. Progress is achieved by obtaining a series of increasingly strong compression results (Kolmogorov-complexity upper bounds) on standard databases, while also adding new databases of greater scope and size.
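A minimal sketch of what that success criterion looks like in practice (my own illustration, not part of either position): any compressor's output length is an upper bound on the Kolmogorov complexity of the benchmark corpus, up to the fixed size of the decompressor, so a smaller compressed size on a fixed corpus is a tighter bound. The file name and the standard-library compressors below are just placeholders for whatever benchmark and candidate compressor are being compared.

# Sketch: compressed size gives an upper bound on K(corpus) + O(1),
# so a smaller result on the same benchmark is a stronger claim.
import bz2
import lzma
import zlib

def compressed_sizes(data: bytes) -> dict[str, int]:
    """Compressed size in bytes under several general-purpose compressors."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }

if __name__ == "__main__":
    # "enwik8" is an example corpus name (the first 10^8 bytes of English
    # Wikipedia, used as a standard text benchmark); substitute any dataset.
    with open("enwik8", "rb") as f:
        corpus = f.read()
    for name, size in sorted(compressed_sizes(corpus).items(), key=lambda kv: kv[1]):
        print(f"{name:5s}  {size:>12,d} bytes  (upper bound on K(corpus) + O(1))")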
Again, I don’t think this debate can be resolved, but I think it’s important to clarify the various positions.
Thanks for the attempt at a position summary!

General purpose systems have their attractions. The human brain has done well out of the generality that it has.
However, I do see many virtues in narrower systems. Indeed, if you want to perform some specific task, a narrow expert system focussed on the problem domain will probably do a somewhat better job than a general purpose system. So, I would not say:
It is not interesting to build specialized compressors.
Rather, each specialized compressor encodes a little bit of a more general intelligence.
This is also a bit of a misrepresentation:
but the only empirical fact we require is that the world is computable
Occam’s razor is the critical thing, really. That is an “empirical fact”—and without it we are pretty lost.
We do want general-purpose systems. If we have those, they can build whatever narrow systems we might need.
There are two visions of the path towards machine intelligence—one is of broadening narrow systems, and the other is of general forecasting systems increasing in power: the “forecasting first” scenario. Both seem likely to be important. I tend to promote the second approach partly for technical reasons, but partly because it currently gets so little air time and attention.