Along with what rwallace said, it might be a good idea to point out that the total message length actually breaks down into three parts:
1) Specification of the encoding: in info-theoretical terms, this maps more common, long strings (observations) to shorter symbols. It plays the role of the theory, in that it implicitly says what observations are most likely. This part tells you how to “decompress” parts 2 and 3 into their original form.
2) The compressed string: this lists all the observed data, using the shortcuts given in 1). Common occurrences are easily described by short strings.
3) The “exceptions” to the theory—the observations that aren’t easily compressed into 2). The longer strings are reserved for these, since they occur the least.
So yes, even if you can’t reach perfection, any strong regularity found in the data will allow you to compress it, even if it’s a rule with part-3 exceptions.
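For concreteness, here’s a minimal Python sketch of the three-part breakdown, using a made-up weather log and a simple unary-style prefix code (the data and names are hypothetical, chosen only to make the parts visible):

```python
from collections import Counter

# Toy observations: S=sunny, R=rain, C=cloudy (hypothetical data).
observations = list("SSSSSRSSSSCSSSSS")

# Part 1: the encoding, derived from observed frequencies (the "theory").
# Common symbols get short codes; rare ones get long codes.
freq = Counter(observations)
ranked = [sym for sym, _ in freq.most_common()]
codebook = {sym: "1" * i + "0" for i, sym in enumerate(ranked)}  # prefix-free

# Part 2: the data re-expressed with the codebook. Part 3 is implicit:
# the "exceptions" are exactly the long codewords embedded in the stream.
encoded = "".join(codebook[sym] for sym in observations)

raw_bits = len(observations) * 2  # 2 bits/symbol with no theory (3 symbols)
print(codebook)
print(encoded, f"({len(encoded)} bits vs {raw_bits} raw)")
```

The better the theory in part 1 matches the source, the shorter parts 2 and 3 come out; a theory that mispredicts pushes more observations into long exception codes.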
Daniel’s Reusability hypothesis is correct: any insight you gain into a particular domain can be equivalently expressed as a way to compress descriptions of that domain.
However, let me again register my annoyance with this series, not on grounds that it’s wrong (so far), but that it’s just an exposition of the well-accepted minimum message length formalism. It may not be accepted by all scientists, but it’s accepted here, and the sciences do implicitly use it when they can’t do controlled experiments (which are not strictly necessary for science anyway), as in astronomy and geology.
First of all, compression doesn’t have to be limited to just commonly occurring sets. For example, if I create a fractal, you don’t have to find commonly used colors or pixels that occur near each other; you can just create a program which generates the exact same fractal. When you do this you add another element to the system: time. How much time do you spend trying to compress something, or, conversely, decompressing it? And no compression system can guarantee that every observed instance can be compressed to less than its original size. In fact, there will always be cases where compressing a file actually increases its size, sometimes significantly, but at minimum by a few bits.
First of all, compression doesn’t have to be limited to just commonly occurring sets. For example, if I create a fractal, you don’t have to find commonly used colors or pixels that occur near each other; you can just create a program which generates the exact same fractal.
That kind of regularity can still be explained in the language of commonly occurring sets: for a fractal, you reserve a symbol for the overall structure, which is common (in that something like it occurs frequently), and you have a symbol that instantiates it at a particular position and scale. Time is handled similarly.
Compression amounts to saying: “This is the structure of the data. Now, here’s the remaining data to fill in any uncertainty not resolved by that structure.” The structure tells you which long symbols you can substitute with short symbols.
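To make the fractal point concrete, here’s a sketch of how a few lines of code can stand in for a large, pixel-exact image (a Sierpinski triangle as a stand-in for “a fractal I created”); the program is the compressed form, and running it is the decompression time under discussion:

```python
SIZE = 64  # a 64x64 "image": 4096 pixels regenerated from one closed-form rule

def sierpinski(size):
    # Pixel (x, y) is filled iff x and y share no set bits; no pixel
    # data is stored anywhere, only the rule itself.
    return [["#" if (x & y) == 0 else " " for x in range(size)]
            for y in range(size)]

image = sierpinski(SIZE)
print("\n".join("".join(row) for row in image[:16]))  # top slice of the triangle
```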
And no compression system can guarantee that every observed instance can be compressed to less than its original size.
True, compression only attempts to guarantee that, for the function generating the data, you save, on average, by using a certain compression scheme. If the generating function has no regularity, then you won’t be able to compress, even on average.
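This is easy to check empirically with a real compressor, zlib in this case: highly regular input shrinks dramatically, while random input (no regularity for the scheme to exploit) actually grows by a few bytes of overhead:

```python
import os
import zlib

regular = b"abc" * 10_000   # strong regularity
noise = os.urandom(30_000)  # no regularity to exploit

for name, data in [("regular", regular), ("random", noise)]:
    out = zlib.compress(data, 9)  # maximum compression effort
    print(f"{name}: {len(data)} -> {len(out)} bytes")
```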
That kind of regularity can still be explained in the language of commonly occurring sets: for a fractal, you reserve a symbol for the overall structure, which is common (in that something like it occurs frequently), and you have a symbol that instantiates it at a particular position and scale. Time is handled similarly.
Yes, but if you have a symbol for every possible fractal, it defeats the purpose. The point was to use time as a memory resource. This also works in reverse: with caching and other methods, you can use memory as if it were time. Both, however, only work up to a point.
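For the memory-as-time direction, a standard illustration (not from this thread, just the usual caching example) is memoization, where extra storage eliminates repeated computation:

```python
from functools import lru_cache
import time

def fib_slow(n):
    # Recomputes the same subproblems exponentially many times.
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

@lru_cache(maxsize=None)  # memory spent here...
def fib_fast(n):
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

for f in (fib_slow, fib_fast):
    start = time.perf_counter()
    f(30)
    print(f.__name__, f"{time.perf_counter() - start:.4f}s")  # ...time saved here
```

And as the comment says, both directions hit limits: a cache can’t exceed available memory, and a generating program can’t run longer than you’re willing to wait.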