Yeah, so my dumb argumentative comment is: prediction does not equal compression. Sequential prediction equals compression. But non-sequential prediction is also important, and it does not equal compression.
I’m not so sure about this. I can accept that non-sequential prediction is not full compression, for the obvious reason that the sequence is information and lacking it means you haven’t compressed it; but if this were true in general then how could information about non-sequential things allow us to achieve better compression? For example, in Alkjash’s example the frequency of the letters was worth 4 bits.
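To make the frequency point concrete, here's a toy sketch (my own made-up numbers, not Alkjash's actual example): if you know the letter frequencies of a message, an entropy-optimal code beats the uniform-code baseline, and the gap is exactly the compression that frequency knowledge buys you.

```python
import math

def avg_bits(freqs):
    """Average bits per symbol under an optimal code for these
    frequencies (i.e., the Shannon entropy of the distribution)."""
    total = sum(freqs.values())
    return -sum((c / total) * math.log2(c / total) for c in freqs.values())

# Hypothetical 16-symbol message over a 4-letter alphabet with skewed counts.
freqs = {"a": 8, "b": 4, "c": 2, "d": 2}
n = sum(freqs.values())

uniform_bits = math.log2(len(freqs)) * n   # 2 bits/symbol: no frequency knowledge
informed_bits = avg_bits(freqs) * n        # 1.75 bits/symbol: entropy-optimal

print(uniform_bits - informed_bits)        # bits saved by knowing the frequencies
```

In this particular toy case the frequency information happens to be worth 32 − 28 = 4 bits; the general point is just that a non-sequential fact (the histogram) translates directly into shorter codes.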
This causes me to expect that not only does any kind of prediction correspond to some compression, but that each kind of prediction corresponds to a kind of compression.
On the other hand, thinking further about how to distinguish between the 4 bits in the frequency example and the 10 bits from the partial-sequence-elimination example, I am promptly confronted by a grey mist. Mumble mumble prediction/compression transform mumble.