In computer science land, prediction = compression. In practice, the equation breaks down. Trying to compress data can still be useful rationality practice in some circumstances, if you know the pitfalls.
One reason that prediction doesn't act like compression is that information can vary in its utility by many orders of magnitude. Suppose you have a data source consisting of a handful of bits that describe something very important (e.g. Friendly singularity, yes or no), plus vast amounts of unimportant drivel (funny cat videos). You are asked to compress this data as best you can. You are going to spend nearly all your effort on the mechanics of cat fur and on statistical regularities in camera noise, because that is where almost all the bits are. And sometimes you can't separate the important from the unimportant based on raw bits at all: shown video of a book page, the important thing to predict is probably the text, not the lens distortion or the motion blur. Sure, a perfect compressor would predict everything as well as possible, but an imperfect compressor can look almost perfect by focusing its attention on things we don't care about.
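To get a feel for the orders of magnitude, here is a toy back-of-the-envelope calculation (the specific numbers are made up for illustration): one important bit drowned in ten gigabytes of video is a vanishing fraction of the total codelength, so a compressor scored on total codelength loses essentially nothing by ignoring it.

```python
# Toy illustration with made-up numbers: what fraction of the stream
# the important bit occupies.
important_bits = 1                 # e.g. "Friendly singularity: yes/no"
video_bits = 10 * 2**30 * 8        # ~10 GiB of cat videos, in bits
fraction = important_bits / (important_bits + video_bits)
print(f"important fraction of total codelength: {fraction:.1e}")
# ~1.2e-11, so getting this bit wrong costs the compressor almost nothing.
```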
Also, there is redundancy for error correction. Giving multiple different formulations of a physical law, plus some examples, makes it easier for a human to understand. Repeating a message lets the receiver spot errors, and other forms of redundancy do the same job. Maybe you could compress a maths book by deleting the answers to some of the problems, but the effort of re-solving those problems is worth more than the saved memory space.
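As a concrete sketch of redundancy spotting and fixing errors, here is a minimal 3x repetition code, a standard textbook construction rather than anything specific to this post:

```python
# Minimal sketch: a 3x repetition code. Each bit is sent three times;
# a majority vote corrects any single flipped copy per group.
def encode(bits):
    return [b for b in bits for _ in range(3)]

def decode(coded):
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

message = [1, 0, 1, 1]
received = encode(message)
received[4] ^= 1                    # flip one transmitted bit
assert decode(received) == message  # the redundancy recovers it
```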
Videos contain noise that no compressor can remove, but some domains have none, and there compression is more useful. In my experience, one example where the prediction-compression duality made problems much easier for humans to understand is code golf (writing the shortest possible programs that perform various tasks). There's a Stack Exchange site dedicated to it, and over the years people have created better and better code-golf languages, shrinking solution sizes by a factor of more than 3. These languages are now so good that code golf isn't as fun anymore, because the tasks are too easy: anyone fluent in a modern golfing language can decompose the average code-golf problem into ten or twenty atomic functions.
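To make the shrinking-programs point concrete, here is a hypothetical example in plain Python rather than a real golfing language: the same task written readably and then golfed. Dedicated golfing languages push this much further by making each atomic operation a single character.

```python
# Hypothetical example: summing the digits of a number, readable vs golfed.

# Readable version:
def digit_sum(n):
    total = 0
    for ch in str(n):
        total += int(ch)
    return total

# Golfed version: the same three atomic steps (stringify, map to int, sum).
d = lambda n: sum(map(int, str(n)))

assert digit_sum(12345) == d(12345) == 15
```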