The idea that there’s a simple state in the future that still pins down the entire past seems possible but weird. Most of the time, when events evolve into a simple state, it’s because information has been destroyed. This isn’t really a counter-argument; it’s just an attempt to put into words what feels odd.
One thing that’s confusing to me: Why K-complexity of the low-level history? Why not, for example, the Algorithmic Minimal Sufficient Statistic (AMSS), which doesn’t count the uniform noise? Or memory-bounded K-complexity, which might also favour multi-level descriptions.
I think I prefer frequentist justifications for complexity priors, because they explain why such priors work even on small parts of the universe.
> The idea that there’s a simple state in the future that still pins down the entire past seems possible but weird.
Laws of physics under the Standard Model are reversible though, aren’t they? I think you can’t run them in reverse from within an Everett branch, because some information ends up in parts of the universal wavefunction that are inaccessible to you, but if you had access to the wavefunction itself, you could run it in reverse. So under the Standard Model, future states do pin down the entire past.
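(To spell out the reversibility point in bare quantum-mechanics terms, leaving aside the full Standard Model machinery: time evolution of the wavefunction is unitary, and unitaries are invertible,

$$|\psi(t)\rangle = U(t)\,|\psi(0)\rangle,\qquad U(t)=e^{-iHt/\hbar}\ \Rightarrow\ |\psi(0)\rangle = U(t)^{\dagger}\,|\psi(t)\rangle,$$

so the full future wavefunction determines the full past one; it’s only the branch-relative, decohered description that throws information away.)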
> One thing that’s confusing to me: Why K-complexity of the low-level history?
Hm; frankly, simply because it’s the default I ran with.
> Why not, for example, the Algorithmic Minimal Sufficient Statistic (AMSS), which doesn’t count the uniform noise?
That seems like an acceptable fit. It’s defined through Kolmogorov complexity anyway, though; would it produce any qualitatively different conclusions here?
> I think I prefer frequentist justifications for complexity priors, because they explain why such priors work even on small parts of the universe.

Interesting. Elaborate?
> So under the Standard Model, future states do pin down the entire past.
Good point; I was thinking in terms of toy Markov chains rather than real physics when I said that.
> would it produce any qualitatively different conclusions here?
It would change the conclusions a lot if the initial conditions were unstructured noise and the laws of physics were very simple, because then the AMSS would contain just the laws of physics, and there’d be no compression benefit from multi-level structure.
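(Sketch of why, ignoring $O(\log)$ additive terms and assuming the seed-to-history map is injective and the seed is algorithmically random: a finite set $S$ containing the history $x$ is an algorithmic sufficient statistic when $K(S) + \log_2|S| \approx K(x)$, and the AMSS is a minimal-complexity such $S$. If $x$ is produced by simple laws $L$ run on a random $n$-bit seed, then the set

$$S_L = \{\,L(r) : r \in \{0,1\}^n\,\},\qquad K(S_L) \approx K(L),\quad \log_2|S_L| = n,$$

already achieves $K(S_L) + \log_2|S_L| \approx K(L) + n \approx K(x)$. So the statistic is essentially just the laws plus the seed length, and all the emergent multi-level structure gets filed under noise.)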
> Interesting. Elaborate?
It requires something like an IID assumption, so it’s fairly useless for your purposes. But once we’ve got that assumption, we can bound |empirical error − generalisation error| by a function of hypothesis complexity and the number of data points (Chapter 7.2 in Understanding Machine Learning). So we can say simpler hypotheses will continue to work roughly as well as they worked on the training data, without any reference to whether they are true or not, or any philosophical justification of the prior.
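(For concreteness, the kind of bound I mean, in the standard Hoeffding-plus-union-bound form with prefix-free description lengths; the constants in the book’s statement may differ slightly. For a loss in $[0,1]$, with probability at least $1-\delta$ over an IID sample $S$ of size $m$, simultaneously for every hypothesis $h$ with description length $|h|$ bits,

$$\big|\,L_{\mathcal{D}}(h) - L_S(h)\,\big| \;\le\; \sqrt{\frac{|h|\ln 2 + \ln(2/\delta)}{2m}},$$

where $L_S$ is the empirical error and $L_{\mathcal{D}}$ the generalisation error. The penalty grows with $|h|$, so shorter hypotheses get tighter guarantees, and nothing in the argument requires the short hypothesis to be true.)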
The IID assumption really sucks, though; I want there to be some weaker assumption that lets us conclude a similar thing. And intuitively this should be a thing that exists, because in science and in real life we constantly learn to use simple approximate rules-of-thumb, and we justify them in a similar way: “The rule has a good track record and I don’t see any reason it won’t work in this next specific case”.
There’s a somewhat more Bayesian way of doing the same thing, where you do logical induction over whether the “true underlying hypothesis” implies the rule-of-thumb you’ve observed. But I think if you pull apart how the logical induction is working there, it has to be doing something like the frequentist thing.