So under the Standard Model, future states do pin down the entire past.
Good point, I was thinking in terms of toy Markov chains rather than real physics when I said that.
Would it produce any qualitatively different conclusions here?
It would change the conclusions a lot if the initial conditions were unstructured noise and the laws of physics were very simple, because then the AMSS (algorithmic minimal sufficient statistic) would just contain the laws of physics and there’d be no compression benefit from multi-level structure.
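To spell out the compression picture (informal Kolmogorov-complexity notation, my own rather than from any reference): the AMSS $S^*$ of a string $x$ gives a two-part description

$$K(x) \approx \underbrace{K(S^*)}_{\text{structure}} + \underbrace{\log_2 |S^*|}_{\text{noise}}.$$

If the world-history $x$ is generated by simple laws $L$ run on incompressible initial conditions $r$, then $K(x) \approx \ell(L) + \ell(r)$ with almost all the bits sitting in $r$, so $S^*$ is essentially just $L$: any intermediate level of description would cost extra bits without compressing anything.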
Interesting. Elaborate?
It requires something like an IID assumption, so it’s fairly useless for your purposes. But once we’ve got that assumption, we can bound |empirical error − generalisation error| by a function of hypothesis complexity and the number of data points (Chapter 7.2 in Understanding Machine Learning). So we can say simpler hypotheses will continue to work roughly as well as they worked on the training data, without any reference to whether they are true or not, or any philosophical justification of the prior.
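Concretely, the bound I have in mind is something like this (quoting from memory, so treat the constants loosely): with probability at least $1-\delta$ over $m$ IID samples, simultaneously for every hypothesis $h$ with a prefix-free description of $|h|$ bits,

$$\bigl|L_S(h) - L_D(h)\bigr| \;\le\; \sqrt{\frac{|h|\ln 2 + \ln(2/\delta)}{2m}},$$

where $L_S$ is the empirical error and $L_D$ the generalisation error. The only thing doing work is that shorter hypotheses get a tighter bound; whether $h$ is true never enters.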
The IID assumption really sucks, though; I really want there to be some weaker assumption that lets us conclude something similar. And intuitively such a thing should exist, because in science and in real life we constantly learn to use simple approximate rules-of-thumb, and we justify them in just this way: “The rule has a good track record and I don’t see any reason it won’t work in this next specific case”.
There’s a somewhat more Bayesian way of doing the same thing, where you do logical induction over whether the “true underlying hypothesis” implies the rule-of-thumb you’ve observed. But I think if you pull apart how the logical induction is working there, it has to be doing something like the frequentist thing.