Then, we could exploit it to compress the description of the full-fidelity/lowest-level history.
I don’t think this works if the lowest-level laws of physics are very, very simple. The laws of physics at the lowest level + initial conditions are sufficient to roll out the whole history, so (in K-complexity) there’s no benefit to adding descriptions of the higher levels.
Maybe if lots of noise is constantly being injected into the universe, this would change things. Because then the noise counts as part of the initial conditions. So the K-complexity of the universe-history is large, but high-level structure is common anyway because it’s more robust to that noise?
The laws of physics at the lowest level + initial conditions are sufficient to roll out the whole history, so (in K-complexity) there’s no benefit to adding descriptions of the higher levels.
Unless high-level structure lets you compress the initial conditions themselves, no?
Self-arguing on the topic
Counter-argument: The initial state had no structure we could exploit for compression, pure chaos.
Counter²-argument: Any given history that ended up well-abstracting corresponds to a specific inhomogeneous distribution of mass in the early universe, which defined the way the galaxies are spread across it. At least that seems to be the step that could already be compressed. If there were a step upstream of it where the state really didn’t have any structure, that unstructured state could be generated by describing the post-structure-formation state, describing the “timestamp” of the post-structure-formation state, then running physics in reverse to generate the unstructured state. So unless the later structured state fails to be lower-description-length than the earlier unstructured state, structure/abstractibility should still allow you to compress the initial state’s description, even if the structure only appears later.
Counter³-argument: The real “initial state” is the initial state of the quantum multiverse from which all possible Everett branches (and so all possible inhomogeneous distributions of mass, etc.) are generated. Its description length could be incredibly low, such as a uniform point singularity with no expensive-to-describe inhomogeneities whatsoever. The bits you later have to spend to describe the state of your universe are effectively spent on pinpointing the specific Everett branch you’re in, but the actual algorithm generating the whole Tegmark III multiverse did not have to do that. It just described the simple state from which all possible branches descend.
Counter⁴-argument: My understanding is that under QM/QFT, the universe doesn’t start from a singularity; it’s a general-relativity thing. QM/QFT require an initial inhomogeneous universal wavefunction to start working.
Counter⁵-argument: Perhaps the real Theory of Everything unifying QFT and GR would have an initial homogeneous singularity from which all possible Everett branches are generated, and this end result seems plausible enough that we may as well assume it right now.
I don’t know enough fundamental physics to make a confident call here. Though...
Counter⁶-argument: There seems to be some process which reallocates realityfluid within Tegmark III as well, between Everett branches. I think this is a hint that the “Tegmark III entire is a single program for the purposes of anthropics/from Tegmark IV’s point of view” idea is somehow wrong.
Wait, none of that actually helps; you’re right. If we can specify the full state of the universe/multiverse at any one moment, the rest of its history can be generated from that moment. To do so most efficiently, we should pick the simplest-to-describe state, and there we would benefit from having some structure. But as long as we have one simple-to-describe state, we can have all the other states be arbitrarily unstructured, with no loss of simplicity. So what we should expect is a history with at least one moment of structure (e. g., the initial conditions) that can then immediately dissolve into chaos.
To impose structure on the entire history, we do have to introduce some source of randomness that interferes with the state-transition process, making it impossible to deterministically compute later states from early ones. I. e., the laws of physics themselves have to be “incomplete”/stochastic, such that they can’t be used as the decompression algorithm. I do have some thoughts on why that may (effectively) be the case, but they’re on a line of reasoning I don’t really trust.
… Alternatively, what if the most compact description of the lowest-level state at any given moment routes through describing the entire multi-level history? I. e., what if even abstractions that exist in the distant future shed some light on the present lowest-level state, and they do so in a way that’s cheaper than specifying the lowest-level state manually?
Suppose the state is parametrized by real numbers. As it evolves, ever-more-distant decimal digits become relevant. This means that, if you want to simulate this universe on a non-analog computer (i. e., a computer that doesn’t use unlimited-precision reals) from t=0 to t=n starting from some initial state S0, with the simulation error never exceeding some value, the precision with which you have to specify S0 scales with n. Indeed, as n goes to infinity, so does the needed precision (i. e., the description length).
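As a toy illustration of that scaling (my own sketch, nothing from the original argument): take the fully chaotic logistic map as a stand-in for the lowest-level laws, and measure how long a trajectory started from a truncated initial condition stays within a fixed error tolerance of a high-precision reference trajectory. The number of reliable steps grows roughly linearly with the number of digits you keep, i. e., the precision you need in S0 scales with the horizon n.

```python
# Toy sketch: precision of the initial condition vs. simulation horizon in a
# chaotic system. The logistic map at r=4 loses roughly one bit of precision
# per step, so the horizon you can simulate within a fixed tolerance grows
# linearly with the number of digits of S0 you specify.
from decimal import Decimal, getcontext

getcontext().prec = 400  # high-precision arithmetic for the reference trajectory

def step(x: Decimal) -> Decimal:
    return 4 * x * (1 - x)  # fully chaotic logistic map

def reliable_horizon(digits: int, tol: Decimal = Decimal("0.01")) -> int:
    """Steps until a trajectory whose initial condition is truncated to
    `digits` decimal digits drifts more than `tol` from the reference."""
    x_ref = Decimal(1) / 3        # reference initial condition (non-terminating expansion)
    x_cut = round(x_ref, digits)  # same condition, known only to `digits` digits
    for n in range(2000):
        if abs(x_ref - x_cut) > tol:
            return n
        x_ref, x_cut = step(x_ref), step(x_cut)
    return 2000

for digits in (5, 10, 20, 40, 80):
    print(digits, reliable_horizon(digits))
# Each extra decimal digit of the initial condition buys ~log2(10) ≈ 3.3 more steps.
```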
Given all that, is it plausible that far-future abstractions summarize redundant information stored in the current state? Such that specifying the lowest-level state up to the needed precision is cheaper to do by describing the future history than by manually specifying the position of every particle (or, rather, the finer details of the universal wavefunction)?
… Yes, I think? Like, consider the state Sn, with some high-level system A existing in it. Suppose we want to infer S0 from Sn. How much information does A tell us about S0? Intuitively, quite a lot: for A to end up arising, many fine details in the distant past had to line up just right. Thus, knowing about A likely gives us more bits about the exact low-level past state than the description length of A itself.
Ever-further-in-the-future high-level abstractions essentially serve as compressed information about sets of ever-more-distant decimal-expansion digits of past lowest-level states. As long as an abstraction takes fewer bits to specify than the bits it communicates about the initial conditions, its presence decreases that initial state’s description length.
This is basically just the scaled-up version of counter²-argument from the collapsible. If an unstructured state deterministically evolves into a structured state, those future structures are implicit in its at-a-glance-unstructured form. Thus, the more simple-to-describe high-level structures a state produces across its history, the simpler it itself is to describe. So if we want to run a universe from t=0 to t=n with a bounded simulation error, the simplest initial conditions would impose the well-abstractibility property on the whole 0-to-n interval. That recovers the property I want.
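An attempt to write that down explicitly (my notation, and only up to the usual logarithmic slop in these identities): let $L$ be the reversible low-level laws, $S_n$ a later state, and $A$ some high-level structure present in $S_n$. Then

$$K(S_0) \;\le\; K(S_n) + K(n) + K(L) + O(1), \qquad K(S_n) \;\le\; K(A) + K(S_n \mid A) + O(\log).$$

The first inequality is just the counter²-argument: print $S_n$, then run the laws backwards for $n$ steps. The second is the usual two-part code; it beats manually listing $S_n$’s digits exactly when $A$ costs fewer bits to specify than the bits it communicates about the low-level state, which is the condition stated above.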
Main difference from your initial argument: the idea that the description length of the lowest-level state at any given moment effectively scales with the length of history you want to model, rather than being constant-and-finite. This makes it a question of whether any given additional period of future history is cheaper to specify by directly describing the desired future multi-level abstract state, or by packing that information into the initial conditions; and the former seems cheaper.
All that reasoning is pretty raw, obviously. Any obvious errors there?
Also, this is pretty useful. For bounty purposes, I’m currently feeling $20 on this one; feel free to send your preferred payment method via PMs.
The idea that there’s a simple state in the future, that still pins down the entire past, seems possible but weird. Most of the time when events evolve into a simple state, it’s because information is destroyed. This isn’t really a counter-argument, it’s just trying to put into words what feels odd.
One thing that’s confusing to me: Why K-complexity of the low-level history? Why not, for example, Algorithmic Minimal Sufficient Statistic, which doesn’t count the uniform noise? Or memory-bounded K-complexity, which might also favour multi-level descriptions.
I think I prefer frequentist justifications for complexity priors, because they explain why it works even on small parts of the universe.
The idea that there’s a simple state in the future, that still pins down the entire past, seems possible but weird
Laws of physics under the Standard Model are reversible though, aren’t they? I think you can’t do it from within an Everett branch, because some information ends up in inaccessible-to-you parts of the universal wavefunction, but if you had access to the wavefunction itself, you would’ve been able to run it in reverse. So under the Standard Model, future states do pin down the entire past.
One thing that’s confusing to me: Why K-complexity of the low-level history?
Hm; frankly, simply because it’s the default I ran with.
Why not, for example, Algorithmic Minimal Sufficient Statistic, which doesn’t count the uniform noise?
That seems like an acceptable fit. It’s defined through Kolmogorov complexity anyway, though; would it produce any qualitatively different conclusions here?
I think I prefer frequentist justifications for complexity priors, because they explain why it works even on small parts of the universe
So under the Standard Model, future states do pin down the entire past.
Good point, I was thinking in terms of toy Markov chains rather than real physics when I said that.
would it produce any qualitatively different conclusions here?
It would change the conclusions a lot if the initial conditions were unstructured noise, and the laws of physics were very simple, because then the AMSS would just contain the laws of physics and there’d be no compression benefit from multi-level structure.
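For concreteness, the standard algorithmic-statistics framing I assume is meant here (my gloss, not something stated in the thread): a finite set $S \ni x$ is a sufficient statistic for $x$ when the two-part description is nearly optimal,

$$K(S) + \log_2 |S| \;\approx\; K(x),$$

and the AMSS is the sufficient statistic of minimal $K(S)$; the $\log_2 |S|$ term absorbs the incompressible “noise” index. If the history $x$ is “simple laws run on an incompressible seed”, then $S$ can just be the set of histories the laws generate from every possible seed, so $K(S) \approx K(\text{laws})$ and none of the multi-level structure shows up in the statistic.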
Interesting. Elaborate?
It requires something like an IID assumption, so it’s fairly useless for your purposes. But once we’ve got that assumption, then we can bound |empirical error − generalisation error| with a function of hypothesis complexity and the number of data points (Chapter 7.2 in Understanding Machine Learning). So we can say simpler hypotheses will continue to work roughly as well as they worked on the training data, without any reference to whether they are true or not, or any philosophical justification of the prior.
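Concretely, the kind of bound I take this to be pointing at (my reconstruction of the Occam/MDL-style result, so the exact constants may differ from the book’s statement): with a prefix-free description language where $|h|$ is the bit-length of hypothesis $h$, losses in $[0,1]$, and an i.i.d. sample of size $m$, Hoeffding plus a union bound weighted by $2^{-|h|}$ gives, with probability at least $1-\delta$, simultaneously for all $h$:

$$\bigl|\, L_{\text{gen}}(h) - L_{\text{emp}}(h) \,\bigr| \;\le\; \sqrt{\frac{|h|\ln 2 + \ln(2/\delta)}{2m}}.$$

So short hypotheses provably keep working about as well as they worked on the sample, with no claim about their truth.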
The IID assumption really sucks though; I really want there to be some weaker assumption that lets us conclude a similar thing. And intuitively this should be a thing that exists, because in science and in real life, we constantly learn to use simple approximate rules-of-thumb, but we justify them in a similar way: “The rule has a good track record and I don’t see any reason it won’t work in this next specific case”.
There’s a somewhat more Bayesian way of doing the same thing, where you do logical induction over whether the “true underlying hypothesis” implies the rule-of-thumb you’ve observed. But I think if you pull apart how the logical induction is working there, it has to be doing something like the frequentist thing.
Maybe if lots of noise is constantly being injected into the universe, this would change things. Because then the noise counts as part of the initial conditions. So the K-complexity of the universe-history is large, but high-level structure is common anyway because it’s more robust to that noise?
Some new data on that point:

To summarize what the paper argues (from my post in that thread):
Suppose the microstate of a system is defined by a (set of) infinite-precision real numbers, corresponding to e. g. its coordinates in phase space.
We define the coarse-graining as a truncation of those real numbers: i. e., we fix some degree of precision.
That degree of precision could be, for example, the Planck length.
At the microstate level, the laws of physics may be deterministic and reversible.
At the macrostate level, the laws of physics are stochastic and irreversible. We define them as a Markov process, with transition probabilities P(x,y) defined as “the fraction of the microstates in the macrostate x that map to the macrostate y in the next moment”. (A toy sketch of this construction is included below, after the summary.)
Over time, our ability to predict what state the system is in from our knowledge of its initial coarse-grained state + the laws of physics degrades.
Macroscopically, it’s because of the properties of the specific stochastic dynamic we have to use (this is what most of the paper is proving, I think).
Microscopically, it’s because ever-more-distant decimal digits in the definition of the initial state start influencing the dynamics ever more strongly. (See the multibaker map in Appendix A, the idea of “microscopic mixing” in a footnote, and also apparently Kolmogorov-Sinai entropy.)
That is: in order to better pinpoint farther-in-time states, we would have to spend more bits (either by defining more fine-grained macrostates, or maybe by locating them in the execution trace).
Thus: stochasticity, and the second law, are downstream of the fact that we cannot define the initial state with infinite precision.
I. e., it is effectively the case that there’s (pseudo)randomness injected into the state-transition process.
And if a given state has some other regularities by which it could be compactly defined, aside from defining it through the initial conditions, that would indeed decrease its description length/algorithmic entropy. So we again recover the “trajectories that abstract well throughout their entire history are simpler” claim.
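A minimal toy of that construction, as I read it (my own sketch, and it uses a plain baker’s map rather than the multibaker map from the paper’s appendix): microstates evolve under a deterministic, invertible map; macrostates are coarse cells; and the macro-level “laws” are exactly the transition probabilities P(x,y) defined above, i. e., the fraction of microstates in cell x that land in cell y after one step.

```python
# Toy sketch: coarse-graining a deterministic, invertible micro-dynamics into a
# stochastic macro-level Markov process, following the P(x, y) definition above.
import numpy as np

GRID = 64    # micro resolution: a GRID x GRID sampling of the unit square
CELLS = 4    # macro resolution: a CELLS x CELLS coarse-graining

def baker(p, q):
    """Baker's map on the unit square: deterministic and invertible."""
    if p < 0.5:
        return 2 * p, q / 2
    return 2 * p - 1, (q + 1) / 2

def macro(p, q):
    """Index of the coarse cell containing the point (p, q)."""
    return int(p * CELLS) * CELLS + int(q * CELLS)

# Microstates: centres of the fine grid cells.
micro = [((i + 0.5) / GRID, (j + 0.5) / GRID) for i in range(GRID) for j in range(GRID)]

# counts[x, y] = number of microstates in macrostate x that map into macrostate y.
counts = np.zeros((CELLS * CELLS, CELLS * CELLS))
for p, q in micro:
    counts[macro(p, q), macro(*baker(p, q))] += 1
P = counts / counts.sum(axis=1, keepdims=True)  # row-normalise into transition probabilities

# The underlying micro map is a bijection, but the induced macro dynamics is
# genuinely stochastic: each row of P spreads over several cells rather than being 0/1.
print(np.round(P[0], 3))
```

Iterating P then washes out information about the initial macrostate over time, which is (I think) the macroscopic half of the degradation described in the summary.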
Okay. I think this anthropic theory makes a falsifiable prediction (in principle). The infinite-precision real numbers could be algorithmically simple, or they could be unstructured. The theory predicts that they are not algorithmically simple. If it were the case that they were algorithmically simple, we could run a Solomonoff inductor on the macrostates and it would recover the full microstates (and this would probably be simpler than the abstraction-based compression).
But this explanation doesn’t sit well with me, because under this kind of prior, the fundamental laws of physics being so simple is really surprising.