I’ve been thinking about it in terms of “but which language are we using to compute the complexity of our universe/laws of physics?”. Usually I likewise just go “only matters up to an additive constant, just assume we’re not using a Turing tarpit and we’re probably good”. If we do dig into it, though, what can we conclude?
Some thoughts:
What is the “objectively correct” reference language?
We should, of course, assume that the algorithm computing our universe is simple to describe in terms of the “natural” reference language, due to the simplicity prior. I. e., it should have support for the basic functions our universe’s physics computes. I think that’s already equivalent to “the machine can run our physics without insane implementation size”.
On the flip side, it’s allowed to lack support for functions our universe can’t cheaply compute. For example, it may not have primitive functions for solving NP-complete problems. (In theory, I think there was nothing stopping physics from having fundamental particles that absorb Traveling Salesman problems and near-instantly emit their solutions.)
Now suppose we also assume that our observations are sampled from the distribution over all observers in Tegmark 4. This means that when we’re talking about the language/TM underlying it, we’re talking about some “natural”, “objective” reference language.
What can we infer about it?
First, as mentioned, we should assume the reference language is not a Turing tarpit. After all, if we allowed reality to “think” in terms of some arbitrarily convoluted Turing-tarpit language, we could arbitrarily skew the simplicity prior.
But what is a “Turing tarpit” in that “global”/“objective” sense, not defined relative to some applications/programs? Intuitively, it feels like “one of the normal, sane languages that could easily implement all the other sane languages” should be possible to somehow formalize...
Which is to say: when we’re talking about the Kolmogorov complexity of some algorithm, in what language are we measuring it? Intuitively, we want to, in turn, pick one of the “simplest” languages to define.[1] But what language do we pick for measuring this language’s complexity? An infinite recursion follows.
Intuitively, there’s perhaps some way to short-circuit that recursion. (Perhaps by defining the complexity of a language as its complexity weighed across “all” languages, while prioritizing the opinions of those languages which are themselves simple in terms of whatever complexity measure this expression defines? Or something along those lines; circular definitions aren’t always a problem. (Though see an essay Tsvi linked to which breaks down why many such definitions don’t work.))
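A toy sketch of what such a circular/fixed-point definition could look like (entirely illustrative: the three “languages”, the interpreter lengths, and the exponential weighting are all my made-up assumptions, not an established construction):

```python
import numpy as np

# interp_len[i][j] = length of an interpreter for language i written in language j
# (made-up numbers). Language 2 plays the role of a convoluted tarpit: trivial in
# its own terms, expensive for the sane languages to describe, and verbose when
# describing them.
interp_len = np.array([
    [ 1.0,  5.0, 60.0],
    [ 6.0,  1.0, 70.0],
    [40.0, 45.0,  1.0],
])

# Circular definition: a language's complexity is its description length averaged
# over all "judge" languages, weighting judges by their own (co-defined) simplicity.
complexity = np.ones(3)
for _ in range(100):
    weights = np.exp(-0.1 * complexity)   # soft simplicity prior over judges
    weights /= weights.sum()              # (temperature chosen arbitrarily)
    complexity = interp_len @ weights
print(np.round(complexity, 2))
# The iteration settles into a fixed point where the two "sane" languages come out
# simple and the tarpit comes out complex -- its own high opinion of itself gets
# discounted.
```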
Regardless, if something like this is successful, we’ll get a “global” definition of what counts as a simple/natural language. This would, in turn, allow us to estimate the “objective” complexity of various problems, by measuring the length of their solutions in terms of that natural language (i. e., the length of the execution trace of a computation solving the problem). This would perhaps show that some problems are “objectively” hard, such as some theoretical/philosophical problems or the NP-complete problems.
The speed prior
What if we try to compute the complexity not of the laws of physics, but of a given observer-moment/universe-state, and penalize the higher-complexity ones?
In chaotic systems, this actually works out to the speed prior: i. e., to assuming that the later steps of a program have less realityfluid than the early ones. Two lines of reasoning:
The farther in time a state is,[2] the more precisely you have to specify the initial conditions in order to hit it.
Justification: Suppose the program’s initial state is parametrized by real numbers. As it evolves, ever-more-distant decimal digits become relevant. This means that, if you want to simulate the universe on a non-analog computer (i. e., a computer that doesn’t use unlimited-precision reals) from t=0 to t=n starting from some initial state S0, with the simulation error never exceeding some fixed value, the precision with which you have to specify S0 scales with n. Indeed, as n goes to infinity, so does the needed precision (i. e., the description length). (A toy numerical illustration follows this list.)
Aside from picking the initial state that generates the observation, you also have to pinpoint that observation in the execution trace of the program. It can be as easy as defining the time-step (if you’re working with classical mechanics), or as difficult as pointing at a specific Everett branch. And pinpointing generally gets more expensive with time (even in the trivial case of “pick a time-step”, the length of the number you have to provide grows).
Anthropically, this means that the computations implementing us are (relatively) stable, and produce “interesting” states (relatively) quickly/in few steps.
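As a toy numerical illustration of the first point (my own example, not from the paper): for the doubling map x → 2x mod 1, each step discards one leading binary digit of the state, so hitting the state at time t to within a fixed error requires roughly t·log₁₀2 extra decimal digits of the initial condition:

```python
from decimal import Decimal, getcontext

getcontext().prec = 200  # ample working precision, so the arithmetic itself is exact enough

TRUE_X0 = Decimal("0." + "1415926535897932384626433832795028841971" * 3)  # 120 digits

def orbit(x, steps):
    """Iterate the doubling map x -> 2x mod 1."""
    for _ in range(steps):
        x = (2 * x) % 1
    return x

def circle_dist(a, b):
    """Distance on the circle [0, 1)."""
    d = abs(a - b)
    return min(d, 1 - d)

def digits_needed(steps, eps=Decimal("0.001")):
    """Fewest decimal digits of x0 that still pin down the state at `steps` within eps."""
    target = orbit(TRUE_X0, steps)
    for d in range(1, 120):
        truncated = Decimal(str(TRUE_X0)[: d + 2])  # keep "0." plus d digits
        if circle_dist(orbit(truncated, steps), target) < eps:
            return d
    return None

for t in (10, 20, 40, 80):
    print(t, digits_needed(t))  # needed digits grow roughly linearly, ~0.3 per step
```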
Anyway, digging into the paper now...
1. Oh, I see it’s likewise concerned with the description length of states:
Gács [23] defines the coarse-grained algorithmic entropy of any individual state: roughly speaking, it is the number of bits of information that a fixed computer needs in order to identify the state’s coarse-grained cell. For example, a state in which all particles are concentrated in one location would have low entropy, because the repeated coordinates can be printed by a short program. If the coarse graining in question is Markovian, then Levin’s [24] law of randomness conservation says that the algorithmic entropy seldom decreases. In physical terms, we will come to see this as a vast generalization of the second law of thermodynamics
2. The way the paper justifies the second law of thermodynamics is neat.
My understanding of that
Suppose the microstate of a system is defined by a (set of) infinite-precision real numbers, corresponding to e. g. its coordinates in phase space.
We define the coarse-graining as a truncation of those real numbers: i. e., we fix some degree of precision.
That degree of precision could be, for example, the Planck length.
At the microstate level, the laws of physics may be deterministic and reversible.
At the macrostate level, the laws of physics are stochastic and irreversible. We define them as a Markov process, with transition probabilities P(x,y) defined as “the fraction of the microstates in the macrostate x whose successors land in the macrostate y at the next moment”. (A toy computation of such a transition matrix is sketched after this list.)
Over time, our ability to predict what state the system is in from our knowledge of its initial coarse-grained state + the laws of physics degrades.
Macroscopically, it’s because of the properties of the specific stochastic dynamics we have to use (this is what most of the paper is proving, I think).
Microscopically, it’s because ever-more-distant decimal digits in the definition of the initial state start influencing the dynamics ever more strongly. (See the multibaker map in Appendix A, the idea of “microscopic mixing” in a footnote, and also apparently Kolmogorov-Sinai entropy.)
That is: in order to better pinpoint farther-in-time states, we would have to spend more bits (either by defining more fine-grained macrostates, or maybe by locating them in the execution trace).
Thus: stochasticity, and the second law, are downstream of the fact that we cannot define the initial state with infinite precision.
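Here’s the toy transition matrix mentioned above, using the doubling map x → 2x mod 1 as the deterministic microdynamics and K equal bins as macrostates. This is my own illustration, not the paper’s multibaker construction:

```python
import numpy as np

# Coarse-grained transition matrix P(x, y) for the doubling map x -> 2x mod 1.
# Macrostates = K equal bins of [0, 1);
# P[x, y] = fraction of microstates in bin x whose image lands in bin y.
K = 4
N = 100_000
rng = np.random.default_rng(0)

P = np.zeros((K, K))
for x in range(K):
    micro = rng.uniform(x / K, (x + 1) / K, size=N)  # sample microstates in macrostate x
    images = (2 * micro) % 1.0                       # deterministic microdynamics
    bins, counts = np.unique((images * K).astype(int), return_counts=True)
    P[x, bins] = counts / N

print(np.round(P, 3))
# Each macrostate maps onto two macrostates with probability ~1/2 each:
# deterministic micro-dynamics, stochastic (and information-losing) macro-dynamics.
```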
3. The part about incomputability being necessary is also interesting, metaphysically.
Why must it be impossible to prove arbitrarily high lower bounds on Kolmogorov complexity?
So, Kolmogorov complexity is upper-semicomputable. This means that, for a given x:
You can prove an upper bound on K(x), just by finding a program that computes x.
You can only prove a lower bound K(x) > c using a program (or formal system) p with K(p) ≳ c. Meaning, you can’t use any fixed-size program (or formal system) to prove arbitrarily high complexity.
Imagine if it were otherwise: if some p much smaller than K(x) could prove a high lower bound on K(x). Then you could use that p to cheaply pinpoint a complex string: set up a program that goes through strings in order, uses p to prove lower bounds on their complexity, and outputs the first string whose proven complexity exceeds some threshold. But that construction simultaneously gives an upper bound on the complexity of its output: since our small program was able to compute it, its complexity can’t be much higher than K(p) (plus the few bits needed to specify the threshold), contradicting the proven lower bound. (A sketch of this construction is below.)
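A sketch of that construction in code. The `prove_K_lower_bound` oracle is the hypothetical ingredient; the argument is precisely that no short program can play its role for arbitrarily high thresholds:

```python
from itertools import product

def all_strings_of_length(n):
    """All binary strings of length n, in lexicographic order."""
    return ("".join(bits) for bits in product("01", repeat=n))

def prove_K_lower_bound(x):
    """Hypothetical oracle: returns a certified lower bound on K(x).
    The whole point is that no short program can certify arbitrarily high bounds."""
    raise NotImplementedError

def first_complex_string(threshold):
    """Return the first string whose certified complexity exceeds `threshold`."""
    n = 0
    while True:
        for x in all_strings_of_length(n):
            if prove_K_lower_bound(x) > threshold:
                return x
        n += 1

# If prove_K_lower_bound were a short program p, then for a huge threshold T the
# call first_complex_string(T) would itself describe its output x using only about
# len(p) + len(str(T)) = O(log T) bits -- i.e. K(x) = O(log T), contradicting the
# certified bound K(x) > T.
```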
Thus, in order for arbitrarily complex states/programs to exist, it must be impossible to prove that they are complex.
Why? Why does that have to be the case?
Intuitively, it’s because “proving” complexity requires pointing at specific features of the state x and explaining why exactly they are complex. That is, your formal language must be expressive enough to precisely talk about those features, in their full detail. If, however, you can get away with using some abstractions/generalizations to prove x’s complexity, that by definition decreases x’s complexity.
Impromptu poll: is structuring long-form comments this way, with collapsibles for topics, convenient, or should I have just used titles? Please react with thumbs up/down to the following statement: “collapsibles good”.
All that said,
But this smells really promising to me [...] as a more principled way to tackle bounded rationality of embedded systems.
I’m curious what you have in mind here. I’ve kind of been treating my thinking on those topics as basically recreational/a guilty pleasure. The possibility that there’s something actually useful here interests me.
[1] Since that would allow its emulators to be common across Tegmark IV, which would, in turn, give a bump to any algorithms simple in its terms.
[2] More specifically, the first occurrence of that observation.
What I have in mind re:boundedness...
If we need to use a Turing machine which is roughly equivalent to physics, then a natural next step is to drop the assumption that the machine in question is Turing complete. Just pick some class of machines which can efficiently simulate our physics, and which can be efficiently implemented in our physics. And then, one might hope, the sort of algorithmic thermodynamic theory the paper presents can carry over to that class of machines.
Probably there are some additional requirements for the machines, like some kind of composability, but I don’t know exactly what they are.
This would also likely result in a direct mapping between limits on the machines (like e.g. limited time or memory) and corresponding limits on the physical systems to which the theory applies for those machines.
The resulting theory would probably read more like classical thermo, where we’re doing thought experiments involving fairly arbitrary machines subject to just a few constraints, and surprisingly general theorems pop out.