Consider the domain of bit streams—to avoid having to deal with infinity, let’s take some large but finite length, say a trillion bits. Then there are 2^trillion possible bit streams. Now restrict our attention to just those that begin with a particular ordered pattern, say the text of Hamlet, and choose one of those at random. (We can run this experiment on a real computer by taking a copy of said text and appending enough random noise to bring the file size up to a trillion bits.) What can we say about the result?
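The experiment can be sketched in a few lines. This is a scaled-down illustration: the short literal prefix and the 10,000-bit total are stand-ins for the text of Hamlet and the trillion bits.

```python
import secrets

# Scaled-down sketch of the experiment described above: take a
# fixed prefix (standing in for the text of Hamlet) and pad it
# with uniform random noise up to a fixed total length.
TOTAL_BITS = 10_000
prefix = b"To be, or not to be, that is the question"
prefix_bits = len(prefix) * 8

noise = secrets.token_bytes((TOTAL_BITS - prefix_bits) // 8)
stream = prefix + noise

# Everything after the prefix is uniform random noise, so the
# stream "lapses into noise" as soon as the chosen text ends.
```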
Well, almost all bit streams that begin with the text of Hamlet consist of just random noise thereafter, so the one we choose will almost certainly lapse into random noise as soon as the chosen text ends.
Suppose we go about things in a different way and instead of choosing a bit stream directly, we take our domain as that of programs a trillion bits long, and then take the first trillion bits of the program’s output as our bit stream. Now restrict our attention to just those programs whose output begins with the text of Hamlet, and choose one of those at random. What can we say about the result this time?
It’s possible that the program we chose consists of print "...", i.e. the output bit stream is just embedded literally in the program code. In that case, our conclusion from choosing a bit stream directly still applies.
But this is highly unlikely. The entropy of English text has been estimated at about one bit per character. In other words, there exist subroutines that output the text of Hamlet and are about eight times shorter than the literal print "..." (roughly one bit per character instead of eight). That in turn means that in our domain of programs a trillion bits long, exponentially more programs contain the compact subroutine than the literal print statement: fixing fewer bits of the program leaves more of the remaining bits free to vary. Therefore our randomly selected program is almost certainly printing the text of Hamlet by using a compact subroutine, not a literal print statement.
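The counting argument can be made concrete with back-of-envelope numbers. The 180,000-character length for Hamlet below is a rough assumed figure, not from the discussion; the 8-bits-per-character literal versus roughly 1-bit-per-character compressed sizes follow the entropy estimate.

```python
# Back-of-envelope version of the counting argument above.
chars = 180_000                # assumed rough length of Hamlet
literal_bits = 8 * chars       # bits fixed by the literal print "..."
compact_bits = 1 * chars       # bits fixed by a compact generator

# Fixing k bits of an N-bit program leaves N-k bits free, i.e.
# 2^(N-k) such programs, so the ratio of compact-subroutine
# programs to literal-print programs is about 2^(literal - compact):
log2_ratio = literal_bits - compact_bits
print(f"compact programs outnumber literal ones by ~2^{log2_ratio}")
# → compact programs outnumber literal ones by ~2^1260000
```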
But the compact subroutine will probably not lapse into random noise as soon as Hamlet is done. It will continue to generate… probably not a meaningful sequel (Hamlet doesn’t constrain the universe enough for that), but at least something that bears some syntactic resemblance to English text.
For the constraint "the output bit stream must begin with the text of Hamlet", substitute "at least one observer must exist". We are then left with the conclusion that if the universe were randomly chosen from all bit streams, we should be Boltzmann brains, observing just enough order for an instance of observation, with an immediate lapse back into noise. As we observe continuing order, we may conclude that the universe was randomly chosen from programs (or mathematical descriptions, if the universe is not restricted to be computable), so that it was selected to contain a compact generator of the patterns that produced us, and therefore a generator that will continue producing order.
Note that this is an independent derivation of Solomonoff’s result that future data should be expected to be that which would be generated by the shortest program that would produce past data. (I actually figured out the above argument as a way to refute Hume’s claim that induction is not logical, before I heard of Solomonoff induction.)
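The prefer-the-shortest-generator rule can be illustrated with a toy program space. The "programs" here are just repeating patterns, a drastically simplified stand-in for a real universal language, chosen only to make the shortest-description preference visible.

```python
# Toy illustration of the Solomonoff-style rule: among the
# "programs" (here, repeating patterns) that reproduce the past
# data, prefer the shortest and use it to predict the future.
past = "010101"

def generates(pattern, data):
    # Does repeating `pattern` reproduce `data` as a prefix?
    repeated = (pattern * (len(data) // len(pattern) + 1))[:len(data)]
    return repeated == data

# All prefixes of the data that, repeated, reproduce it:
candidates = [past[:k] for k in range(1, len(past) + 1)
              if generates(past[:k], past)]
shortest = min(candidates, key=len)
prediction = shortest[len(past) % len(shortest)]
print(shortest, prediction)   # → 01 0
```

The literal pattern equal to the whole data is always a candidate (the analogue of print "..."), but the shorter generator wins and keeps producing ordered output.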
It also works equally well if output data is allowed to be infinite, or even uncountably infinite in size (e.g. continuum physics), because uncountably infinite data can still be generated by a formal definition of finite size.
That all makes sense and you put it more clearly than I’ve seen before, but I dispute the implication that finding that our local universe is the result of a compact generator implies very much about the large-scale structure of an ensemble universe. For example imagine pockets of local universes that look all nice and neat from the inside yet are completely alien to aliens in a far-off universe pocket—“far off” being determined by the Turing languages for their respective universal priors, say. For a slightly more elegant variation on the idea I made the same argument here. Such an ensemble might be “uniform” and even devoid of any information content—see Standish’s Theory of Nothing—yet could look very rich from the inside. Does your reasoning eliminate this possibility in a way that I’m not seeing?
Edit: I was assuming you mean “ensemble” when you say “universe” but you might not have actually been implying this seemingly much stronger claim?
I don’t understand your objection. I would take “ensemble” to roughly map to what I meant by “domain”. Certainly the whole ensemble has little or no information content. You can’t really look at an ensemble from the inside, only your own universe. Does any of that clarify anything?
“far off” being determined by the Turing languages for their respective universal priors, say.
You are raising the objection that the Solomonoff prior takes the language as a parameter? True, but I’m not sure how that can be helped; in practice it amounts to only a small additive constant on program complexity, and in any case it’s not like there’s any competing theory that does the job without taking the language as a parameter. Besides, it doesn’t affect the point I was making.
You can’t really look at an ensemble from the inside, only your own universe. Does any of that clarify anything?
I think so, in the sense that I think we basically understand each other; but I’m not sure why you agree but seem uninterested in the idea that “certainly the whole ensemble has little or no information content”. Do you think that’s all there really is to say on the matter? (That sounds reasonable, I guess I just still feel like there’s more to the answer, or something.)
Well, I am interested in it in the sense that one of the things that attracted me to the multiverse theory in the first place was its marvelous economy of assumption. I’m not sure there is anything much else specifically to be said about that, though.
That in turn means that in our domain of programs a trillion bits long, exponentially more programs contain the compact subroutine than the literal print statement.
Are you sure this is right? There’s exponentially many different print statements. Do you have an argument why they should have low combined weight?
The number of N-bit print statements whose output starts with the n-bit Hamlet is exactly 2^(N-n). If K(Hamlet)=n/6, then there are at least 2^(N-n/6) programs whose output starts with Hamlet. Probably more.
EDIT: Oops, I didn’t prove anything about the structure of the remaining N-n bits, so this is not too useful in itself.
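The count in the parent comment can be checked with toy sizes. N and n below are illustrative, far smaller than the real values, and the caveat about the structure of the remaining bits still applies.

```python
# Toy-sized instance of the count: N-bit programs, n-bit target.
N, n = 40, 12

# print-style programs: n payload bits fixed, N-n bits free.
literal_count = 2 ** (N - n)

# If the target compresses to n/6 bits, only n/6 bits are fixed:
compact_count = 2 ** (N - n // 6)

print(literal_count, compact_count)
```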