A: Wait, if simplicity changes preferences, but does not change the level of existence, how do you explain the fact that we appear to be in a world that is simple? Isn’t that a priori extremely unlikely?
B: This is where it gets a little bit fuzzy, but I do not think that question makes sense. Unlikely by what measure? You are presupposing an existence measure on the collection of theoretical worlds just to ask that question.
I saw a good explanation of this point (in the #lesswrong IRC sometime last month), which I’ll try to reproduce here.
Imagine that there is a specific flavor of Universal Turing Machine which runs everything in Tegmark IV; all potential-universe objects are encoded as tapes for this UTM to simulate.
Take a simple set of rules. There is only one tape that corresponds to these precise rules. Now add an incredibly-specific edge case which applies in exactly one situation that could possibly arise, but otherwise have the same simple set of rules hold; this adds another tape that will produce precisely the same observations from the inside as the simple rules, except in one case that is grotesquely uncommon and very likely will never occur (because it happens to require a state unreachable from the initial conditions). As far as the inhabitants of these universes are concerned, they are the same.
Of course, there are an infinite variety of these edge cases, each of which occurs in a vanishingly specific not-necessarily-reachable circumstance, and nearly every combination of which results in a universe that looks precisely like the simple set of rules. And detailing each edge case should add only a finite amount of tape, on top of the tape needed for the description of the simple rules.
So take a more complex set of rules which requires twice as much tape to describe, and compare it against the set of all tape-length-equal instantiations of apparently-the-simple-rules. Taking into account the many ways you can build universes which are like-the-simple-rules-except-where-very-specifically-noted, you’ll find that universes where the correct generalization is the simple rules are more common than universes where the correct generalization is the complex rules, by a factor depending exclusively on the difference in Kolmogorov complexity.
TL;DR: We probably don’t live in a simple universe, but we probably live in a universe where the correct model to form anticipations with is simple.
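To make the counting concrete, here is a toy sketch (mine, not part of the original IRC explanation; the lengths and clause sizes are arbitrary, and, unlike a real UTM, every bit pattern here counts as a valid clause):

```python
# Toy model of the count above (my own simplification, not the actual UTM
# construction): a "universe description" of total length N is a core rule
# set followed by inert edge-case clauses of C bits each. We count how many
# length-N descriptions generalize to a given core.
N = 100   # total description length (arbitrary)
C = 10    # bits per edge-case clause (arbitrary)

def descriptions(core_len):
    """Count length-N tapes: the core plus clauses filling the remaining bits."""
    budget = N - core_len
    if budget < 0 or budget % C:
        return 0
    return (2 ** C) ** (budget // C)  # each C-bit clause chosen independently

simple, complex_ = descriptions(20), descriptions(40)
print(simple // complex_)  # 2**(40 - 20): the ratio tracks the length gap
```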
> Take a simple set of rules. There is only one tape that corresponds to these precise rules.
Not quite. There’s an infinite number of tapes that correspond to these precise rules exactly. Each “program of length L” is also two programs of length L+1, four programs of length L+2, and so on—the following bits are simply not read.
In general, there are many different ways a program with the same behaviour can be implemented, and among programs up to a given length, that number is larger for programs that describe fundamentally simpler behaviours.
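A minimal sketch of that count, assuming the (contested) convention that trailing unread bits make programs distinct:

```python
# Minimal sketch of the padding count, under the (contested) convention that
# trailing unread bits make programs distinct: a program whose meaningful
# part is L bits long appears as a prefix of 2**(N - L) tapes of length N,
# so shorter programs get exponentially more copies.
N = 12
for L in (3, 6, 9):
    print(f"meaningful length {L}: {2 ** (N - L)} distinct length-{N} tapes")
```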
> Not quite. There’s an infinite number of tapes that correspond to these precise rules exactly. Each “program of length L” is also two programs of length L+1, four programs of length L+2, and so on—the following bits are simply not read.
That only works under some methods of specification of the model machine, and I don’t consider it a reasonable generalization to assume. It’s at best contested, and at worst a semantic distinction-without-a-difference.
> That only works under some methods of specification of the model machine
Should not be a problem for you to name one where it doesn’t work, right?
Universal TMs all have to permit setting up an interpreter for running the code of another UTM; you really have a lot less leeway for “methods of specification” than you think.
If you are hung up on there being an actual difference in the output, rather than hand-waving about the special circumstances or such, you ought to provide at least a sketch of a setup (so that people can understand clearly how you get more of the simpler programs). For example, for programs that do halt, instead of halting you can make them copy extra bits from the input tape to the output tape—that would be really easy to set up. And the programs in general can be compactly made to halt after something like BusyBeaver(10) steps.
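A toy version of the kind of setup being asked for (the conventions here are invented purely for illustration):

```python
# Toy version of the setup sketched above (conventions invented for this
# example): when the core program halts, the machine streams the unread
# program bits to the output tape instead of stopping, so trailing bits are
# observable and padded programs genuinely differ in output.
def run(core_output, trailing_bits):
    return core_output + trailing_bits  # halting output, then the leftover tape

print(run([1, 0, 1], [0, 0]))  # [1, 0, 1, 0, 0]
print(run([1, 0, 1], [1, 1]))  # same core, different tail -> different output
```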
Whether those length-(L+1), length-(L+2), etc. programs count as different from the length-L one is, last I checked, contentious, and because theorists feel strongly about this, various widely-used formalisms for defining what makes something a UTM disagree about whether unread tape matters. If someone pokes a hole in the argument in the great-grandparent such that the general conclusion works iff the two length-(L+1) programs, four length-(L+2) programs, etc. count as different universes, then it would become worth addressing. But as long as it works without them, I’m going to stick with the most difficult case.
Again, give an example for the assertions being made.
As for your argument, as others have pointed out, you did not prove anything about the extra length needed for setting up special output in very specific circumstances, nor sketch how that could be accomplished. By “sticking with the most difficult case” you got into a region where you are unable to actually produce an argument. It is far less than obviously true (and may well be false) that the programs which are simple programs with special output in very specific circumstances are a notable fraction of large programs.
My knowledge of algorithmic information theory marks the approach advocated by private_messaging as “the best way”, as established by years of experimenting with ways to specify priors over programs. I admit to little knowledge of the controversy, but agree with private_messaging’s insistence that the burden of providing the alternative is on your part.
> the approach advocated by private_messaging as “the best way”
What? There is no mention of this anywhere. I have no idea what you’re referring to with this phrase.
I’m not going to provide an alternative because it doesn’t matter. Using the alternate formulation, where you count every padded program with the same behaviour as unique, you reach the same conclusions, with one more underlying assumption. By assuming the padded copies count as the same program, I (am attempting to) force potential objections to confront the actual basis for the argument.
So go ahead, if you like. Treat those as distinct and work out the consequences. You’ll reach precisely the same conclusions.
I’m not sure about this. This sounds like a version of this, which I agree with.
What I’m wary of is this:
> each of which occurs in a vanishingly specific not-necessarily-reachable circumstance, and nearly every combination of which results in a universe that looks precisely like the simple set of rules.
This sounds like a mathematical claim that isn’t yet proven. My intuitions aren’t sharp enough to make an educated guess on it either.
It is indeed a version of that.
The vanishingly-small edge cases I refer to would be specific clauses describing something like “At 5:03:05 PM on Tuesday the 7th of March, 3243, if two photons collide at coordinates X,Y,Z, emit three photons each of equal energy to the incoming two in such-and-such an orientation.” Something that has an effect at most once during the entirety of the universe, and possibly not at all. This can be specified as an additional behavior in space proportional to the description of the required conditions plus space proportional to the description of the required result (or, more likely, the diff of that result from the ordinary-rules result), which can be very small. And it’s obviously true that you can add a huge number of these before any observer could expect to detect enough of them to make a dent in their long-run confidence in the correctness of the simple model.
You can treat any difference more systematic than this as a different model while still having enough ability to ‘make comments’ to force the effective Kolmogorov weighting.
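A hedged sketch of the shape of this construction (names and numbers invented; the real construction lives on a Turing-machine tape):

```python
# Hedged sketch of "simple rules except where very specifically noted"
# (everything here is illustrative): a rule set is a base transition function
# plus a table of overrides keyed on exact states, each override costing
# space roughly proportional to its trigger condition plus the diff of its
# result from the ordinary-rules result.
def make_rules(base, overrides):
    def step(state):
        return overrides.get(state, base(state))  # exception if noted, else base
    return step

base = lambda s: (s * 2) % 1_000_003            # the "simple rules"
varied = make_rules(base, {424_242: 7})         # one obscure edge case added
print(varied(5) == base(5))                     # True: indistinguishable here
print(varied(424_242) == base(424_242))         # False: only at the edge case
```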
Why would that be more common than “create a giant black hole at the same coordinates”? Or an array of black holes, spaced 10 light years apart, or the like; you get the idea.
You need to establish that little differences would be more common than the giant differences I described.
> You need to establish that little differences would be more common than the giant differences I described.
No, I don’t, because they don’t have to be more common. They just have to be common. I didn’t include black holes, etc. in the simple version, because they’re not necessary to get the result. You could include them in the category of variations, and the conclusion would get stronger, not weaker. For most observers in the universe, there was always a giant black hole there and that’s all there is to it.
The set of small variations is a multiplier on the abundance of universes which look like Lawful Universe #N. The larger the set of small variations, the bigger that multiplier gets, for everything.
> No, I don’t, because they don’t have to be more common.
You’ve been trying to show that “universes where the correct generalization is the simple rules are more common than universes where the correct generalization is the complex rules”, and you’ve been arguing that this is still true when we are not considering the longer programs that are exactly equivalent to the shorter programs.
That, so far, is an entirely baseless assertion. You only described—rather vaguely—some of the more complex programs that look like simpler programs, without demonstrating that those programs are not grossly outnumbered by the more complex programs that look very obviously more complex. Such as, for example, programs encoding a universe with apparent true randomness—done using the extra bits.
That being said, our universe does look like it has infinite complexity (due to apparent non-determinism), and as such, infinite complexity of that kind is not improbable. E.g. I can set up a short TM tape prefix that will copy all subsequent bits from the program tape to the output tape. If you pick a very long program at random, it’s not very improbable that it begins with this short prefix, and thus corresponds to a random universe with no order to it whatsoever. The vast majority of long programs beginning with this prefix will not correspond to any shorter program, as random data is not compressible on average. Perhaps a point could be made that most very long programs correspond to universes with simple probabilistic laws.
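Two standard counting facts behind this, sketched with toy numbers (the prefix length p is hypothetical):

```python
# (1) A fixed prefix of length p heads a 2**-p share of all length-N tapes,
#     for every N >= p, so "noise universes" keep a constant share forever.
# (2) Fewer than 2**(m - c) descriptions are shorter than m - c bits, so at
#     most a 2**-c fraction of length-m payloads compress by more than c bits.
p, m, c = 15, 64, 10
print(2.0 ** -p)                     # (1) share of long tapes with the prefix
print((2 ** (m - c) - 1) / 2 ** m)   # (2) < 2**-c: most payloads incompressible
```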
> You only described—rather vaguely—some of the more complex programs that look like simpler programs, without demonstrating that those programs are not grossly outnumbered by the more complex programs that look very obviously more complex. Such as, for example, programs encoding a universe with apparent true randomness—done using the extra bits.
No, that’s not it at all. I have described how, for every complex program that looks complex, you can construct a large number of equally complex programs that look simple, and therefore you should expect simple models to be much more common than complex ones.
You’d need to show that for every complex-looking program, you can make at least n simple-looking programs which do not overlap with the simple-looking programs you’re constructing for any other complex-looking program. (Because it won’t do if for every complex-looking program, you’re constructing the same, say, 100 simple-looking programs.) I don’t even see a vague sketch of an argument for that.
edit: Hell, you haven’t even defined what constitutes a complex-looking program. There’s a trivial example: all programs beginning with the shortest prefix that copies all subsequent program bits verbatim onto the output tape. These programs are complex-looking in the sense that the vast majority of them have no simpler representation. Those programs are also incredibly numerous.
edit2: Also, the whole argument completely breaks down at infinity. Observe: for every even integer, I can construct 10 odd integers (10n+1, 10n+3, …). Does that mean a randomly chosen integer is ten times more likely to be odd than even? No.
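The edit2 example, made precise (my formalization):

```latex
% A 1-to-10 construction with disjoint outputs that still fails to move the
% limiting measure. For each even n, take
\[
  n \;\mapsto\; \{\, 10n + 1,\; 10n + 3,\; \dots,\; 10n + 19 \,\},
\]
% ten odd integers, disjoint across distinct even n (the 10n are distinct
% multiples of 20). Yet the natural density of the evens is unchanged:
\[
  \lim_{N \to \infty} \frac{\#\{\, k \le N : k \ \mathrm{even} \,\}}{N} = \frac{1}{2}.
\]
% A per-element 1-to-10 construction proves nothing about a limiting measure
% unless it respects the length cutoff that defines the measure.
```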
> Because it won’t do if for every complex-looking program, you’re constructing the same, say, 100 simple-looking programs.
That is exactly what I’ve done, and it’s sufficient. The whole point is to show that the Kolmogorov measure for apparent-universe probability is justified, starting from the assumption that all mathematical-object universes are equally likely. Demonstrating that the number of additional copies that can be made of a simpler universe, relative to the more complex one, is in direct proportion to the difference in Kolmogorov complexity, which is what I have done, is sufficient.
You know, it’d be a lot more helpful if it were anything remotely close to “done”, rather than vaguely handwaved with some sort of fuzzy (mis)understanding of the terms under discussion at its core. What does “difference in Kolmogorov complexity” even mean when your program of length L does not have any equivalents of length < L? If it has no simpler equivalent, its Kolmogorov complexity is L.
Given a program describing some “simple rules” (whatever that means, anyway), one can make a likewise large number of variations where, instead of a single photon being created somewhere obscure or under some hard-to-reach conditions, photons are created on a regular lattice (with some specific spacing of its points) over some space of conditions. Which is very noticeable, and does not locally look like any “simple rules” to much of anyone.
edit: Note that most definitions of Turing machines do not have pointers, and heads move one step at a time, which actually makes it very nontrivial to perform highly localized, surgical changes to data, especially in the context of a program that applies the same rules everywhere. So it is not obviously the case that a single point change to the world would take less code than something blatantly obvious to the inhabitants.
This doesn’t look remotely like a mathematical proof, though.
Who said anything about a mathematical proof? I linked a more formal exposition of the logic in a more abstract model elsewhere in this comment thread; this is an application of that principle.
private_messaging beat me to the point.
The amount of space it takes to encode that difference puts a bound on the number of such differences there can be, and they add up to less than, or at least less than a constant times, the weight of the original simple model.
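One hedged way to make this quantitative (the redundancy exponent β is an assumption introduced for this sketch, not something established in the thread):

```latex
% Suppose at most V(d) \le 2^{\beta d} valid, semantically distinct edge-case
% clauses fit in d extra bits, with \beta < 1 (clauses carry redundancy).
% Under the 2^{-\mathrm{length}} weighting, all variants of a length-L core
% together weigh
\[
  \sum_{d \ge 1} V(d)\, 2^{-(L+d)}
  \;\le\; 2^{-L} \sum_{d \ge 1} 2^{-(1-\beta) d}
  \;=\; \frac{2^{-L}}{2^{1-\beta} - 1},
\]
% a constant multiple of the simple model's own weight. If every bit pattern
% counted as a new clause (\beta = 1), the sum would diverge; that is exactly
% where the disagreement below lives.
```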
> The amount of space it takes to encode that difference puts a bound on the number of such differences there can be.
For a given description length, yes, but since there is no necessary bound on description length, there is no necessary limit to the number of possible differences. As the description length devoted to ‘comments’ increases, you can make the responses and circumstances ever more specified, multiplying the number of worlds which resemble the simpler world, relative to the more complex one.
Yeah, that comment was when I thought you were talking about a simplicity weighting, not the naive weighting.
I do not think this works. All the complex worlds which look like this simple world put together (with Kolmogorov weight) should sum up to less than the simple alternative.
There are infinitely many ways to perturb a simple set of rules, but they get much more complex very quickly.
The point is that this is a derivation for Kolmogorov weighting. These universes are weighted naively; every universe is weighted the same regardless of complexity. From this naive weighting, after you add the nearly-undetectable variations, the Kolmogorov measure falls out naturally.
kokotajlod linked this, which uses the same logic on a more abstract example to demonstrate how the Kolmogorov measure arises. It’s significantly better written than my exposition.
Ok, I was completely misunderstanding you.
The claim that we are probably in a complex world makes sense in the naive weighting, but I do not see any motivation to talk about that naive weighting at all, especially since it only seems to make sense in a finite multiverse.
Now I don’t understand you. I’ll take a stab at guessing your objection and explaining the relevant parts further:
This logical derivation works perfectly well in an infinite multiverse: it works for all finite description lengths, and that continues in the limit. Every “lawful universe” is the center of a cluster in universe-space, and the size of the cluster is proportional to the simplicity (by the Kolmogorov criterion) of the central universe. This is most easily illustrated in the finite case, but works just as well in the infinite one.
I do implicitly assume that the naive weighting of ‘every mathematical object gets exactly one universe’ is inherently correct. I don’t have any justification for why this should be true, but until I get some I’m happy to treat it as axiomatic.
The conclusion can be summarized as “Assume the naive weighting of universes is true. Then from the inside, the Kolmogorov weighting will appear to be true, and a Bayesian reasoner should treat it as true.”
I think I completely understand your position. Thank you for sharing. I agree with it, modulo your axiom that the naive weighting is correct.
All of my objections before were because I did not realize you were using that naive weighting.
I do have two problems with the naive weighting:
First, description length is a man-made construct, dependent on your language.
Second, there are predicates we can state about the infinite descriptions which are not measurable, and I am not sure what to do with that.