The Presumptuous Philosopher, self-locating information, and Solomonoff induction
I need help working some ideas out.
If you’re not familiar with the Presumptuous Philosopher, it’s a thought experiment due to Bostrom about predicting the laws of the universe based on anthropic reasoning:
It is the year 2100 and physicists have narrowed down the search for a theory of everything to only two remaining plausible candidate theories: T1 and T2 (using considerations from super-duper symmetry). According to T1 the world is very, very big but finite and there are a total of a trillion trillion observers in the cosmos. According to T2, the world is very, very, very big but finite and there are a trillion trillion trillion observers. The super-duper symmetry considerations are indifferent between these two theories.
Physicists are preparing a simple experiment that will falsify one of the theories. Enter the presumptuous philosopher: “Hey guys, it is completely unnecessary for you to do the experiment, because I can already show you that T2 is about a trillion times more likely to be true than T1!”
If you’re not familiar with Solomonoff induction, the relevant aspects are that it tries to predict your sequence of observations (not the laws of physics, at least not directly), and that it assigns each sequence of observations a probability that decreases exponentially with how complicated that sequence is. By “complicated” we don’t merely mean how verbose the laws of physics are; we also have to include how complicated it is to go from the laws of physics to specifying your specific observations.
Imagine that our Presumptuous Philosopher has just explained to the physicists that they already know that T2 is a trillion times more likely, because it has more observers in it.
But one of the engineers present is a Solomonoff inductor, and they ask “So does this mean that I should expect the experiment to show us T2, even though the theories are of similar complexity? How does the number of observers change the probability, if all I care about is how complicated it is to predict my sequence of observations?”
“Excellent question,” says the Presumptuous Philosopher. “Think of T1 and T2 not as two different hypotheses, but as two different collections of hypothetical ways to predict your observations, with one actual hypothesis for each copy of you in the theory. If there are a trillion times more ‘you’s in T2, then there are a trillion times as many ways to predict your sequence of observations, each of which should get its fair share of the probability.”
“Okay, so you’re saying the actual hypotheses that predict my observations, which I should assign probability to according to their complexity, are things like ‘T1 and I’m person #1’ or ‘T2 and I’m person #10^10’?” says the Solomonoff inductor.
“But I’m still confused. Because it still requires information to say that I’m person #1 or person #10^10. Even if we assume that it’s equally easy to specify where a person is in both theories, it just plain old takes more bits to say 10^10 than it does to say 1.”
“The number of bits it takes to just say the number of the person adds about log2(n) bits to the length of the hypothesis. And so when I evaluate how likely that hypothesis is, which decreases exponentially as you add more bits, it turns out to actually be n times less likely. If we treat T1 and T2 as two different collections of numbered hypotheses, and add up all the probability in the collections, they don’t add linearly, just because of the bits it takes to do the numbering. Instead, they add like the (discrete) integral of 1/n, which brings us back to log(n) scaling!” the Solomonoff inductor exclaims.
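Here is a minimal numerical sketch of the sum the Solomonoff inductor is describing. It assumes, as the dialogue does, that numbering person n costs exactly log2(n) bits (a simplification; a real self-delimiting code would cost somewhat more). The names person_weight and collection_weight are made up for illustration, and for large populations the sum is replaced by the standard asymptotic approximation ln(N) + γ rather than added up term by term.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def person_weight(n: int) -> float:
    """Weight lost to numbering: 2 ** -(bits for n), assuming bits = log2(n)."""
    return 2.0 ** (-math.log2(n))  # this is just 1/n

def collection_weight(num_copies: int, exact_cutoff: int = 1_000_000) -> float:
    """Total weight of the hypotheses 'I am person #1' ... 'I am person #N'."""
    if num_copies <= exact_cutoff:
        # Small enough to add up directly.
        return sum(person_weight(n) for n in range(1, num_copies + 1))
    # Too many terms to loop over: use H_N ~ ln(N) + gamma + 1/(2N).
    return math.log(num_copies) + EULER_GAMMA + 1.0 / (2 * num_copies)

print(collection_weight(10))      # ten copies
print(collection_weight(10**12))  # a trillion copies
```

Going from ten copies to a trillion copies multiplies the population by 10^11, but under these assumptions the total weight only grows by roughly a factor of ten.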
“Are you sure that’s right?” says the Presumptuous Philosopher. “If there was exactly 1 of you in T1, and 1 trillion of you in T2, then you’re saying that the ratio of probabilities should actually be log(a trillion) to one, or only about 28 times more likely! Does that really make sense, that if there are a trillion of you who are going to see the experiment turn out one way, and only one of you who is going to see it turn out the other way, the correct probability ratio is... 28?”
“Well, when you put it that way, it does sound a little...”
“And what if there were 10 of you in T1 and 10 trillion in T2? When you multiply both populations by a constant, you only add to the log, and so the ratio changes! Now we’re supposed to believe that the probability ratio is more like (28+log(10))/(1+log(10)) ≈ 9. Nine?! By the time we get up to a trillion of you in T1 and a trillion trillion in T2, this would imply that T2 is only twice as likely, despite still having a trillion times more copies of you than T1.”
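To check how the ratio drifts as both populations are scaled up, here is a rough sketch that compares the two collections using harmonic sums. It rests on the same assumption as the dialogue (person n is penalized by a factor of 1/n), and harmonic and log_weight_ratio are hypothetical helper names. The exact values depend on which additive constants one keeps, so they need not match the rounded figures quoted in the dialogue.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(n: int) -> float:
    """Sum of 1/k for k = 1..n: exact for small n, asymptotic for large n."""
    if n <= 1_000_000:
        return math.fsum(1.0 / k for k in range(1, n + 1))
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n)

def log_weight_ratio(copies_t1: int, copies_t2: int) -> float:
    """Ratio of total weight in T2 to total weight in T1 under 1/n weighting."""
    return harmonic(copies_t2) / harmonic(copies_t1)

TRILLION = 10**12

# T2 always has a trillion times as many copies as T1.
for copies_t1 in (1, 10, TRILLION):
    copies_t2 = copies_t1 * TRILLION
    ratio = log_weight_ratio(copies_t1, copies_t2)
    print(f"{copies_t1} vs {copies_t2}: ratio {ratio:.1f}")
```

The population ratio is a trillion to one in every row, yet the weight ratio falls from around 28 toward 2 as the populations grow, which is the asymmetry the Presumptuous Philosopher is objecting to.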
So, I’m really not sure what conclusion to draw from this. Do we hold that there’s an important symmetry when we multiply the sizes of the universes, and therefore Solomonoff induction has to be modified? Or do we trust that Solomonoff induction is pretty theoretically sound, and reject the paradigm of bundling hypotheses about sequences of observations into “universes” that we treat as having symmetries?
Or maybe neither, and I’ve posed the question badly, or misinterpreted how Solomonoff induction applies here. But if so, I’d really like to know how.