I’ve thought about something very similar before, and the conclusion I came to was that the number of copies in a world makes no difference to its likelihood. As far as I can tell, the disagreement is here:
“But I’m still confused. Because it still requires information to say that I’m person #1 or person #10^10. Even if we assume that it’s equally easy to specify where a person is in both theories, it just plain old takes more bits to say 10^10 than it does to say 1.”
“The amount of bits it takes to just say the number of the person adds log(n) bits to the length of the hypothesis. And so when I evaluate how likely that hypothesis is, which decreases exponentially as you add more bits, it turns out to actually be n times less likely. If we treat T1 and T2 as two different collections of numbered hypotheses, and add up all the probability in the collections, they don’t add linearly, just because of the bits it takes to do the numbering. Instead, they add like the (discrete) integral of 1/n, which brings us back to log(n) scaling!”
In T1, there is only one copy of you, so you don’t need any bits to specify which one you are. In T2, there are a trillion copies of you, so you need log2(a trillion) ≈ 40 bits to specify which one you are. This makes each of those hypotheses (1/2)^40 ≈ 1/(a trillion) times less likely, and since there are a trillion of them, it cancels exactly.
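To make the cancellation concrete, here’s a minimal sketch in Python (the trillion and the 40-bit index are the ones from above; the cancellation is only approximate here because a trillion isn’t a power of two):

```python
from math import ceil, log2

n = 10**12  # a trillion copies of you in T2

# Fixed-length index: bits needed to single out one copy.
index_bits = ceil(log2(n))  # 40

# Each "T2 and I'm copy #k" hypothesis pays a factor of 2^-40,
# but there are n of them, so T2's total prior mass survives.
per_copy_penalty = 2.0 ** -index_bits
print(index_bits)            # 40
print(n * per_copy_penalty)  # ~0.91 (exactly 1 if n were a power of two)
```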
Whereas you seem to think that saying you are #1 of 1 000 000 000 000 takes fewer bits than saying you are #538 984 236 587 of 1 000 000 000 000. I can see how you get that idea when you imagine physically writing them out, but the thing that allows you to skip leading 0s in physical writing is that you have non-digit signs that tell you when the number’s encoding is over. You would need at least one such sign as a terminator character if you wanted to represent numbers like that in a computer, so you would actually need three symbols instead of two. I’m pretty sure that comes out worse overall in information-theoretic terms than taking a fixed 40 bits.
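As a rough sanity check on that claim, here’s a back-of-the-envelope sketch, assuming each symbol of the three-symbol alphabet {0, 1, END} costs log2(3) bits, and rounding the copy count to 2^40 − 1 so the digit-count arithmetic is exact:

```python
from math import log2

n = 2**40 - 1  # copies numbered 1..n; exactly 2^(k-1) of them have k binary digits

# Scheme A: fixed-length index, always 40 bits per copy.
fixed_cost = 40.0

# Scheme B: drop leading zeros, append a terminator. The alphabet is
# {0, 1, END}, so each symbol carries log2(3) bits of information.
symbol_bits = log2(3)

# Average digit count of a uniformly random copy number in 1..n.
avg_digits = sum(k * 2 ** (k - 1) for k in range(1, 41)) / n
avg_variable_cost = (avg_digits + 1) * symbol_bits  # +1 for the terminator

print(fixed_cost)         # 40.0
print(avg_variable_cost)  # ~63.4 -- worse on average than the fixed 40 bits
```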
Or perhaps an easier way to see the problem: If we take your number encoding seriously, it implies that “T2 and I’m person #1” is more likely than “T2 and I’m person #10^10”, since it would have fewer bits. But the order in which we number the people is arbitrary. Clearly something has gone wrong upstream from that.
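To put a number on how lopsided that would be, here’s a hypothetical sketch of the same drop-leading-zeros encoding (code_bits is my own illustrative helper, not anything from your argument):

```python
from math import log2

symbol_bits = log2(3)  # cost per symbol of the {0, 1, END} alphabet

def code_bits(k: int) -> float:
    """Bits to name copy #k: its binary digits plus one terminator symbol."""
    return (k.bit_length() + 1) * symbol_bits

# How much more likely "I'm person #1" would be than "I'm person #10^10":
skew = 2 ** (code_bits(10**10) - code_bits(1))
print(skew)  # ~5e15 -- yet the numbering was arbitrary to begin with
```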