Just to clarify, guessing that there are 1546 buses maximizes the probability that you are exactly correct, but it does not minimize your expected error, since you are guessing close to many numbers (everything below 1546) that are impossible. This is known in statistics as the “German tank problem”[1] and the posterior distribution is actually not well-defined in many setups.
[1] From WW2 soldiers trying to estimate enemies’ manufacturing capacity based on tank serial numbers.
I’m sorry, I’m not sure what you mean. Under Bayesianism this is straightforward.
Oh I see. I’m not trying to guess a specific number, I’m trying to update my distribution.
The intuition is that if we both saw bus 1546, and you guessed that there were 1546 buses and I guessed that there were 1547, you would be a little more likely to be correct but I would almost certainly be closer to the real number.
The Bayesian update isn’t generally well-defined because you get a divergent mean. Your implicit prior is 1/n which is an improper prior. This is fine for deriving a posterior median, which in this case happens to be about 3,100 buses, and a posterior distribution, which in this case is a truncated zeta distribution with s=2 and k=1546. But the posterior mean does not exist.
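A rough numerical check of these claims, assuming the posterior is proportional to 1/n^2 for n ≥ 1546 as described above. The helper names (`tail`, `posterior_cdf`, `posterior_median`, `truncated_mean`) are made up for this sketch, and `tail` uses an asymptotic approximation rather than an exact infinite sum.

```python
K = 1546  # observed bus number, from the thread


def tail(m: int) -> float:
    """Approximate sum of 1/n^2 over n >= m (asymptotic expansion, good for large m)."""
    return 1 / m + 1 / (2 * m**2) + 1 / (6 * m**3)


Z = tail(K)  # normalizing constant of the assumed 1/n^2 posterior


def posterior_cdf(m: int) -> float:
    """P(N <= m | saw bus K) under the assumed posterior."""
    return 0.0 if m < K else 1.0 - tail(m + 1) / Z


def posterior_median() -> int:
    """Smallest m with P(N <= m) >= 0.5."""
    m = K
    while posterior_cdf(m) < 0.5:
        m += 1
    return m


def truncated_mean(cap: int) -> float:
    """Contribution to the posterior mean from K <= n <= cap.

    Each term is n * (1/n^2) / Z = (1/n) / Z, so this is a partial
    harmonic sum and keeps growing (like log(cap)) as cap increases.
    """
    return sum(1.0 / n for n in range(K, cap + 1)) / Z


if __name__ == "__main__":
    print("P(N == 1546):", (1 / K**2) / Z)
    print("posterior median:", posterior_median())
    for cap in (10**4, 10**5, 10**6):
        print("mean truncated at", cap, "=", truncated_mean(cap))
```

Running it should show a small probability that 1546 is exactly right, a median near twice the observed number, and a truncated mean that keeps growing as the cap is raised, which is what a nonexistent posterior mean looks like numerically.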
I’m not using this as a prior, I’m using it to update my existing prior (whatever that was). I believe the posterior will be well defined, so long as the prior was.
As a worked example, if I start off assuming that the chance of there being n buses is 1/2^n (nice and simple, adds up to 1), then the posterior is 1/(n * ln(2) * 2^n): multiply the two distributions, then divide by the sum of the product over n (which is ln(2)) so that it adds up to 1.
No, that’s not the posterior distribution—clearly, the number of buses cannot be lower than 1546, but that distribution has material probability mass on low integers. I’m not quite sure how you got that equation.
But regardless, I think this shows where we disagree. That prior has mean 2… that’s a pretty strong assumption about the distribution of n. If you want to avoid that kind of assumption, you can get posterior distributions but not a posterior expectation.
Sorry, I meant to add in an example where for simplicity you saw the bus numbered 1.
Agreed it’s a terrible prior, it’s just an easy one for a worked example.
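The worked example in this exchange can be checked numerically. This is a minimal sketch assuming the setup as clarified above: a prior of 2^-n on n ≥ 1, a likelihood of 1/n, and an observed bus numbered 1. The function name `posterior` and the cutoff of 200 terms are choices made for this illustration.

```python
import math


def posterior(n: int) -> float:
    """P(N = n | saw bus 1) for prior 2^-n and likelihood 1/n, n >= 1."""
    return 1.0 / (n * math.log(2) * 2**n)


# Terms beyond n = 200 are far below floating-point precision.
ns = range(1, 200)

total = sum(posterior(n) for n in ns)          # should be close to 1
prior_mean = sum(n / 2**n for n in ns)         # mean of the 2^-n prior
post_mean = sum(n * posterior(n) for n in ns)  # posterior mean, finite here

if __name__ == "__main__":
    print("posterior sums to:", total)
    print("prior mean:", prior_mean)
    print("posterior mean:", post_mean)
```

With this prior the posterior mean is finite, which fits the point made in the thread: the prior already encodes a strong belief that n is small.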
I’m not disagreeing with that categorically—for many priors the posterior distribution is well defined. But all of those priors carry information (in the information theoretical sense) about the number of buses. If you have an uninformative reference prior, your posterior distribution does not have a mean.
You can see the sketch of this proof if you consider the likelihoods of seeing this bus for any given n. If there are 1546 buses, there was a 1/1546 chance you saw this one. If there are 1547, there was a 1/1547 chance you saw this one. Summed over all possible n, these likelihoods form a tail of the harmonic series, which diverges. That divergence is the fundamental issue that’s going to cause the mean to be undefined.
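To spell out the divergence argument, here is a sketch assuming, as earlier in the thread, an improper prior proportional to 1/n and a likelihood of 1/n for seeing bus k = 1546 when there are n ≥ k buses. The symbols p, k, and Z are notation introduced here, not the commenters’.

```latex
\begin{align*}
p(n \mid k) &= \frac{n^{-2}}{Z}, \qquad Z = \sum_{m=k}^{\infty} m^{-2} < \infty, \qquad n \ge k, \\
\mathbb{E}[N \mid k] &= \sum_{n=k}^{\infty} n \, p(n \mid k)
  = \frac{1}{Z} \sum_{n=k}^{\infty} \frac{1}{n} = \infty .
\end{align*}
```

The posterior normalizes because the sum of 1/n^2 converges, but the mean is a tail of the harmonic series. With a flat prior instead, the posterior itself would be proportional to 1/n and could not be normalized at all.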
You can’t make claims about the posterior without setting at least some conditions on what your prior is—obviously, for some priors the posterior expectation is well-defined. (Trivially, if I already think n=2000 with probability 1, I will still think that after seeing the bus.) But I claim that all such priors make assumptions about the distribution of the possible number of buses. In the uninformative case, your posterior distribution is well-defined (as I said, it’s a truncated zeta distribution) but it does not have a finite mean.
I mean, yes, that’s the definition of a prior. How to calculate a prior is an old question in Bayesianism, with different approaches, Kolmogorov complexity being one.
In Gott’s approach, the distribution of bus counts across different cities is irrelevant. The number of buses N for this city is already fixed. When you observe bus number n, you have simply drawn at random from the N buses. In that case, the probability of drawing a number no higher than n is n/N, and if we look for 0.5 probability, we get 0.5 = 1546/N, which gives us N = 3092 with 0.5 probability. Laplace came to a similar result using much more complex calculations, summing over all possible probability distributions.
Again, I am confused.
From what you write I understand this:
p(bus has number ≤ n | city has N buses) = n/N
so p(bus has number ≤ 1546 | city has N buses) = 0.5 iff N = 3092
therefore p(city has 3092 buses | bus has number 1546) = 0.5
But from your other comment, it looks like that last step and conclusion are not what you mean. Can you confirm that?
Or do you mean:
therefore p(city has ≤ 3092 buses | bus has number 1546) = 0.5?
Or something else entirely?
In the last line there should be
therefore p(city has less than 3092 buses | bus has number 1546) = 0.5
Ok. Thanks. So:
p(bus has number ≤ 1546 | city has 3092 buses) = 0.5
implies
p(city has < 3092 buses | bus has number 1546) = 0.5
?
If that is your reasoning, I do not see how you go from the former to the latter.
Is it a general fact that:
p(bus has number ≤ n | city has N buses) = p(city has < N buses | bus has number n)
or does it work only for 0.5?
Maybe we had better take equation (2) from Gott’s original work, https://gwern.net/doc/existential-risk/1993-gott.pdf:
(1/3) T0 < t < 3 T0 with 50 per cent confidence,
in which T0 is the observed bus number and t is the number of buses above it. In our case, the total number of buses T = T0 + t is between 2061 and 6184 with 50 per cent probability.
It is a correct claim, and saying that the total number of buses is double the observed bus number is an oversimplification of that claim, which we use only to point in the direction of Gott’s full equation.
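The interval quoted above can be reproduced with a short sketch. It assumes the reasoning usually attributed to Gott: the observed number, as a fraction of the unknown total, is treated as uniform on (0, 1). The function name `gott_interval` and its parameters are hypothetical, chosen for this illustration.

```python
from __future__ import annotations


def gott_interval(observed: float, confidence: float) -> tuple[float, float]:
    """Bounds on the total, given the observed number and a confidence level.

    Assumes r = observed / total is uniform on (0, 1), so with the given
    confidence r lies in ((1 - c) / 2, (1 + c) / 2) and total = observed / r.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    low = 2 * observed / (1 + confidence)   # r at its upper bound
    high = 2 * observed / (1 - confidence)  # r at its lower bound
    return low, high


if __name__ == "__main__":
    for c in (0.5, 0.95):
        low, high = gott_interval(1546, c)
        print(f"{c:.0%} interval for the total: {low:.0f} to {high:.0f}")
```

At 50 per cent confidence this gives a total between 4/3 and 4 times the observed number, matching the 2061 to 6184 range; doubling the observed number gives the median of that reasoning, not the interval.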
Oh, it looks exactly like the kind of reference that everyone here seems to be aware of and I am not. ^^ I will be reading that. Thanks a lot.
No, that is not the definition of a prior. There are priors which imply an expected number of buses, and priors that don’t. If you select a prior that doesn’t, you can still get a meaningful posterior distribution even if that posterior distribution doesn’t have a real-valued mean.