I’m not disagreeing with that categorically: for many priors the posterior distribution is well defined. But all of those priors carry information (in the information-theoretic sense) about the number of buses. If you use an uninformative reference prior, your posterior distribution does not have a finite mean.
You can see a sketch of the proof by considering the likelihood of seeing this bus for each possible n. If there are 1546 buses, there was a 1/1546 chance you saw this one; if there are 1547, a 1/1547 chance; and so on. These likelihoods form the tail of the harmonic series, which diverges, and that divergence is the fundamental reason the mean is undefined.
You can’t make claims about the posterior without setting at least some conditions on your prior; obviously, for some priors the posterior expectation is well defined. (Trivially, if I already think n = 2000 with probability 1, I will still think so after seeing the bus.) But I claim that all such priors make assumptions about the distribution of the possible number of buses. In the uninformative case, your posterior distribution is well defined (as I said, it’s a truncated zeta distribution), but it does not have a finite mean.
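To make the divergence argument concrete, here is a small numerical sketch. It assumes (the comment does not say this) that the uninformative prior is the scale-invariant p(N) ∝ 1/N; combined with the 1/N likelihood for N ≥ 1546, the posterior is then proportional to 1/N² on N ≥ 1546, a truncated zeta distribution with s = 2. The names `prior`, `likelihood` and `truncated_moments` are illustrative, not from the thread.

```python
import math

OBSERVED = 1546  # bus number seen in the thread


def prior(n: int) -> float:
    """ASSUMED scale-invariant, improper prior: p(N) proportional to 1/N."""
    return 1.0 / n


def likelihood(observed: int, n: int) -> float:
    """P(seeing bus `observed` | N = n) when bus numbers are uniform on 1..n."""
    return 1.0 / n if n >= observed else 0.0


def truncated_moments(cutoff: int) -> tuple[float, float]:
    """Posterior normalizer and mean, with the sum over N cut off at `cutoff`."""
    ns = range(OBSERVED, cutoff + 1)  # N < OBSERVED has zero likelihood
    weights = [prior(n) * likelihood(OBSERVED, n) for n in ns]
    normalizer = math.fsum(weights)
    mean = math.fsum(n * w for n, w in zip(ns, weights)) / normalizer
    return normalizer, mean


# The normalizer converges as the cutoff grows; the truncated mean does not.
for cutoff in (10**4, 10**5, 10**6):
    z, m = truncated_moments(cutoff)
    print(f"cutoff={cutoff:>9,}  normalizer={z:.4e}  mean={m:,.0f}")
```

Under this assumption the normalizer settles near 6.47e-4, while the truncated mean keeps growing roughly like 1546 · ln(cutoff / 1546), so no cutoff pins it down.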
But I claim that all such priors make assumptions about the distribution of the possible number of buses
I mean, yes, that’s the definition of a prior. How to calculate a prior is an old question in Bayesianism, with different approaches, Kolmogorov complexity being one.
No, that is not the definition of a prior. There are priors which imply an expected number of buses, and priors that don’t. If you select a prior that doesn’t, you can still get a meaningful posterior distribution even if that posterior distribution doesn’t have a real-valued mean.
In Gott’s approach, the distribution of bus counts across different cities is irrelevant. The number of buses N in this city is already fixed, and the bus number n you observe is a random draw from 1 to N. The probability of drawing a number no greater than n is then n/N; setting this to 0.5 gives 0.5 = 1546/N, i.e. N = 3092 with 0.5 probability. Laplace came to a similar result using much more complex calculations, summing over all possible probability distributions.
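The arithmetic of this step can be checked with exact fractions. The sketch assumes, as the comment does, that bus numbers run from 1 to N and that the observed number is uniform over them; the function names are hypothetical.

```python
from fractions import Fraction


def p_number_at_most(n: int, total: int) -> Fraction:
    """p(bus has number <= n | city has `total` buses), numbers uniform on 1..total."""
    if not 1 <= n <= total:
        raise ValueError("need 1 <= n <= total")
    return Fraction(n, total)


def total_for_probability(n: int, p: Fraction) -> Fraction:
    """Solve p = n / N for N."""
    return n / p


observed = 1546
print(total_for_probability(observed, Fraction(1, 2)))  # 3092
print(p_number_at_most(observed, 3092))                 # 1/2
```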
Again, I am confused.
From what you write, I understand this:
p(bus has number ≤ n | city has N buses) = n/N
so p(bus has number ≤ 1546 | city has N buses) = 0.5 iff N = 3092
therefore p(city has 3092 buses | bus has number 1546) = 0.5
But from your other comment, it looks like that last step and conclusion are not what you mean. Can you confirm that?
Or do you mean:
therefore p(city has ≤ 3092 buses | bus has number 1546) = 0.5?
Or something else entirely?
The last line should be:
therefore p(city has less than 3092 buses | bus has number 1546) = 0.5
Ok. Thanks. So:
p(bus has number ≤ 1546 | city has 3092 buses) = 0.5
implies
p(city has < 3092 buses | bus has number 1546) = 0.5
?
If that is your reasoning, I do not see how you go from the former to the latter.
Is it a general fact that:
p(bus has number ≤ n | city has N buses) = p(city has < N buses | bus has number n)
or does it work only for 0.5?
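Whether the two probabilities coincide depends on the prior, so any numerical check has to pick one. The sketch below assumes the scale-invariant prior p(N) ∝ 1/N (the thread does not fix a prior), which makes the posterior proportional to 1/N² on N ≥ 1546, and compares p(bus has number ≤ n | city has N buses) with the posterior probabilities for a few values of N. Function names are illustrative.

```python
import math

OBSERVED = 1546  # bus number seen in the thread


def tail_inverse_squares(m: int) -> float:
    """Sum of 1/k**2 over k >= m, computed as zeta(2) = pi**2/6 minus the head."""
    head = math.fsum(1.0 / k**2 for k in range(1, m))
    return math.pi**2 / 6 - head


def p_number_at_most(n: int, total: int) -> float:
    """p(bus has number <= n | city has `total` buses), uniform numbering."""
    return n / total


def p_total_at_least(total: int, n: int = OBSERVED) -> float:
    """Posterior p(city has >= total buses | bus n), ASSUMING prior p(N) ∝ 1/N.

    Prior 1/N times likelihood 1/N gives a posterior proportional to 1/N**2 on N >= n.
    """
    return tail_inverse_squares(total) / tail_inverse_squares(n)


for total in (2000, 3092, 6184):
    at_least = p_total_at_least(total)
    print(
        f"N={total}: p(bus<=n|N)={p_number_at_most(OBSERVED, total):.4f}  "
        f"p(city>=N|n)={at_least:.4f}  p(city<N|n)={1 - at_least:.4f}"
    )
```

Under this assumed prior, p(bus has number ≤ n | city has N buses) comes out close to p(city has ≥ N buses | bus has number n), so p(city has < N buses | bus has number n) is close to 1 - n/N; the two quantities in the question meet only where n/N = 0.5. A different prior would give different numbers.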
Maybe we’d better take equation (2) from Gott’s original paper, https://gwern.net/doc/existential-risk/1993-gott.pdf:
T0/3 < t < 3·T0 with 50 per cent confidence,
in which T0 is the observed bus number, t is the number of buses above T0, and T = T0 + t is the total number of buses. In our case (T0 = 1546), T is between 2061 and 6184 with 50 per cent probability.
That is the correct claim; saying that the total number of buses is double the observed bus number is an oversimplification of it, which we use only to point toward Gott’s full equation.
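The interval can be reproduced in a few lines. This sketch assumes Gott’s premise that the observed number is a uniformly random fraction r = T0/T of the total; for confidence c, the central interval (1 - c)/2 < r < (1 + c)/2 gives T0·(1 - c)/(1 + c) < t < T0·(1 + c)/(1 - c), which for c = 0.5 is T0/3 < t < 3·T0. The function name `gott_interval` is hypothetical.

```python
def gott_interval(observed: float, confidence: float) -> tuple[float, float]:
    """Interval for the total T = observed + t, with t the count above `observed`.

    Assumes r = observed / T is uniform on (0, 1). The central `confidence`
    interval of r, ((1 - c)/2, (1 + c)/2), maps to
    observed*(1 - c)/(1 + c) < t < observed*(1 + c)/(1 - c).
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    low_ratio = (1 - confidence) / (1 + confidence)
    high_ratio = (1 + confidence) / (1 - confidence)
    return observed * (1 + low_ratio), observed * (1 + high_ratio)


low, high = gott_interval(1546, 0.5)
print(f"50%: {low:.0f} < T < {high:.0f}")  # 50%: 2061 < T < 6184
```

The median case r = 1/2 gives T = 2 × 1546 = 3092, which is the "double the observed number" shortcut.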
Oh, it looks exactly like the kind of reference that everyone here seems to be aware of and I am not. ^^ I will be reading that. Thanks a lot.