purple fire comments on Experimental testing: can I treat myself as a random sample?

purple fire 22 Apr 2025 17:15 UTC
6 points
1
Just to clarify, guessing that there are 1546 buses maximizes the probability that you are exactly correct, but it does not minimize your expected error, since you are guessing close to many numbers (everything below 1546) that are impossible. This is known in statistics as the “German tank problem”^[1] and the posterior distribution is actually not well-defined in many setups.
1. ^
  From WW2 soldiers trying to estimate enemies’ manufacturing capacity based on tank serial numbers
- Yair Halberstadt 22 Apr 2025 17:29 UTC
  2 points
  0
  Parent
  I’m sorry, I’m not sure what you mean. Under Bayesianism this is straightforward.
  - Yair Halberstadt 22 Apr 2025 17:30 UTC
    4 points
    2
    Parent
    Oh I see. I’m not trying to guess a specific number, I’m trying to update my distribution.
    - purple fire 22 Apr 2025 17:45 UTC
      4 points
      3
      Parent
      The intuition is that if we both saw bus 1546, and you guessed that there were 1546 buses and I guessed that there were 1547, you would be a little more likely to be correct but I would almost certainly be closer to the real number.
      The Bayesian update isn’t generally well-defined because you get a divergent mean. Your implicit prior is 1/n which is an improper prior. This is fine for deriving a posterior median, which in this case happens to be about 3,100 buses, and a posterior distribution, which in this case is a truncated zeta distribution with s=2 and k=1546. But the posterior mean does not exist.
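A minimal numerical sketch of the two claims above, assuming the posterior is proportional to 1/n² for n ≥ 1546 (the truncated zeta distribution with s=2 named in the comment). The function names are hypothetical, chosen only for this illustration.

```python
import math

K = 1546  # observed bus number, taken from the thread


def tail(m):
    """Sum of 1/n^2 over n >= m, using the full sum over n >= 1, pi^2 / 6."""
    return math.pi ** 2 / 6 - math.fsum(1.0 / n ** 2 for n in range(1, m))


def pmf(n, k=K):
    """Posterior P(N = n | saw bus k), assuming it is proportional to 1/n^2 on n >= k."""
    return 0.0 if n < k else (1.0 / n ** 2) / tail(k)


def prob_at_least(m, k=K):
    """Posterior P(N >= m | saw bus k), for m >= k."""
    return tail(m) / tail(k)


def median(k=K):
    """Smallest n at which the cumulative posterior probability reaches 0.5."""
    z = tail(k)
    cumulative = 0.0
    n = k
    while True:
        cumulative += (1.0 / n ** 2) / z
        if cumulative >= 0.5:
            return n
        n += 1


if __name__ == "__main__":
    # Guessing 1546 is slightly more likely to be exactly right than 1547 ...
    print(pmf(1546), pmf(1547))
    # ... but 1547 is the closer guess whenever the true count is at least 1547.
    print(prob_at_least(1547))
    # The median lands near twice the observed number.
    print(median())
```

The single most likely value has a probability well under 1%, which is why the exactly-correct guess and the closest guess come apart.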
      - Yair Halberstadt 22 Apr 2025 18:07 UTC
        1 point
        0
        Parent
        I’m not using this as a prior, I’m using it to update my existing prior (whatever that was). I believe the posterior will be well defined, so long as the prior was.
        Yair Halberstadt 22 Apr 2025 18:25 UTC
        2 points
        −1
        Parent
        As a worked example, if I start off assuming that the chance of there being n buses is 1/2^n (nice and simple, adds up to 1), then the posterior is 1/(n · ln(2) · 2^n): multiply the two distributions, then divide by their sum over n, which is ln(2), so that it adds up to 1.
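A quick numerical check of this worked example, as a sketch. It assumes the observed bus is number 1, so that every n ≥ 1 remains possible and the likelihood of the observation is 1/n. The function name is hypothetical.

```python
import math


def posterior(n):
    """P(N = n | saw bus 1) for prior 2**-n and likelihood 1/n."""
    return 1.0 / (n * math.log(2) * 2 ** n)


# Terms beyond n = 200 are far below floating-point precision.
ns = range(1, 201)
total = math.fsum(posterior(n) for n in ns)
mean = math.fsum(n * posterior(n) for n in ns)

print(total)  # close to 1
print(mean)   # close to 1 / ln(2)
```

With this prior the posterior mean is finite, because the 2^-n factor decays fast enough to dominate.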
        purple fire 22 Apr 2025 18:45 UTC
        2 points
        1
        Parent
        No, that’s not the posterior distribution—clearly, the number of buses cannot be lower than 1546, but that distribution has material probability mass on low integers. I’m not quite sure how you got that equation.
        But regardless, I think this shows where we disagree. That prior has mean 2… that’s a pretty strong assumption about the distribution of n. If you want to avoid that kind of assumption, you can get posterior distributions but not a posterior expectation.
        Yair Halberstadt 22 Apr 2025 19:22 UTC
        2 points
        1
        Parent
        Sorry, I meant to add in an example where for simplicity you saw the bus numbered 1.
        Agreed it’s a terrible prior, it’s just an easy one for a worked example.
        purple fire 22 Apr 2025 18:39 UTC
        1 point
        0
        Parent
        I’m not disagreeing with that categorically: for many priors the posterior distribution is well defined. But all of those priors carry information (in the information-theoretic sense) about the number of buses. If you have an uninformative reference prior, your posterior distribution does not have a mean.
        You can see a sketch of the proof if you consider the likelihood of seeing this bus for each possible n. If there are 1546 buses, there is a 1/1546 chance you saw this one. If there are 1547, there is a 1/1547 chance you saw this one. Summed over n, these terms form a tail of the harmonic series, which diverges. That divergence is the fundamental issue that’s going to cause the mean to be undefined.
        You can’t make claims about the posterior without setting at least some conditions on what your prior is—obviously, for some priors the posterior expectation is well-defined. (Trivially, if I already think n=2000 with probability 1, I will still think that after seeing the bus.) But I claim that all such priors make assumptions about the distribution of the possible number of buses. In the uninformative case, your posterior distribution is well-defined (as I said, it’s a truncated zeta distribution) but it does not have a finite mean.
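The argument can be written out as a short derivation. This is a sketch under the assumption, stated earlier in the thread, that the prior is the improper 1/n and that bus k = 1546 is observed with likelihood 1/n for every n ≥ k. The symbol ζ(2, k) (the Hurwitz zeta function) is notation introduced here.

```latex
\begin{align*}
p(n \mid k) &\propto \underbrace{\frac{1}{n}}_{\text{prior}} \cdot \underbrace{\frac{1}{n}}_{\text{likelihood}} = \frac{1}{n^{2}}, \qquad n \ge k, \\
p(n \mid k) &= \frac{n^{-2}}{\zeta(2, k)}, \qquad \zeta(2, k) = \sum_{m=k}^{\infty} \frac{1}{m^{2}} < \infty, \\
\mathbb{E}[n \mid k] &= \frac{1}{\zeta(2, k)} \sum_{n=k}^{\infty} n \cdot \frac{1}{n^{2}}
  = \frac{1}{\zeta(2, k)} \sum_{n=k}^{\infty} \frac{1}{n} = \infty .
\end{align*}
```

The normalizing sum converges, so the distribution exists, but the mean reduces to a tail of the harmonic series.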
        Yair Halberstadt 22 Apr 2025 19:25 UTC
        1 point
        0
        Parent
        “But I claim that all such priors make assumptions about the distribution of the possible number of buses.”
        I mean, yes, that’s the definition of a prior. How to calculate a prior is an old question in Bayesianism, with different approaches, Kolmogorov complexity being one.
        avturchin 23 Apr 2025 11:07 UTC
        2 points
        −1
        Parent
        In Gott’s approach, the distribution of bus counts across different cities is irrelevant. The number of buses N for this city is already fixed. When you draw the bus number n, you have just randomly selected from N. In that case, the probability is n/N, and if we look for 0.5 probability, we get 0.5 = 1546/N, which gives us N = 3092 with 0.5 probability. Laplace came to a similar result using much more complex calculations, summing over all possible probability distributions.
        rnollet 23 Apr 2025 18:50 UTC
        1 point
        0
        Parent
        “In that case, the probability is n/N, and if we look for 0.5 probability, we get 0.5 = 1546/N, which gives us N = 3092 with 0.5 probability.”
        Again, I am confused.
        From what you write, I understand this:
        p(bus has number ≤ n | city has N buses) = n/N
        so p(bus has number ≤ 1546 | city has N buses) = 0.5 iff N = 3092
        therefore p(city has 3092 buses | bus has number 1546) = 0.5
        But from your other comment, it looks like that last step and conclusion are not what you mean. Can you confirm that?
        Or do you mean:
        therefore p(city has ≤ 3092 buses | bus has number 1546) = 0.5?
        Or something else entirely?
        avturchin 24 Apr 2025 9:53 UTC
        3 points
        0
        Parent
        In the last line there should be:
        therefore p(city has less than 3092 buses | bus has number 1546) = 0.5
        rnollet 24 Apr 2025 12:18 UTC
        1 point
        0
        Parent
        Ok. Thanks. So:
        p(bus has number ≤ 1546 | city has 3092 buses) = 0.5
        implies
        p(city has < 3092 buses | bus has number 1546) = 0.5
        ?
        If that is your reasoning, I do not see how you go from the former to the latter.
        Is it a general fact that:
        p(bus has number ≤ n | city has N buses) = p(city has < N buses | bus has number n)
        or does it work only for 0.5?
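One way to probe this question numerically, as a sketch rather than a reconstruction of avturchin’s reasoning: assume the 1/N prior discussed elsewhere in the thread, so that the posterior is proportional to 1/N² for N ≥ n. Under that assumption the two quantities coincide only near N = 2n; in general they sum to roughly 1. The function names are hypothetical.

```python
import math


def tail(m):
    """Sum of 1/N^2 over N >= m, using the full sum over N >= 1, pi^2 / 6."""
    return math.pi ** 2 / 6 - math.fsum(1.0 / i ** 2 for i in range(1, m))


def p_bus_at_most(n, total):
    """P(bus number <= n | city has `total` buses), for a uniformly drawn bus."""
    return min(n, total) / total


def p_total_below(total, n):
    """P(city has < `total` buses | saw bus n), assuming posterior ~ 1/N^2 on N >= n."""
    if total <= n:
        return 0.0
    return 1.0 - tail(total) / tail(n)


n = 1546
for c in (1.5, 2, 4, 10):
    total = int(c * n)
    # Columns: multiple of n, likelihood-side probability, posterior-side probability.
    print(c, round(p_bus_at_most(n, total), 3), round(p_total_below(total, n), 3))
```

The equality at 0.5 is therefore a feature of the symmetric midpoint under this prior, not a general identity between the two conditional probabilities.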
        avturchin 24 Apr 2025 13:31 UTC
        3 points
        0
        Parent
        Maybe we had better take equation (2) from Gott’s original work https://gwern.net/doc/existential-risk/1993-gott.pdf:
        (1/3) t < T < 3t with 50 per cent confidence,

        in which t is the observed bus number T0 and T is the number of buses above it. In our case, the total number of buses, t + T, is between 2061 and 6184 with 50 per cent probability.
        
        This is a correct claim; saying that the total number of buses is double the observed bus number is an oversimplification of it, which we use only to point in the direction of Gott’s full equation.
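Gott’s interval can be reproduced in a few lines. This sketch assumes the usual form of the argument: the observed position is a uniformly random fraction of the total, so with confidence c the ratio of what lies ahead to what lies behind falls between (1 − c)/(1 + c) and (1 + c)/(1 − c). The function name and parameters are hypothetical.

```python
def gott_total_interval(observed, confidence=0.5):
    """Return (low, high) bounds on the total, given the observed bus number."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    low_ratio = (1.0 - confidence) / (1.0 + confidence)   # 1/3 when confidence = 0.5
    high_ratio = (1.0 + confidence) / (1.0 - confidence)  # 3 when confidence = 0.5
    # Total = buses up to the observed one + buses above it.
    return observed * (1.0 + low_ratio), observed * (1.0 + high_ratio)


low, high = gott_total_interval(1546)
print(low, high)
```

For bus 1546 at 50 per cent confidence this gives roughly 2061 to 6184, matching the figures quoted in the comment.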
        rnollet 24 Apr 2025 20:34 UTC
        3 points
        0
        Parent
        Oh, it looks exactly like the kind of reference that everyone here seems to be aware of and I am not. ^^ I will be reading that. Thanks a lot.
        purple fire 22 Apr 2025 19:29 UTC
        1 point
        0
        Parent
        No, that is not the definition of a prior. There are priors which imply an expected number of buses, and priors that don’t. If you select a prior that doesn’t, you can still get a meaningful posterior distribution even if that posterior distribution doesn’t have a real-valued mean.