Morendil comments on Open Thread, August 2010-- part 2

Morendil 29 Aug 2010 22:01 UTC
3 points
A quick probability math question.

Consider a population of blobs, initially comprising N individual blobs. Each individual blob independently has a probability p of reproducing, just once, spawning exactly one new blob. The next generation (an expected N*p individuals) has the same probability for each individual to spawn one new blob, and so on. Eventually the process will stop, with a total blob population of P.

The question is about the probability distribution for P, given N and p. Is this a well-known probability distribution? If so, which? Even if not, are there things that can be said about it which are mathematically obvious? (Not obvious to me, obviously. I’d be interested in which gaps in my math education I’m revealing by even asking these questions.)
- Wei Dai 29 Aug 2010 22:20 UTC
  13 points
  Here’s my solution. The descendants of each initial blob spawn independently of the descendants of other initial blobs, so P - N is a sum of N independent random variables. The number of descendants of one initial blob obviously follows a geometric distribution. Googling “sum of independent geometric distributions” gives the negative binomial distribution as the answer.
  - RobinZ 30 Aug 2010 0:39 UTC
    2 points
    Agreed: there are never more than N breeding blobs, each success increases P by one, and each failure reduces the remaining number of breeding blobs by one. Essentially, in the usual notation for the negative binomial, r = N and X = P - N.
  - Morendil 30 Aug 2010 5:35 UTC
    0 points
    Thanks for answering several questions at once. :)
  - Pavitra 29 Aug 2010 22:35 UTC
    0 points
    I don’t think that’s right. I don’t have the math to show why yet, but my current working hunch says to make explicit your assumptions about whether the initial number of blobs, and the number of generations, are continuous or discrete, because the geometric distribution may not actually be right.
- Perplexed 29 Aug 2010 22:32 UTC
  1 point
  After G generations, each blob has a probability q=p^G of having a descendant. So, it seems to me that P will be distributed as a binomial with q and N as parameters.
  - FAWS 29 Aug 2010 22:51 UTC
    3 points
    The blobs don’t reproduce with probability p in any given generation; they reproduce with probability p ever. The scenario doesn’t require generations in the sense you seem to be thinking of. It could all happen within 1 second, or a first-generation blob might reproduce after the highest-generation blob that reproduces has already done so.
    - Perplexed 29 Aug 2010 23:38 UTC
      5 points
      Oh, ok. I thought the blobs died each generation. A shrinking population. Instead they go into nursing homes. A growing population which stabilizes once everyone is geriatric.
      
      Got it. Wei pretty clearly has the solution: the negative binomial distribution. Quoting its definition:
      
      The negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs.
      
      Pretty damned obvious, actually, that (P-N) is distributed as a negative binomial where r is set to N; failure = failure to reproduce; success = birth.
- Pavitra 29 Aug 2010 22:14 UTC
  0 points
  Offhand, I think you would also need to know the number of generations. I’ll have to do some pen-and-paper work to work out what the distribution looks like.
  - FAWS 29 Aug 2010 23:05 UTC
    0 points
    Huh? Why? The expected number of blobs is N/(1-p), since each initial blob’s lineage has expected size 1 + p + p^2 + ... = 1/(1-p). The number of actually realized generations is not an input variable; it’s determined by N, p and chance. I have no idea how the distribution looks, but the number of actual generations should be one of the things you have a distribution across, not an input.
    - Pavitra 29 Aug 2010 23:13 UTC
      0 points
      Morendil said:
      
      Eventually the process will stop, with a total blob population of P.
      
      Under your model, P=0 with frequency 1, so that doesn’t make sense. I think the idea is to stop after a predetermined number of generations and see how many blobs are left.
      
      Edit: No, wait, I see what’s going on. You’re right.
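
The answer the thread converges on (P - N is negative binomial with r = N, where a "success" is a birth with probability p) can be checked numerically. Below is a minimal Python sketch, not anything posted by the commenters: it simulates the blob process generation by generation as Morendil describes it, then compares the empirical frequencies of P - N with the negative binomial probability C(k + N - 1, k) p^k (1 - p)^N. The function names, the default values N = 5 and p = 0.4, the trial count, and the seed are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def simulate_total_population(n_initial, p, rng):
    """Run the blob process once and return the final population P."""
    total = n_initial
    breeders = n_initial  # blobs that have not yet had their single chance
    while breeders > 0:
        # Each current breeder spawns one new blob with probability p;
        # the newborns are the only breeders in the next generation.
        newborns = sum(1 for _ in range(breeders) if rng.random() < p)
        total += newborns
        breeders = newborns
    return total


def negative_binomial_pmf(k, r, p):
    """Probability of k successes before the r-th failure (success prob. p)."""
    return math.comb(k + r - 1, k) * p**k * (1 - p) ** r


def compare(n_initial=5, p=0.4, trials=100_000, seed=0, max_k=6):
    """Return (rows, mean_population); rows hold (k, empirical, theoretical)."""
    rng = random.Random(seed)
    births = Counter(
        simulate_total_population(n_initial, p, rng) - n_initial
        for _ in range(trials)
    )
    rows = [
        (k, births[k] / trials, negative_binomial_pmf(k, n_initial, p))
        for k in range(max_k)
    ]
    mean_population = n_initial + sum(k * c for k, c in births.items()) / trials
    return rows, mean_population


if __name__ == "__main__":
    rows, mean_population = compare()
    for k, empirical, theoretical in rows:
        print(f"P-N={k}: simulated {empirical:.4f}, negative binomial {theoretical:.4f}")
    print(f"mean P: simulated {mean_population:.3f}, N/(1-p) = {5 / (1 - 0.4):.3f}")
```

The simulated mean of P should also land close to FAWS's N/(1-p), up to sampling noise.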