A quick probability math question.
Consider a population of blobs, initially comprising N individual blobs. Each individual blob independently has a probability p of reproducing, just once, spawning exactly one new blob. The next generation (an expected N*p individuals) has the same probability for each individual to spawn one new blob, and so on. Eventually the process will stop, with a total blob population of P.
The question is about the probability distribution for P, given N and p. Is this a well-known probability distribution? If so, which? Even if not, are there things that can be said about it which are mathematically obvious? (Not obvious to me, obviously. I’d be interested in which gaps in my math education I’m revealing by even asking these questions.)
Here’s my solution. The descendants of each initial blob spawn independently of the descendants of the other initial blobs, so P is a sum of N independent random variables. The number of descendants of one initial blob obviously follows the geometric distribution. Googling “sum of independent geometric distributions” gives the negative binomial distribution as the answer.
Agreed. There are never more than N breeding blobs; each success increases P by one, and each failure reduces the remaining number of breeding blobs by one. Essentially, in the negative binomial’s notation, r = N and X = P - N.
Thanks for answering several questions at once. :)
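The sequential view above (each success adds a blob, each failure retires a breeder) can be checked with a quick simulation. This is only a minimal sketch: the function name `simulate_total_population`, the seed, and the sample values N = 5, p = 0.3 are illustrative assumptions, not part of the original question, and it assumes 0 <= p < 1 so that the process stops.

```python
import random

def simulate_total_population(n_initial, p, rng):
    """Run the blob process once and return the final population P.

    Assumes 0 <= p < 1 so that the process terminates with probability 1.
    """
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    total = n_initial
    breeders = n_initial  # blobs that have not yet used their one chance
    while breeders > 0:
        # Each current breeder spawns exactly one new blob with probability p.
        newborns = sum(1 for _ in range(breeders) if rng.random() < p)
        total += newborns
        breeders = newborns  # only the newborns still have a chance to spawn
    return total

# Illustrative parameters (assumptions, not from the thread).
rng = random.Random(0)
n_initial, p = 5, 0.3
samples = [simulate_total_population(n_initial, p, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should land close to N / (1 - p)
```

The generation loop is just one convenient ordering of the trials. Since every blob gets exactly one independent trial, processing them in any other order should give the same distribution for P.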
I don’t think that’s right. I don’t have the math to show why yet, but my current working hunch says to make explicit your assumptions about whether the initial number of blobs, and the number of generations, are continuous or discrete, because the geometric distribution may not actually be right.
After G generations, each blob has a probability q=p^G of having a descendant. So, it seems to me that P will be distributed as a binomial with q and N as parameters.
The blobs don’t reproduce with probability p in any given generation; they reproduce with probability p, ever. The scenario doesn’t require generations in the sense you seem to be thinking of. It could all happen within 1 second, or a first-generation blob might reproduce after the highest-generation blob that reproduces has already done so.
Oh, ok. I thought the blobs died each generation. A shrinking population. Instead they go into nursing homes. A growing population which stabilizes once everyone is geriatric.
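Got it. Wei pretty clearly has the solution: the negative binomial distribution.
“The negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs.”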
Pretty damned obvious, actually, that (P-N) is distributed as a negative binomial where r is set to N; failure = failure to reproduce; success = birth.
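Taking that identification at face value (P - N counts the births before the N-th failure to reproduce), the probability mass function can be written down directly. The sketch below is an illustration under that assumption; the helper name `blob_population_pmf` and the example values N = 5, p = 0.3 are made up for the example.

```python
from math import comb

def blob_population_pmf(total, n_initial, p):
    """P(P = total), assuming P - N is negative binomial with r = N failures
    and success (birth) probability p."""
    births = total - n_initial  # successes before the N-th failure
    if births < 0:
        return 0.0
    # Choose where the births fall among the first births + N - 1 trials;
    # the final trial is always the N-th failure.
    return comb(births + n_initial - 1, births) * p**births * (1 - p)**n_initial

# Illustrative parameters (assumptions, not from the thread).
n_initial, p = 5, 0.3
probs = {t: blob_population_pmf(t, n_initial, p)
         for t in range(n_initial, n_initial + 200)}
total_mass = sum(probs.values())             # should be very close to 1
mean = sum(t * q for t, q in probs.items())  # should be close to N / (1 - p)
print(total_mass, mean)
```

The smallest possible outcome, P = N, has probability (1-p)^N: every initial blob fails to reproduce.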
Offhand, I think you would also need to know the number of generations. I’ll have to do some pen-and-paper work to work out what the distribution looks like.
Huh? Why? The expected number of blobs is N/(1-p). The number of actually realized generations is not a variable; it’s determined by N, p and chance. I have no idea what the distribution looks like, but the number of actual generations should be one of the things you have a distribution across, not an input.
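For what it’s worth, the N/(1-p) figure can be recovered in two ways, assuming 0 <= p < 1. The generation index g and the variable X = P - N below are notation introduced for this sketch. Summing the expected size of each generation gives a geometric series, and the mean of a negative binomial with r = N failures and success probability p gives the same total:

```latex
\mathbb{E}[P] \;=\; \sum_{g=0}^{\infty} N p^{g} \;=\; \frac{N}{1-p},
\qquad
\mathbb{E}[P] \;=\; N + \mathbb{E}[X] \;=\; N + \frac{N p}{1-p} \;=\; \frac{N}{1-p}.
```

For example, N = 5 and p = 0.3 would give an expected final population of 5/0.7, roughly 7.14.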
Morendil said: “Eventually the process will stop, with a total blob population of P.”
Under your model, P=0 with frequency 1, so that doesn’t make sense. I think the idea is to stop after a predetermined number of generations and see how many blobs are left.
Edit: No, wait, I see what’s going on. You’re right.