Landsburg’s solution is somewhat head-scratching. It works fine if you’re sampling families. But the problem as stated doesn’t sample individual families. It asks about the overall population.
This washes out his effect pretty strongly, to the point that the expected excess in the case of an infinite-sized country is the same absolute excess as in the case of one family: You expect 0.3 more boys than girls. (Correction: zero.)
In total. Across the whole country. And it wasn’t asking about the families, it was asking about the country. So the ratio is a teeny tiny deviation from 50%, not 30.6%. This is very much an ends in ‘gry’ situation.
(Note: what I was trying to say above, about the 0.3 more boys than girls, and then got confused:
This is all about the stopping condition. The stopping condition produces a bias through the disproportionate effect of the denominators on either side of 1/2. But the stopping condition steps in only once, for the country just as for the single family: there’s effectively only one stopping condition, that being when N boys have been born. Increasing N from 1 to a million just washes out the effect by adding more unbiased random babies at what, for clarity, you can imagine as the beginning of the sequence. So you have this biased ratio, and then you weighted-average it with 1/2, with a huge weight on the 1/2.)
It’s tricky. I at first thought that he was talking about sampling families, but he isn’t.
The expected fraction G/(G+B) over many trials, where each trial t involves N families and leads to G_t girls and B_t boys, is not (sum of G_t) / (sum of (G_t + B_t)).
Trials that result in a smaller number of children have more boys than girls. Trials that result in an unusually large number of children have more girls than boys. Yet both kinds of trials count as 1 sample when computing the average fraction of girls. So the average population fraction is smaller than the population fraction from all those trials.
Not quite. The expected difference in numbers is zero. It’s the expected ratio G/(G+B) for a country that is a hair—an unmeasurably small hair—under 0.5, and if you multiply that hair by the population you get something tending to a constant fraction of a person.
While there’s an interesting puzzle to be posed, Landsburg didn’t formulate it in the right way. The question he should have posed (and the one he actually answers in his solution) is “what is the expected proportion of girls per family?” This is the question whose correct answer shows a substantial deviation from the naive 0.5.
But I don’t see what the puzzle has to do with the rest of the post.
Yeah, I got a bit confused because I imagined that you’d go to the maternity ward and watch the happy mothers leaving with their babies. If the last one you saw come out was a girl, you know they’re not done. Symmetry broken. You are taking an unbiased list of possibilities and throwing out the half of them that end with a girl...
Except that to fix the problem, you end up adding an expected equal number of girls as boys.
I had it straight the first time, really. But his answer was so confusingly off that it re-befuddled me.
E(G-B) = 0, E(G/(G+B)) < 0.5.
He did answer the question he posed, which was “What is the expected fraction of girls in a population [of N families]?” It’s not an unmeasurably-small hair. It depends on N. When N=4, the expected fraction is about .46. If you don’t believe it, do the simulation. I did.
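For anyone who wants to run the simulation, here is a minimal sketch. It assumes fair, independent births and that every family stops at its first boy; the function names are made up for illustration. It reports both the average of the per-country fractions and the pooled fraction over all simulated countries, which are the two quantities being argued over in this thread.

```python
import random


def simulate_country(n_families, rng):
    """Return (girls, boys) for one country where each family stops at its first boy."""
    girls = 0
    for _ in range(n_families):
        # Each birth is a fair coin flip; a family keeps going until a boy arrives.
        while rng.random() < 0.5:
            girls += 1
    return girls, n_families  # exactly one boy per family


def compare_estimates(n_families, n_trials, seed=0):
    """Return (mean of per-country girl fractions, pooled girl fraction)."""
    rng = random.Random(seed)
    sum_of_fractions = 0.0
    total_girls = 0
    total_children = 0
    for _ in range(n_trials):
        g, b = simulate_country(n_families, rng)
        sum_of_fractions += g / (g + b)   # every country counts as one sample
        total_girls += g
        total_children += g + b           # every child counts as one sample
    return sum_of_fractions / n_trials, total_girls / total_children


if __name__ == "__main__":
    for n in (1, 4, 25):
        mean_fraction, pooled_fraction = compare_estimates(n, 20_000)
        print(n, round(mean_fraction, 4), round(pooled_fraction, 4))
```

The first number estimates E(G/(G+B)) for a country of that size; the second stays near 0.5 whatever the size, because pooling weights large and small countries by their number of children.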
I believe the mathematics. He is correct that E(G/(G+B)) < 0.5. But a “country” of four families? A country, not otherwise specified, has millions of families, and if that is interpreted mathematically as asking for the limit of infinite N, then E(G/(G+B)) tends to the limit of 0.5.
To make the point that this puzzle is intended to make, about expectation not commuting with ratios, it should be posed of a single family, where E(G)/E(G+B) = 0.5, E(G/(G+B)) = 1-log(2).
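A sketch of where those two single-family numbers come from, assuming fair independent births and taking log as the natural logarithm. The family has exactly one boy, and g girls with probability 2^{-(g+1)}:

```latex
\begin{align*}
P(G = g) &= 2^{-(g+1)}, \qquad g = 0, 1, 2, \dots, \qquad B = 1 \\
E(G) &= \sum_{g \ge 0} g \, 2^{-(g+1)} = 1, \qquad E(G+B) = 2,
  \qquad \frac{E(G)}{E(G+B)} = \frac{1}{2} \\
E\!\left(\frac{G}{G+B}\right) &= \sum_{g \ge 0} \frac{g}{g+1} \, 2^{-(g+1)}
  = \sum_{n \ge 1} \left(1 - \frac{1}{n}\right) 2^{-n}
  = 1 - \ln 2 \approx 0.307
\end{align*}
```

The last step uses \sum_{n \ge 1} x^n / n = -\ln(1 - x) at x = 1/2.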
But as I said earlier, how is this puzzle relevant to the rest of your post? The mathematics and the simulation agree.
Estimating the mean and variance of the Cauchy distribution by simulation makes an entertaining exercise.
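A minimal sketch of that exercise, with hypothetical function names. The standard Cauchy distribution has no mean or variance, so the running sample mean never settles down however many draws you take, while the sample median does converge to 0. The sampler inverts the Cauchy CDF, which is a standard method.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    """Draw one standard Cauchy variate by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def running_summary(n_samples, seed=0):
    """Return (sample means recorded at 10, 100, 1000, ... draws, median of all draws)."""
    rng = random.Random(seed)
    draws = []
    total = 0.0
    means = {}
    checkpoint = 10
    for i in range(1, n_samples + 1):
        x = cauchy_sample(rng)
        draws.append(x)
        total += x
        if i == checkpoint:
            means[i] = total / i   # running mean after i draws
            checkpoint *= 10
    return means, statistics.median(draws)


if __name__ == "__main__":
    for seed in range(3):
        means, median = running_summary(100_000, seed=seed)
        print(seed, {n: round(m, 3) for n, m in means.items()}, round(median, 4))
```

Running it with a few different seeds is the instructive part: the medians agree closely, the means do not.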
Thinking about betting $15,000 on a math problem, to be adjudicated by the outcome of a computer simulation, made me wonder how we know when a computer simulation would give the right answer. Showing the results for the similar-looking but divergent series is the simplest example I could think of in which a computer simulation gives a very misleading estimate of expected value, which is the problem this post is about.
The question asked about a country. Unless you’re counting hypothetical micro-seasteads as countries, the ratio is within noise of 50%.
(In response to a longer version of the previous post which was in response to the pre-edited version of its parent, which was opposite in nearly every way—and because it was up really briefly, I forgot that he could have seen the pre-edit version. If RK weren’t a ninja this wouldn’t have come up)
Dude, I already conceded. You were right, I said I was wrong. When I was saying I had it straight the first time, I meant before I read the solution. That confused me, then I wrote in response, in error. Then you straightened me out again.
Sorry, I posted before you corrected that post. I shall edit out my asperity.
Sorry for resisting correction for that short time.
He did answer the question he posed, which was “What is the expected fraction of girls in a population [of N families]?” It’s not an unmeasurably-small hair. It depends on N. When N=4, the expected fraction is about .46.
Here is the exact solution for the expected value of G/(G+B) with k families. From numerical calculation with k up to 150, it looks like the discrepancy 0.5 - g/(g+b) approaches 0.25/k (from below) as k goes to infinity, which is certainly mysterious.
(The expected value of G-B is always 0, though, so I don’t know what you mean by an excess of 0.3.)
So for a reasonably-sized country of 1 million people, we’re looking at a ratio of B/(B+G) = 0.50000025? I’ll buy that.
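The 0.25/k behaviour can be checked numerically without any simulation. This is a sketch under the usual assumptions (fair births, each of k families stops at its first boy), not the linked solution itself; the function name is made up. With k families there are exactly k boys, and the total number of children n has probability C(n-1, k-1) / 2^n, so the expected boy fraction is a sum of k/n against that distribution.

```python
def expected_girl_fraction(k, tol=1e-13):
    """E[G/(G+B)] for k families that each stop at their first boy."""
    p = 0.5 ** k            # P(n = k): the first k births are all boys
    n = k
    mass = 0.0              # probability accumulated so far
    expected_boy_fraction = 0.0
    # Sum until nearly all probability is covered; the cap on n is only a safety net.
    while mass < 1.0 - tol and n < 50 * k + 1000:
        mass += p
        expected_boy_fraction += p * k / n
        # P(n+1) / P(n) = n / (2 * (n - k + 1)) for this distribution
        p *= n / (2.0 * (n - k + 1))
        n += 1
    return 1.0 - expected_boy_fraction


if __name__ == "__main__":
    for k in (1, 2, 4, 10, 50, 150):
        discrepancy = 0.5 - expected_girl_fraction(k)
        print(k, discrepancy, k * discrepancy)
```

For k = 1 this reproduces 1 - log(2), and the last column shows k times the discrepancy creeping up toward 0.25.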
And the 0.3 was a screwup on my part (my mistaken reasoning is described in a cousin of this post).
Funny though that the correct answer happens to be really close to my completely erroneous answer. It has the same scaling, the same direction, and similar magnitude (0.25/k instead of 0.3/k).