i.e. we take a random sample of 100 men and 100 women with SAT scores between 1200 and 1400 (high but not perfect scores). Are the male scores going to average better than the female scores?
My intuition says no: while I’d expect fewer females to be in that range to begin with, I can’t see any reason to assume their scores would cluster towards the lower end of the range compared to males.
That’s because you’re not thinking in bell curves. The range is all on one side of the mean, the male mean is closer to the bottom of the band, and the male variation is higher.
So, first let’s ask this question, supposing that the test is perfectly accurate. We’ll run through the numbers separately for the two subtests (so we don’t have to deal with correlation), taking means and variances from here.
Of those who scored 600-700 on the hypothetical normally distributed math SAT (hence “HNDMSAT”), the male mean was 643.3 (with 20% of the male population in this band), and the female mean was 640.6 (with 14.8% of the female population in this band).
Of those who scored 600-700 on the HNDVSAT, the male mean was 641.0 (with 14.9% of the male population in this band), and the female mean was 640.1 (with 13.7% of the female population in this band).
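For anyone who wants to reproduce band figures like these, here is a minimal sketch of the truncated-normal calculation in Python. The function names and the example means and standard deviations are placeholders made up for illustration, not the figures the numbers above were computed from, so the output will not match them exactly.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def band_stats(mean, sd, lo, hi):
    """Return (fraction of the population scoring in [lo, hi],
    mean score of those inside the band) for a normal distribution."""
    a = (lo - mean) / sd
    b = (hi - mean) / sd
    frac = norm_cdf(b) - norm_cdf(a)
    # Mean of a normal distribution truncated to [lo, hi].
    band_mean = mean + sd * (norm_pdf(a) - norm_pdf(b)) / frac
    return frac, band_mean

# Placeholder parameters (assumed, NOT the real SAT figures):
# group A has the higher mean and the larger spread.
frac_a, mean_a = band_stats(mean=530.0, sd=120.0, lo=600.0, hi=700.0)
frac_b, mean_b = band_stats(mean=500.0, sd=110.0, lo=600.0, hi=700.0)
print(f"A: {frac_a:.1%} in band, band mean {mean_a:.1f}")
print(f"B: {frac_b:.1%} in band, band mean {mean_b:.1f}")
```

With a band that sits entirely above both population means, the group with the higher mean and larger spread has more of its density towards the top of the band, which is where the small edge in band means comes from.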
When we introduce test error into the process, the computation gets a lot messier. The quick and dirty way to do things is to say “well, let’s just shrink the mean band scores towards the population mean with the reliability coefficient.” This turns the male edge on the HNDMSAT of 2.7 into 5.4, and the male edge on the HNDVSAT of 0.9 into 1.8. (I think it’s coincidental that this is roughly doubling the edge.)
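The quick and dirty shrinkage can be sketched like this. Everything numeric here is an assumption for illustration: the reliability of 0.9, the population means, and the band means are placeholders, and the function names are made up.

```python
def shrink(band_mean, population_mean, reliability):
    """Pull an observed band mean towards its own group's population mean.
    reliability = 1 leaves it unchanged; reliability = 0 collapses it to
    the population mean."""
    return population_mean + reliability * (band_mean - population_mean)

def shrunken_edge(band_a, pop_a, band_b, pop_b, reliability):
    """Difference between the two groups' shrunken band means."""
    return shrink(band_a, pop_a, reliability) - shrink(band_b, pop_b, reliability)

# Placeholder inputs (assumed, not the real figures).
r = 0.9
edge = shrunken_edge(band_a=643.0, pop_a=530.0,
                     band_b=640.0, pop_b=500.0,
                     reliability=r)

# Algebraically the same thing: r * (raw band edge) + (1 - r) * (population gap).
decomposed = r * (643.0 - 640.0) + (1.0 - r) * (530.0 - 500.0)
print(edge, decomposed)
```

Written this way, the shrunken edge is the raw edge scaled by the reliability plus a slice of the gap between the two population means, so how much the edge grows depends on the inputs rather than on any fixed factor.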