From the picture accompanying the article (so numbers may be slightly off): on the final exam, men’s average score minus women’s average score was 11.3 ± 4.6% in the control group and 2.4 ± 3.8% in the experimental group. The difference in gap was thus 8.9 ± 6.0%, so about 1.5 standard deviations from no difference.
Women’s score in the experimental group minus the control group was 5.9 ± 5.2%. Respectable, but only a bit above 1 standard deviation.
Men’s score in the experimental group minus the control group was −3.3 ± 3.8%. Focusing on their own values, rather than values other people hold, made men worse at this test by an amount comparable (in standard deviations, not absolute terms) to how much it made women better. The standard deviations narrowed for both groups: for the women, this was reported as the worst women doing better, and for the men, it seems reasonable to assume the best men did worse.
So, what the heck is going on here? The most likely explanation seems to be a statistical fluke: the experimental group happened to contain worse men and better women. These results don’t seem terribly statistically significant (to get my numbers, I added together four normals with the figure’s error bars as standard deviations; it would be better to check the statistical analysis in the paper itself), so that possibility is rather strong.
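The error propagation behind those numbers can be sketched in a few lines (a back-of-the-envelope check, assuming the figure’s error bars are independent one-sigma errors on the group means; the paper’s actual analysis may differ):

```python
import math

# Gender gap (men's mean minus women's mean) on the final exam,
# read off the article's figure, in percentage points.
control_gap, control_err = 11.3, 4.6
experimental_gap, experimental_err = 2.4, 3.8

# Difference in the gap between groups; independent normal
# errors add in quadrature.
gap_diff = control_gap - experimental_gap
gap_diff_err = math.sqrt(control_err**2 + experimental_err**2)

print(f"gap difference: {gap_diff:.1f} +- {gap_diff_err:.1f}%")    # 8.9 +- 6.0%
print(f"stdevs from no difference: {gap_diff / gap_diff_err:.2f}") # 1.49
```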
An alternative is that most of these “gap-closing” mechanisms actually impede the superior group and actually help the inferior group. The control group’s male score minus the experimental group’s female score is 5.6 ± 4%, almost 1.5 stdevs from no difference (control male minus control female was almost 2.5 stdevs from no difference).
Two ways to get half of each: it might have been a statistical fluke that the experimental men did worse, while the value affirmation genuinely improved female performance. Or, the value affirmation might have made everyone do worse, and the women by some fluke did better anyway (this is least likely, since it would require the experimental women to fluctuate about 2 stdevs upward).
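To put rough numbers on “least likely”, here are one-sided normal tail probabilities at these fluctuation sizes (a sanity check only; these z-scores come from the figure-derived error bars, not from the paper’s own analysis):

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# ~1.5 stdevs (the gap difference) vs ~2 stdevs (women fluking upward).
print(f"P(Z > 1.5) = {upper_tail(1.5):.3f}")  # 0.067
print(f"P(Z > 2.0) = {upper_tail(2.0):.3f}")  # 0.023
```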
An alternative is that most of these “gap-closing” mechanisms actually impede the superior group and actually help the inferior group.
That was my first thought. If a physics teacher made me waste 15 minutes on such a stupid, non-physics-related exercise, I’d likely do very badly in the class (more likely, walk out and drop the class immediately).
Everybody wasted 15 minutes. The question was just what they focused on (and neither option was physics-related).
I think I might be missing your point—I already thought that was the case.
That would explain a possible difference between an experimental group that spent a 15-minute exercise on something other than physics and a control group that did just physics: the best students might leave the experimental group, bringing down its mean and standard deviation. But since only the focus of the exercise differed between the two groups, I don’t see how the impulse to leave classes that waste your time could manifest as a difference between them. Even if such an effect is measurable in outcomes, it would not be noticed in this experiment.
Ah, missed that detail, thanks.
Here I had just assumed one of the groups would have been taught some physics during that 15 minutes. I guess we’ll just have to keep wondering how much better teaching physics is at making people learn physics than not teaching it.