If you are a decision maker in education in your area, please, please, please look into the various Bayesian predictive models used for math placement;
Bayesian methods can still (and in practice will) use race and the like as evidence, meaning that if you’re black you need higher test scores and grades to qualify; they just don’t entirely stonewall you from qualifying, which is a step forward, I guess.
The fair approach is to have an entrance exam for better math classes, blind to race.
if you’re black you need higher test scores and grades to qualify
This seems to me mistaken. The reason race can be used as a proxy in the first place is that there is some correlation with performance on the standard tests. If you use the standard tests, then that entirely screens out all the race information; there is no additional information in race that you didn’t get from the test. This is similar to checking whether the plane flies: All information from authority and from theory is screened out by the experiment.
More generally: If A is a proxy for B, and you use B, then trying to use A in addition is double-counting.
Now, perhaps you are arguing that race is a proxy for test scores and something else, and you can still extract the something-else? If so this should be made explicit.
Consider a test followed by a re-test (which we are trying to predict). To calculate expected score on the re-test you need to apply regression to the mean. For a population with a lower measured mean or (in the high range) a smaller variance, you’ll have to regress more.
Of course, that mathematical fact doesn’t make such an adjustment non-racist or morally right. You could add a couple of simple extra questions to the test to obtain a similar improvement in accuracy, or use some other side data instead: weight, height, and blood type, for example. There’s a lot of other data you can use besides race; if race is used and nothing else, that’s because of a tradition of racism, not because of some awesome rationality. It’s fairly amusing to see race realists justify racism by its increased accuracy, but start complaining when you adjust your evaluation of them in much the same manner, using racism/non-racism as evidence...
edit: An important correction. The test-to-test variance may also differ between the groups. E.g., robots that always test the same will show a smaller regression to the mean than humans, even if their mean is low.
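A minimal sketch of the adjustment being described, under the classical true-score model (the function name and all numbers here are hypothetical): the best guess for a retest shrinks the observed score toward the group mean by the test’s reliability, and a noiseless test-taker, like the robots above, is not regressed at all.

```python
# Regression to the mean under a classical true-score model:
# observed = true_ability + noise. The best linear guess for a retest
# shrinks the observed score toward the group mean by the reliability
# r = var(true) / (var(true) + var(noise)). All numbers are made up.

def predicted_retest(score, group_mean, var_true, var_noise):
    reliability = var_true / (var_true + var_noise)
    return group_mean + reliability * (score - group_mean)

# Same observed score of 130, two groups differing only in measured mean:
high = predicted_retest(130, group_mean=110, var_true=150, var_noise=50)  # 125.0
low = predicted_retest(130, group_mean=100, var_true=150, var_noise=50)   # 122.5

# The "robots" from the edit: zero test-to-test noise means no regression,
# even for a low-mean group.
robot = predicted_retest(130, group_mean=100, var_true=150, var_noise=0)  # 130.0
```

So two students with identical observed scores get different predictions purely from group membership, which is exactly the adjustment under dispute.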
Consider a test followed by a re-test (which we are trying to predict). To calculate expected score on the re-test you need to apply regression to the mean.
This is only true if you assume there is some component of luck or guesswork in the score. I admit that this may be a good model for the kinds of tests you get in American high schools. However, it is not clear to me that “black people” is the correct population to use for the regression, because by construction you have an atypical member. Why not “high-scoring people” or “all students”?
Perhaps it would be helpful to construct an example using something other than race as the difference between populations, to avoid emotional entanglements?
If there were no component of luck or guesswork or anything else that varies from test to test, then the retest would come out exactly the same as the original test; but that’s not what we see in pretty much any test, or in any measurement of anything.
Other way around. If you’ve begun with a socioeconomic disadvantage, then achieving a particular test score is an indicator of greater inherent ability, insofar as such a thing exists. Someone who can run a mile in five minutes while carrying a fifty-pound weight is a better runner than someone who runs the same mile at the same speed carrying no additional weight.
That depends very much on the specific priors and correlations.
If you’re looking for the expected score on a re-test, you should apply regression towards the mean, and for a lower-mean population, that’s more regression. A school may be interested in the probability of student success in a course, which is not a measure of inherent ability either, but depends very much on the same disadvantages that lower the test score.
edit: That is to say, if you ran a programming contest where the contestants write programs to predict re-test scores from a score and a profile photo, given a huge database of US students (split in two: one half available to the contestants, one for the final evaluation), the winning code will literally measure skin albedo, and in some cases maybe also try to detect eyeglasses. Of course, the moral of the story is not that racism is good, but that socially we sometimes don’t want the most accurate guess.
edit2: Subtler measures may correlate too, besides the racial ones. E.g. the angle between the horizontal and the line connecting the pupils, pupil dilation in the photo, use or non-use of flash, strength of the red-eye effect, and who knows what else (how busy the background looks, maybe?). I don’t think many people here want to have their math scores adjusted depending on how they held their head in a photograph. edit3: Oh, and the image metadata, or noise signatures, that’d be a big one: is the image taken by an expensive camera? Get free points on your math test. And a free tip: squint. It will think you’re Asian, or smart enough to squint.
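A toy version of that contest can be simulated (the group means, noise levels, and the “albedo” feature below are all invented for illustration): an ordinary least-squares predictor of the re-test puts clearly nonzero weight on the photo feature, because the noisy first test doesn’t screen off the group-mean information.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two groups with different mean abilities (made-up numbers).
group = rng.integers(0, 2, n)
ability = rng.normal(100 - 5 * group, 10)
test1 = ability + rng.normal(0, 8, n)   # noisy first test (the predictor input)
test2 = ability + rng.normal(0, 8, n)   # re-test (what the contest predicts)

# A photo feature that correlates with group membership and nothing else.
albedo = (1 - group) + rng.normal(0, 0.1, n)

# Least squares on [intercept, test1, albedo]: the albedo weight comes out
# clearly positive, even though albedo has no causal effect on any score.
X = np.column_stack([np.ones(n), test1, albedo])
coef, *_ = np.linalg.lstsq(X, test2, rcond=None)
print(coef[2])  # weight the fitted model places on the photo feature
```

Nothing about the fit is racist in intent; the proxy weight appears simply because it improves squared-error accuracy.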
I think fubar may be right in a certain way: if you observe someone reaching a very high score while having a known poor environment (let’s say you’ve tested them enough that one can ignore issues of <1 reliability causing regression to the mean on subsequent retests), then you might estimate that the non-environmental contributions must be unusually high, because something must be causing him to score very high, and it’s sure not the environment. So, for example, we might infer that his genes or prenatal environment or personality are better than average.
Yes. As I say, it depends on what we are trying to predict and on the priors. Even with one test and significant regression, it’s correct to infer a higher non-environmental contribution, just not a higher combined environmental-plus-non-environmental one.
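A toy additive model makes the distinction concrete (all parameters below are invented): among equally high scorers, those from poor environments get a higher estimated non-environmental (“genes”) contribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy model: score = genes + environment + luck, all standard normal.
genes = rng.normal(0, 1, n)
env = rng.normal(0, 1, n)
luck = rng.normal(0, 1, n)
score = genes + env + luck

high_all = genes[score > 2.0].mean()
high_poor_env = genes[(score > 2.0) & (env < -1.0)].mean()

# High scorers from poor environments have a visibly higher average
# genetic contribution than high scorers in general.
print(high_all, high_poor_env)
```

Conditioning on the same high score, a known bad environment shifts the explanation onto the other components; it does not change the expected total.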
It seems to me that private_messaging is right and explains his point beautifully. Here’s a Robin Hanson post making a similar point. Also see this discussion, especially Wei Dai’s comment.
I’m confused as to why race would matter if you already have the grades and test scores information. Race might be helpful in predicting what their previous grades and test scores were, but I doubt it would improve the accuracy much over a model that excluded it.
And if it really were true that race matters for who will benefit the most from the program, then so be it. Why would you not want to help those who would benefit the most first, regardless of what information was used to predict it?
The fair approach is to have an entrance exam for better math classes, blind to race.
Is it more important to be fair or accurate?
There are times where it’s more important to be fair. For example, punishing a person because he’s guilty discourages crime. Punishing someone because he’s black does not. Thus, using the fact that he’s black as evidence will mean more guilty people will go to jail, more innocents will avoid being jailed, and more people will commit crime.
Well, what do you think about losing points because your profile photo has atypical proportions, or atypical posture? Points adjustment for round face, or for relative finger lengths? For having too many or too few facebook friends, likes, and so on? Weight, height, and blood type?
I don’t think that really applies here, though.
Well, if you want to encourage education rather than encourage being white or having typical posture or other things like that, it does apply.
Well, if you want to encourage education rather than encourage being white or having typical posture or other things like that, it does apply.
If you’re giving prizes to the best students to encourage them, then it applies. If you’re trying to match the teaching style to the student, I don’t think it does.
One might say that the sanity waterline one has to cross to handle test-score-based Bayesian predictions in a by-and-large rational way is much lower than the waterline for relative-finger-length-based predictions, which in turn is lower than the waterline for skin-color-based predictions.
What sanity? Everyone is pushing for measures that would be advantageous for themselves and opposing disadvantageous ones, and there’s nothing particularly insane about that; it’s just instinctive selfishness. The white ‘nerds’, for instance, could be OK with an adjustment for race, but very much not OK with adjustments for various odd-looking metrics (which lump them together with the autistic). It’s only Bayesian when it’s someone else; when it’s you losing points, that’s you being lumped together with other people (on the basis of some random trait that happens to be widely measured), which is of course bad and irrational and a bias (complete with examples of how inexact it is). Nothing insane about that either; it’s just selfishness.
Meanwhile, I’d dare to guess you can get a considerably larger boost in accuracy by adding a couple more questions to the test, or by using data from some other standardized test.
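For a sense of scale, the textbook Spearman-Brown prophecy formula predicts the reliability gain from lengthening a test; the numbers below are made up.

```python
def spearman_brown(reliability, length_factor):
    # Spearman-Brown prophecy formula: predicted reliability of a test
    # lengthened by `length_factor` relative to the original.
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# E.g. a test with reliability 0.80, lengthened by 25% (say 20 -> 25 questions):
print(spearman_brown(0.80, 1.25))  # 0.8333...
```

Even a modest lengthening buys a real reliability improvement, without using any demographic side data at all.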
Perhaps it would be helpful to construct an example using something other than race as the difference between populations, to avoid emotional entanglements?
Try neuroskeptic.
It seems to me that private_messaging is right and explains his point beautifully. Here’s a Robin Hanson post making a similar point. Also see this discussion, especially Wei Dai’s comment.
And more recently: http://lesswrong.com/lw/fmt/lw_women_lw_online/8h0h and http://lesswrong.com/lw/fmt/lw_women_lw_online/8hqg