SAT scores out of 1600 / SAT scores out of 2400: r = .844 (n = 59)
I’m surprised that this correlation wasn’t higher. They’re both pretty much the same test, right?
Three explanations I thought of:
1. I’m missing something, or I have an inaccurate model of how the two tests differ.
2. There’s a lot of random variation between scores from different sittings of the SAT. If this is true, I would expect there to be a correlation of around .844 between one test score and a later test score under the same grading system.
3. SAT scores are correlated with age (I have no idea whether this is true), and people take the two tests some time apart, so they score better on the second.
Any ideas?
I thought they were partly different because the 2400-point version added the writing subtest.
“If this is true, I would expect there to be a correlation of around .844 between one test score and a later test score under the same grading system.”
According to one random PDF I found, the reliability of recent SAT tests is generally around 0.9 (and has long been high). If I understand the formulas on this page correctly, then in this application the Pearson’s r of the two scores simplifies to the reliability*, and a reliability of 0.9 is quite close to the LW old/new correlation of 0.844.
So this may simply be what one would expect from people taking the SAT twice, without having to invoke the lower correlation caused by the additional sections or any other tweaks they’ve made.
* Specifically, I’m looking at Artifactual Influences, #3: reliability, where I think we can reuse the example. For test-retest, assume the LWer doesn’t get dumber or smarter, so the true correlation is 1; the reliability of the old SAT should be 0.9, and the reliability of the new one should be 0.9 too, so you get

$$r_{\text{observed}} = r_{\text{true}} \sqrt{r_{\text{old}}\, r_{\text{new}}} = 1 \cdot \sqrt{0.9 \times 0.9} = \sqrt{0.9^2} = 0.9$$

So the expected correlation between two SAT sittings simplifies to the original reliability of 0.9.
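The footnote’s arithmetic can be sketched in a few lines, assuming it is the standard correction-for-attenuation relation (observed r = true r times the square root of the product of the two reliabilities). The helper name `expected_observed_correlation` is hypothetical, and the 0.8/0.95 pair below is just the two ends of the range quoted elsewhere in this thread, not measured values for either SAT version.

```python
import math


def expected_observed_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation implied by the attenuation relation.

    Hypothetical helper: r_observed = r_true * sqrt(rel_x * rel_y), where rel_x and
    rel_y are the reliabilities of the two measures.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return true_r * math.sqrt(rel_x * rel_y)


# Footnote's case: ability unchanged between sittings (true r = 1),
# both SAT versions assumed to have reliability 0.9.
same_rel = expected_observed_correlation(1.0, 0.9, 0.9)

# Mixed case: assumed reliabilities at the two ends of the .8 to .95 range.
mixed_rel = expected_observed_correlation(1.0, 0.8, 0.95)

print(round(same_rel, 3), round(mixed_rel, 3))  # prints: 0.9 0.872
```

When both reliabilities are equal, the square root cancels and the expected correlation is just that shared reliability, which is the simplification the footnote describes.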
The psychometric term for #2 is test-retest reliability, and the numbers I’ve seen for the SAT range from .8 to .95, so I think that fully explains this result.
If the 2400 scores (which came later) are higher than the 1600 scores, that’s evidence for #3, but comparing them is difficult because they do test different things and are normed differently.