The psychometric term for #2 is test-retest reliability, and the figures I’ve seen for the SAT range from .8 to .95, so I think measurement noise alone is enough to explain this phenomenon.
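To give a rough sense of what a reliability in that range implies, here is a minimal sketch using the standard classical-test-theory formulas. The 100-point standard deviation is my own assumption for a single section, not a figure from this thread, and the function names are just illustrative. It also assumes both sittings have the same spread and that true ability does not change between them.

```python
import math

# Assumption: a section-score standard deviation of about 100 points.
ASSUMED_SD = 100.0


def sem(sd: float, r: float) -> float:
    """Standard error of measurement for a single sitting: sd * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - r)


def retest_diff_sd(sd: float, r: float) -> float:
    """SD of the score change between two sittings that correlate at r.

    Assumes equal variance on both sittings, so
    Var(X2 - X1) = 2 * sd**2 * (1 - r).
    """
    return sd * math.sqrt(2.0 * (1.0 - r))


if __name__ == "__main__":
    # The two ends of the reliability range quoted above.
    for r in (0.80, 0.95):
        print(
            f"r={r:.2f}  SEM={sem(ASSUMED_SD, r):.1f}  "
            f"retest diff SD={retest_diff_sd(ASSUMED_SD, r):.1f}"
        )
```

Under those assumptions, the typical swing between two sittings comes out at roughly 32 to 63 points per section, with no change in underlying ability at all.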
If the 2400 scores (which came later) are higher than the 1600 scores, that’s evidence for #3, but comparing them is difficult because they do test different things and are normed differently.