The irrelevance of test scores is greatly exaggerated


Here are some claims about how grades (GPA) and test scores (ACT) predict success in college.

In a study released this month, the University of Chicago Consortium on School Research found—after surveying more than 55,000 public high school graduates—that grade point averages were five times as strong at predicting college graduation as were ACT scores. (Fortune)

High school GPAs show a very strong relationship with college graduation despite sizable school effects, and the relationship does not differ across high schools. In contrast, the relationship between ACT scores and college graduation is weak-to nothing once school effects are controlled. (University of Chicago Consortium on School Research)

“It was surprising not only to see that there was no relationship between ACT scores and college graduation at some high schools, but also to see that at many high schools the relationship was negative among students with the highest test scores” (Science Daily)

“The bottom line is that high school grades are powerful tools for gauging students’ readiness for college, regardless of which high school a student attends, while ACT scores are not.” (Inside Higher Ed)

(See also the Washington Post, Science Blog, Fatherly, The Chicago Sun Times, etc.)

All these articles are mild adaptations of a press release for Allensworth and Clark’s 2020 paper “High School GPAs and ACT Scores as Predictors of College Completion”.

I understood these articles as making the following claim: Standardized test scores are nearly useless (at least once you know GPAs), and colleges can eliminate them from admissions with no downside.

Surprised by this claim, I read the paper. I apologize if this is indelicate, but… the paper doesn’t give the slightest shred of evidence that the above claim is true. It’s not that the paper is wrong, exactly; it simply doesn’t address how useful ACT scores are for college admissions.

So why do we have all these articles that seem to make this claim, you ask? That’s an interesting question! But first, let’s see what’s actually in the paper.

Test scores are not irrelevant

The authors got data for 55,084 students who graduated from Chicago public schools between 2006 and 2009. Most of their analysis only looks at a subset of 17,753 who enrolled in a 4-year college immediately after high school. Here’s the percentage of those students who graduated college within 6 years for each possible GPA and ACT score:

We can also visualize this by plotting each row of the above matrix as a line. This shows how graduation rates change for a fixed GPA score as the ACT score is changed.

It doesn’t appear that ACT scores are useless… But let’s test this more rigorously.

Test scores are highly predictive

The full dataset isn’t available, but since we have the number of students in each ACT/GPA bin above, we can create a “pseudo” dataset, with a small loss of precision in the GPA and ACT score for each student. I did this, and then fit models to predict whether a student would graduate using GPA alone, ACT alone, or both together. (The model is cubic spline regression on top of a quantile transformation.)
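A minimal sketch of how such a pseudo dataset could be built from per-bin counts. The bin midpoints and counts below are invented for illustration and are not the paper's table; the helper name `expand_bins` is also hypothetical.

```python
import random

# HYPOTHETICAL bin table, invented for illustration (NOT the paper's numbers):
# (gpa_bin_midpoint, act_bin_midpoint, n_students, n_graduated)
BINS = [
    (2.25, 16, 40, 10),
    (2.25, 22, 20, 8),
    (3.25, 16, 30, 18),
    (3.25, 22, 50, 40),
]

def expand_bins(bins):
    """Turn per-bin counts into one (gpa, act, graduated) row per student."""
    rows = []
    for gpa, act, n_students, n_graduated in bins:
        # Every student in a bin gets the bin midpoint. This is the
        # "small loss of precision" from not having exact scores.
        rows += [(gpa, act, 1)] * n_graduated
        rows += [(gpa, act, 0)] * (n_students - n_graduated)
    return rows

pseudo = expand_bins(BINS)
random.Random(0).shuffle(pseudo)  # fixed seed so the row order is reproducible
```

If a table reports a graduation percentage per bin rather than a count, the count would have to be recovered by rounding, which is a second small source of imprecision.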

To measure how good these fits are, I used cross-validation, repeatedly holding out 20% of the data, fitting a model like above to the other 80%, and then predicting if each student will graduate. You can measure how accurate the predictions are, either as a simple error rate (1-accuracy) or as a Brier score. I also compare to a model using no features, which just predicts the base rate for everyone.
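The held-out evaluation described above can be sketched roughly as follows. To stay dependency-free, this sketch swaps in a much cruder stand-in model (per-bin graduation frequencies) for the cubic spline regression, and it runs on invented bins rather than the paper's data. All names here (`BINS`, `fit_bin_means`, `cross_validate`, `MODELS`) are hypothetical.

```python
import random
from collections import defaultdict

# HYPOTHETICAL bins, invented for illustration (not the paper's table):
# (gpa_bin, act_bin, n_students, n_graduated)
BINS = [(0, 0, 200, 50), (0, 1, 100, 40), (1, 0, 100, 60), (1, 1, 200, 160)]
ROWS = [(g, a, y)
        for g, a, n, k in BINS
        for y in [1] * k + [0] * (n - k)]

def brier(probs, outcomes):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def error_rate(probs, outcomes):
    """1 - accuracy, predicting 'graduates' whenever p >= 0.5."""
    wrong = sum((p >= 0.5) != (y == 1) for p, y in zip(probs, outcomes))
    return wrong / len(outcomes)

def fit_bin_means(train, key):
    """Stand-in model: graduation frequency per feature value, base rate as fallback."""
    base = sum(row[-1] for row in train) / len(train)
    counts = defaultdict(lambda: [0, 0])  # feature value -> [graduated, total]
    for row in train:
        counts[key(row)][0] += row[-1]
        counts[key(row)][1] += 1
    rates = {k: grads / total for k, (grads, total) in counts.items()}
    return lambda row: rates.get(key(row), base)

def cross_validate(rows, key, n_repeats=20, holdout=0.2, seed=0):
    """Repeatedly hold out a fraction of rows; return (mean Brier, mean error rate)."""
    rng = random.Random(seed)  # same seed -> same splits for every model
    briers, errors = [], []
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * holdout)
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = fit_bin_means(train, key)
        probs = [predict(row) for row in test]
        outcomes = [row[-1] for row in test]
        briers.append(brier(probs, outcomes))
        errors.append(error_rate(probs, outcomes))
    return sum(briers) / n_repeats, sum(errors) / n_repeats

MODELS = {
    "nothing": lambda row: (),  # no features: base rate for everyone
    "ACT only": lambda row: row[1],
    "GPA only": lambda row: row[0],
    "GPA plus ACT": lambda row: (row[0], row[1]),
}
results = {name: cross_validate(ROWS, key) for name, key in MODELS.items()}
```

Reusing one seed for every model means all four are scored on identical splits, so differences between them reflect the features rather than the luck of the split. A Brier score near .25 is what a base-rate prediction earns when roughly half the students graduate.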

Predictors   Brier score   Error rate
ACT only     .219          .355
GPA only     .210          .330

It’s true that GPA does a bit better than the ACT. But if you care about that difference, you should care even more about the difference between (GPA only) and (GPA plus ACT). It’s not coherent to simultaneously claim that the GPA is better than the ACT and also that the ACT doesn’t add value to the GPA.

I repeated this same calculation with other predictors: logistic regression, decision trees, and random forests. The numbers barely changed at all.

Still, these are all just calculations based on the first table in the paper.

What the paper actually did

For each student, they recorded three variables:

  • Gender

  • Ethnicity (Black, Latino, Asian)

  • Poverty (average poverty rate in the student’s census block)

For the students who enrolled in a 4-year college, they recorded four variables about that college:

  • The number of students at the college

  • The percentage of full-time students

  • The student-faculty ratio

  • The college’s average graduation rate

They standardized all the variables to have zero mean and unit variance (except for gender and ethnicity since these are binary). For example, GPA=0 for someone with the average grades, and GPA=-2 for someone 2 standard deviations below average.

They also included squared versions of GPA and ACT, GPA² and ACT². These are never negative and are larger for any student who is unusual on either the high or the low end. They do this because the relationship is “slightly quadratic”, which is reasonable, but it’s not explained why the other variables don’t get a squared version.
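A small sketch of what standardizing and squaring a variable looks like. The raw GPAs are made up, and using the population standard deviation (`pstdev`) is an assumption; the paper may have used a different convention.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# HYPOTHETICAL raw GPAs for five students (not data from the paper).
raw_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
z_gpa = standardize(raw_gpa)        # the average student maps to 0
z_gpa_sq = [z ** 2 for z in z_gpa]  # squared term: large at BOTH extremes
```

The lowest and highest students get the same squared value, which is why a squared term can only capture how unusual a student is, not in which direction.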

With this data in hand, they fit a bunch of models.

First, they predicted graduation rates from grades alone. Higher grades were better. There’s nothing really surprising here, so let’s skip the details.

Second, they predicted graduation rates from ACT scores alone. Higher ACT scores were better. As you’d expect, this relationship is strong. Again, let’s skip the details.

Third, they predicted graduation rates from grades, student variables, and variables for the college the student enrolled at. This model computes a “likely-to-graduate” score for each student as follows, with student background variables and college institutional variables labeled in different colors for clarity.

The “likely-to-graduate” score becomes a probability after a sigmoid transformation. If you’re not familiar with sigmoid functions, think of them like this: if a student has a score of X, then the graduation probability is around .5 + .25 × X. For larger X (say |X|>1), scores start to have diminishing returns, since probabilities must be between 0 and 1.

For example, the coefficient for (male) above is -.092. This means that a male has around a 2.3% lower chance of graduating than an otherwise identical female. (For students with very high or very low scores the effect will be less.)
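As a quick check of that arithmetic: the sigmoid's slope at zero is 1/4, so near a score of 0 a coefficient shifts the probability by about a quarter of its value. The sketch below compares the exact shift with the linear shortcut for the -.092 coefficient quoted above. The function names are just illustrative.

```python
import math

def sigmoid(x):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def linear_approx(x):
    """First-order approximation around 0: the sigmoid's slope there is 1/4."""
    return 0.5 + 0.25 * x

male_coef = -0.092  # coefficient for (male) quoted in the text

# Change in graduation probability for a student whose score is otherwise 0.
exact_shift = sigmoid(male_coef) - sigmoid(0.0)
approx_shift = linear_approx(male_coef) - linear_approx(0.0)
```

Both shifts come out at roughly -2.3 percentage points. The shortcut degrades as the baseline score moves away from 0, where the sigmoid flattens out.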

Fourth, they predicted graduation rates from ACT scores, student variables, and college variables.

The dependence on ACT is less than the dependence on GPA in Model 3. However, the dependence on student background and college variables is much higher.

Fifth, they predicted graduation rates from GPAs, ACT scores, student variables, and college variables.

Here, there’s minimal dependence on ACT, but a negative dependence on ACT², meaning that extreme ACT scores (high or low) both lead to lower likely-to-graduate scores.
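To see why a negative squared term penalizes both extremes, here is a sketch with made-up coefficients. `B_ACT` and `B_ACT_SQ` are hypothetical values chosen only to show the shape; they are not the paper's estimates.

```python
# HYPOTHETICAL coefficients chosen only to show the shape; the paper's
# actual values are not reproduced here.
B_ACT = 0.05      # small linear term
B_ACT_SQ = -0.10  # negative squared term

def act_contribution(z_act):
    """Contribution of a standardized ACT score to the likely-to-graduate score."""
    return B_ACT * z_act + B_ACT_SQ * z_act ** 2

peak = -B_ACT / (2 * B_ACT_SQ)  # vertex of the downward-opening parabola
contributions = {z: act_contribution(z) for z in (-2, -1, 0, 1, 2)}
```

With these numbers the contribution peaks slightly above the average ACT score and falls off on both sides, so a student 2 standard deviations above average gets a lower contribution than an average student.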

Does that seem counterintuitive to you? Remember, we are taking a student who is already enrolled in a particular known college and predicting how likely they are to graduate from that college.

Sixth, they predicted graduation rates from the same stuff as in the previous model, but now adding mean GPA and ACT for the student’s school. They also now standardize some variables relative to each high school.

I can’t tell which variables are affected by this change in the way things are standardized. My guess is that it’s just GPA and ACT, but it might affect other variables too.

What this says about how to do college admissions

I mean… not much?

Here’s what these models do: Take a student with a certain GPA, ACT score, and background who is accepted to and enrolls in a given college. How likely are they to graduate?

It’s true that these models have small coefficients in front of ACT. But does this mean ACT scores aren’t good predictors of preparation for college? No. ACT scores are still influencing who enrolls in college and what college they go to. These models made that influence disappear by dropping all the students who didn’t go to college, and then conditioning on the college they went to.
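A toy illustration of how conditioning on the college can hide a relationship. The counts are invented, and they describe a deliberately extreme world where graduation depends only on which college a student attends, and high-ACT students mostly attend the selective one. This is meant to show the mechanism, not to claim real data look like this.

```python
# HYPOTHETICAL counts, invented so that graduation depends only on the college:
# (act_level, college, n_students, n_graduated)
TOY = [
    ("high", "selective", 80, 72),
    ("high", "open",      20, 12),
    ("low",  "selective", 20, 18),
    ("low",  "open",      80, 48),
]

def grad_rate(rows):
    """Pooled graduation rate over a list of count rows."""
    students = sum(n for _, _, n, _ in rows)
    graduated = sum(g for _, _, _, g in rows)
    return graduated / students

def by_act(rows, college=None):
    """Graduation rate per ACT level, optionally within a single college."""
    if college is not None:
        rows = [r for r in rows if r[1] == college]
    return {level: grad_rate([r for r in rows if r[0] == level])
            for level in ("high", "low")}

overall = by_act(TOY)                        # ACT looks predictive: .84 vs .66
within_selective = by_act(TOY, "selective")  # gap vanishes: .90 vs .90
within_open = by_act(TOY, "open")            # gap vanishes: .60 vs .60
```

In this toy, someone who only knew ACT scores could predict graduation quite well, yet a model that also conditions on the college would assign ACT a coefficient of zero. A small conditional coefficient is therefore compatible with a strong unconditional relationship.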

These models don’t say much of anything about how college admissions should work. There are three reasons why.

First, these models are conditioning on student background! Look at the coefficients in Model 5. What exactly is the proposal here, to do college admissions using those coefficients? So, colleges should explicitly penalize men and poor students like this model does? Come on.

Second, test scores influence if students go to college at all. This entire analysis ignores the 67% of students who don’t enroll in college. The paper confirms that ACT scores are a strong predictor of college enrollment.

Of course, many factors influence if a student will go to college. Do they want to? Can they get in? Can they afford it?

You might say, “Well of course the ACT is predictive here—colleges are using it.” Sure, but that’s because colleges think it gauges preparation. It’s possible they’re wrong, but… isn’t that kind of the question here? It’s absurd to assume the ACT isn’t predictive of college success, and then use that assumption to prove that the ACT isn’t predictive of college success.

Third, for students who go to college, test scores influence which college they go to, and more selective colleges have higher graduation rates. Here are three private colleges in the Boston area and three public colleges in Michigan.

College                            Acceptance rate   Average graduation rate
Harvard University                 5%                98%
Northeastern University            18%               85%
Suffolk University                 84%               63%
University of Michigan-Ann Arbor   23%               92%
Michigan State University          71%               80%
Grand Valley State University      83%               60%
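As a rough descriptive check on the table above, the sketch below computes the Pearson correlation between acceptance rate and graduation rate for these six colleges. With only six hand-picked schools this is an illustration of the pattern, not an estimate for colleges in general. The `pearson` helper is written out by hand to keep the example dependency-free.

```python
from math import sqrt

# (college, acceptance rate %, graduation rate %) copied from the table above.
COLLEGES = [
    ("Harvard University", 5, 98),
    ("Northeastern University", 18, 85),
    ("Suffolk University", 84, 63),
    ("University of Michigan-Ann Arbor", 23, 92),
    ("Michigan State University", 71, 80),
    ("Grand Valley State University", 83, 60),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

acceptance = [a for _, a, _ in COLLEGES]
graduation = [g for _, _, g in COLLEGES]
r = pearson(acceptance, graduation)
```

The correlation is strongly negative for these six schools: the lower the acceptance rate, the higher the graduation rate.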

The paper also does a regression on students who go to college to try to predict the graduation rate of the college they end up at. Again, GPA and ACT scores are about equally predictive.

Of course, you could also drop the student background and college variables, and just predict from GPA and ACT. But remember, we did that above, and the ACT was extremely predictive.

Alternatively, I guess you could condition on student background without conditioning on the college students go to. I doubt this is a good idea or a realistic idea, but at least it’s causally possible for colleges to use such a model to do admissions.

Why didn’t the authors do this? Well… Actually, they did.

Unfortunately, this is sort of hidden away in a corner of the paper, and no coefficients are given other than for GPA and ACT. It’s not clear if high-school GPA or ACT are even included here. The authors were not able to provide the other coefficients (nor even to acknowledge multiple polite requests, notthatimbitteraboutit).

The laundering of unproven claims

What happened? There’s really nothing fundamentally wrong in the paper. It fits some models to some data and gets some coefficients! Interpreted carefully, it’s all fine. And the paper itself never really pushes anything beyond the line of what’s technically correct.

Somehow, though, the second the paper ends, and the press release starts, all that is thrown out the window. Rather than “ACT scores definitely predict college graduation, but they don’t seem to give any extra information if you already know how they interact with college application, acceptance and enrollment, if we condition on student demographic variables in an implausible way”, we get “ACT scores don’t predict college success”.

To be fair, a couple of hedges like “once school effects are controlled” make their way into the articles, but they are treated as minor technical asides and never explained.

Let’s separate a bunch of claims.

  1. It might be desirable to reduce the influence of test scores on college admissions to achieve worthy social goals.

  2. It might be that test scores don’t predict college graduation rates.

  3. It might be that test scores only predict college graduation because selective (and high graduation-rate) colleges choose to use them in admissions.

  4. It might be that if selective colleges stopped using test scores in admissions, test scores would no longer predict graduation.

I’m open to claim #1 being true. If you believe #1, it would be convenient if #2, #3, and #4 were true. But the universe is not here to please us. #2 is not just unproven but proven to be false. This paper does not provide evidence for #3 or #4. Yet because these claims were inserted into the public narrative after peer review, we have a situation where the paper isn’t wrong, yet it is being used as evidence for claims it manifestly failed to establish.

Journals don’t issue retractions for press releases.

A field guide

There’s a fair number of errors and undefined notation in the paper, which might throw you off if you try to read it. I’ve created a guide to help with this.