If you want to predict someone’s actual response rate, the model is fine: plug that person’s information into it and you get an accurate prediction, provided they fall within the attractiveness stratum of the data pool used to fit the model. But for OkCupid’s claimed causal inference, we’re interested in counterfactual questions like, “What response rate would black woman X get if she were white?” We can’t answer that by twiddling just the race variable in the model: if X were white, her attractiveness rating would be different too, and she might end up in a different stratum than the one addressed by the model. Since there are no results for that counterfactual stratum, the model cannot address the counterfactual, i.e., it can’t be used to make causal claims.
ETA: Andrew Gelman calls this the fallacy of controlling for an intermediate outcome.
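To see the fallacy concretely, here is a minimal simulation sketch (all numbers and the data-generating process are hypothetical, chosen only for illustration). Suppose race affects responses *only* through biased attractiveness ratings. Then the naive group comparison recovers the real (total) effect of race, while "controlling for" the intermediate attractiveness rating by comparing within a stratum makes the effect largely vanish, inviting the wrong conclusion that race doesn't matter:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: attractiveness ratings depend on
# race (rater bias), and response rate depends ONLY on the rating --
# race has no direct effect on responses.
race = rng.integers(0, 2, n)                        # 0 = group A, 1 = group B
attract = 5.0 - 1.0 * race + rng.normal(0, 1, n)    # biased ratings
response = 0.1 * attract + rng.normal(0, 0.5, n)    # driven by rating only

# Naive comparison recovers the total effect of race (about -0.10 here):
total_effect = response[race == 1].mean() - response[race == 0].mean()

# "Controlling for" the intermediate outcome: compare only within one
# attractiveness stratum (scores between 4 and 5).
stratum = (attract > 4) & (attract < 5)
strat_effect = (response[(race == 1) & stratum].mean()
                - response[(race == 0) & stratum].mean())

print(f"total effect of race:    {total_effect:+.3f}")
print(f"within-stratum 'effect': {strat_effect:+.3f}")
```

The within-stratum difference is an order of magnitude smaller than the total effect, even though race fully drives the gap in this toy world, because the stratification blocks the very pathway (race → rating → response) that the causal question is about.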
Thanks for the link; that description applies to OkCupid’s analysis perfectly.