While I am fine with your math, I do not like the phrasing “X explains Z% of the variance of Y”, because to the casual reader, it suggests that there is a causal relationship. For example, I might say “smoking explains X% of the (variance of the binary variable indicating the presence of) lung cancer (cases)”. Here there really is a causal relationship.
But consider “IQ explains X% of the variance of lifetime earnings in Americans”, or “Lifetime earnings explain Y% of the IQ variance in Americans”. The casual reader might read the first sentence and infer a causal relationship: “Every point of IQ I CRISPR into my kid will raise the expected amount of money they make by $Z”. But purely from the correlation, we cannot be sure that this intervention will have any effect at all (though there are good reasons to believe that there is some causal relationship).
More bluntly, getting a positive result on a cancer screening is correlated with dying in the next decade, but bribing your doctor to falsify your results will affect your life expectancy in the opposite direction to what the correlation would suggest.
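The gap between prediction and intervention can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the hidden common cause, the noise levels, and the names `simulate` and `r_squared` are all hypothetical and not taken from any real dataset. In this made-up world, x has no causal effect on y, yet each “explains” about a quarter of the other's variance, and forcing x to any value leaves y untouched.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. the share of variance 'explained'
    by a straight-line fit. It is symmetric in its two arguments."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def simulate(n=20_000, seed=0, forced_x=None):
    """Hypothetical toy world: a hidden factor drives both x and y,
    and x has no causal effect on y at all.
    Passing forced_x models an intervention that overrides x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)
        natural_x = hidden + rng.gauss(0, 1)
        xs.append(natural_x if forced_x is None else forced_x)
        ys.append(hidden + rng.gauss(0, 1))  # y never looks at x
    return xs, ys

def mean(values):
    return sum(values) / len(values)

# Observational data: x and y are correlated through the hidden factor.
xs, ys = simulate()
print(f"x 'explains' {r_squared(xs, ys):.0%} of the variance of y")
print(f"y 'explains' {r_squared(ys, xs):.0%} of the variance of x")

# Intervention: force x high or low for everyone (same random seed).
_, ys_high = simulate(forced_x=+3.0)
_, ys_low = simulate(forced_x=-3.0)
print(f"mean y with x forced high: {mean(ys_high):.3f}")
print(f"mean y with x forced low:  {mean(ys_low):.3f}")
```

The two “explains” lines print the same percentage because the statistic is symmetric, and the two interventional means are identical because y is generated without ever reading x.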
Scott Alexander has recently written about heritability:
“Predictive power is different from causal efficacy. Consider a racist society where the government ensures that all white people get rich but all black people stay poor. In this society, the gene for lactose tolerance (which most white people have, but most black people lack) would do a great job predicting social class, but it wouldn’t cause social class.”
(As usual, worth reading in full.)
Or take your initial statement:
“The group consensus on somebody’s attractiveness accounted for roughly 60% of the variance in people’s perceptions of the person’s relative attractiveness.”
There could be vastly different causal models which explain this observation:
a) Every group member randomly assigns an attractiveness rating to a newcomer. Then everyone signals the attractiveness rating they assigned implicitly or explicitly through group interactions, and every group member updates towards the group consensus.
b) The group has some rough consensus about which traits are attractive (perhaps there is a universal standard of attractiveness, or the group members adjusted their preferences to the group average over time, or people who find certain traits attractive ended up in the group for complicated reasons), so they will rate a newcomer similarly based on their traits.
In reality, it is likely going to be a mix of both of these, plus three more causal chains. Again, as soon as you are discussing interventions, you will find “explains X% of the variance” insufficient. Say you want to ask a specific person of that group on a date. You know that the group generally likes people with stripy socks, but that your potential date is indifferent to them. In case (a), you want to wear stripy socks because the group consensus about your attractiveness will pull your potential date's rating of you upwards, while in case (b) it does not matter.
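The stripy-socks intervention can be sketched in code. Everything here is a hypothetical illustration: the function `date_rating`, the group size, the size of the sock bonus, and the updating weight are invented, and the sketch does not try to reproduce the 60% figure. It only shows that the same intervention moves your date's rating under model (a) and does nothing under model (b).

```python
import random

def date_rating(model, wears_socks, n_others=9, sock_bonus=1.0, weight=0.5, seed=0):
    """Rating you get from your potential date, who is indifferent to socks.

    model "a": ratings start out random, then the date moves toward the group mean.
    model "b": everyone rates your traits independently; no social updating.
    All parameters are made-up values for illustration.
    """
    rng = random.Random(seed)
    trait_appeal = rng.gauss(0, 1)
    shared = trait_appeal if model == "b" else 0.0   # shared taste exists only in (b)
    bonus = sock_bonus if wears_socks else 0.0
    date_own = shared + rng.gauss(0, 1)              # socks do not enter here
    others = [shared + rng.gauss(0, 1) + bonus for _ in range(n_others)]
    if model == "a":
        group_mean = sum(others) / n_others
        return (1 - weight) * date_own + weight * group_mean
    return date_own

for model in ("a", "b"):
    gain = date_rating(model, True) - date_rating(model, False)
    print(f"model ({model}): stripy socks change the date's rating by {gain:+.2f}")
```

Because the seed is fixed, both calls for a given model use the same random draws, so the difference isolates the effect of the socks: about `weight * sock_bonus` under (a) and exactly zero under (b).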
I agree that “X explains Q% of the variance in Y” sounds to me like an assertion of causality, and defining that phrase in terms of mere correlation seems misleading.
Might it be better to say “After controlling for X, the variance of Y is reduced by Q%” if one does not want to imply causation?
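One possible reading of this rephrasing, assuming “controlling for” means subtracting a least-squares linear fit (the comment does not specify this), is sketched below. The helper `variance_reduction` and the generated data are hypothetical. The sketch also shows that the reduction comes out the same whichever variable is treated as the one being controlled for, so the wording makes no claim about direction.

```python
import random

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(predictor, outcome):
    """Fraction by which the variance of `outcome` shrinks once its
    least-squares linear fit on `predictor` is subtracted out."""
    n = len(predictor)
    mp, mo = sum(predictor) / n, sum(outcome) / n
    slope = (sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
             / sum((p - mp) ** 2 for p in predictor))
    residuals = [(o - mo) - slope * (p - mp) for p, o in zip(predictor, outcome)]
    return 1 - variance(residuals) / variance(outcome)

# Made-up data, not from any study: y is x plus independent noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(20_000)]
y = [xi + rng.gauss(0, 1) for xi in x]

print(f"variance of y reduced by {variance_reduction(x, y):.0%} after controlling for x")
print(f"variance of x reduced by {variance_reduction(y, x):.0%} after controlling for y")
```

Both numbers equal the squared correlation, which is why the two printed percentages agree.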