There was a part of the post where I wrote “I might well be screwing up the math here”: I wasn’t sure whether to square something or not, and didn’t bother to sort it out. I think this comment is claiming that I was doing it wrong, maybe? But that person is not very confident either, and I’m not following their reasoning. Still hoping that someone will give a more definitive answer. I would love to correct the post if I’m wrong.
Let’s define three variables:
T is the true phenotype we’d like to measure,
X is a predictor of T (genes, polygenic score, etc), and
Y is the observed test score.
In terms of the original quote: “I thought that if some input [X] explains X% of the variance of some outcome [T], then it explains $r^2 \cdot X\%$ of the variance of a noisy measurement of the outcome [Y].”
So we know something about the relationship between X and T (the percentage of the variance in T that X explains), and something about the relationship between Y and T (the reliability of Y), and we’d like to use this to tell us something about the relationship between X and Y (how much of the variance in Y X explains).
First, we’ll show how we can use the test-retest correlation to estimate reliability. Model the observed phenotype as $Y = T + E$ with error $E$ ($E \perp T$ and $E \perp X$). Given two independent repeats $Y_1 = T + E_1$ and $Y_2 = T + E_2$, with $E_1 \perp E_2$, $E_i \perp T$, and $E_i \perp X$, we have $\operatorname{Cov}(Y_1, Y_2) = \operatorname{Cov}(T + E_1, T + E_2) = \operatorname{Var}(T)$. Also $\operatorname{Var}(Y_1) = \operatorname{Var}(Y_2) = \operatorname{Var}(Y)$. So

$$\operatorname{corr}(Y_1, Y_2) = \frac{\operatorname{Cov}(Y_1, Y_2)}{\sigma_{Y_1} \sigma_{Y_2}} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(Y)}.$$

This ratio $\operatorname{Var}(T)/\operatorname{Var}(Y)$ is the definition of reliability (under the classical test theory model with parallel forms / independent errors), so we can use the test-retest correlation to estimate reliability: $\operatorname{Rel}(Y) \approx 0.55$.
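To make that concrete, here’s a minimal simulation sketch in numpy. The variance split ($\operatorname{Var}(T) = 0.55$, $\operatorname{Var}(E) = 0.45$) is a made-up choice, picked only so that $\operatorname{Rel}(Y) = 0.55$ matches the number above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed variance split: Var(T) = 0.55, Var(E) = 0.45, so Var(Y) = 1.0
# and Rel(Y) = 0.55 / 1.00 = 0.55.
var_T, var_E = 0.55, 0.45
T = rng.normal(0, np.sqrt(var_T), n)
Y1 = T + rng.normal(0, np.sqrt(var_E), n)  # first testing occasion
Y2 = T + rng.normal(0, np.sqrt(var_E), n)  # independent retest

test_retest = np.corrcoef(Y1, Y2)[0, 1]
print(test_retest, var_T / (var_T + var_E))  # both ≈ 0.55
```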
Now, $\operatorname{Cov}(X, Y) = \operatorname{Cov}(X, T + E) = \operatorname{Cov}(X, T) + \operatorname{Cov}(X, E) = \operatorname{Cov}(X, T)$, since $X \perp E$.
From which we can see that

$$\operatorname{corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\operatorname{Cov}(X, T)}{\sigma_X \sigma_Y} = \frac{\operatorname{Cov}(X, T)}{\sigma_X \sigma_T} \cdot \frac{\sigma_T}{\sigma_Y} = \operatorname{corr}(X, T) \sqrt{\frac{\operatorname{Var}(T)}{\operatorname{Var}(Y)}} = \operatorname{corr}(X, T) \sqrt{\operatorname{Rel}(Y)}.$$
Square both sides of that equation to get $\operatorname{corr}(X, Y)^2 = \operatorname{corr}(X, T)^2 \operatorname{Rel}(Y)$. Since we only have a single predictor, $R^2 = \operatorname{corr}(X, Y)^2$, so this is equivalent to $R^2_{X \to Y} = R^2_{X \to T} \operatorname{Rel}(Y)$ (though apparently a similar argument also works to derive the same result with multiple predictors). So the attenuation correction factor is $\operatorname{Rel}(Y)$.
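Here’s a sketch checking that identity end-to-end, extending the same toy setup (again, the particular variances are assumptions, chosen to give $\operatorname{corr}(X, T)^2 = 0.50$ and $\operatorname{Rel}(Y) = 0.55$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed setup: Var(X) = 0.275, Var(T) = 0.55, Var(Y) = 1.0, so that
# corr(X, T)^2 = Var(X) / Var(T) = 0.50 and Rel(Y) = Var(T) / Var(Y) = 0.55.
X = rng.normal(0, np.sqrt(0.275), n)
T = X + rng.normal(0, np.sqrt(0.275), n)
Y = T + rng.normal(0, np.sqrt(0.45), n)

r2_XT = np.corrcoef(X, T)[0, 1] ** 2  # ≈ 0.50
r2_XY = np.corrcoef(X, Y)[0, 1] ** 2  # ≈ 0.275
print(r2_XY, r2_XT * 0.55)            # both ≈ 0.275 = R²(X→T) · Rel(Y)
```

The simulated value matches the $R^2_{X \to Y} \approx 0.275$ figure below.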
If $R^2_{X \to T} = 0.50$ and $\operatorname{Rel}(Y) \approx 0.55$, then $R^2_{X \to Y} \approx 0.50 \times 0.55 = 0.275$, not $0.50 \times 0.55^2 \approx 0.151$.
So, in this case the attenuation factor is just the correlation. When would it have been the squared correlation (like you originally had)? The key was that we started with the test-retest correlation $\operatorname{corr}(Y_1, Y_2)$, and we showed that $\operatorname{Rel}(Y) = \operatorname{corr}(Y_1, Y_2)$. If what we had instead was $\operatorname{corr}(T, Y)$, then $\operatorname{corr}(T, Y) = \sqrt{\operatorname{Var}(T)/\operatorname{Var}(Y)} = \sqrt{\operatorname{Rel}(Y)}$, and the attenuation factor would instead have been $\operatorname{corr}(T, Y)^2$.
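One last sketch to illustrate that distinction (same assumed variances as above): the test-retest correlation already equals $\operatorname{Rel}(Y)$, while the true-score correlation $\operatorname{corr}(T, Y)$ equals $\sqrt{\operatorname{Rel}(Y)}$ and only matches after squaring:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed variances as before: Var(T) = 0.55, Var(E) = 0.45.
T = rng.normal(0, np.sqrt(0.55), n)
E1, E2 = rng.normal(0, np.sqrt(0.45), (2, n))
Y1, Y2 = T + E1, T + E2

print(np.corrcoef(Y1, Y2)[0, 1])      # ≈ 0.55  = Rel(Y)
print(np.corrcoef(T, Y1)[0, 1])       # ≈ 0.742 = √0.55
print(np.corrcoef(T, Y1)[0, 1] ** 2)  # ≈ 0.55  = Rel(Y) again
```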