I have heard skepticism of the Flynn effect, though I have not taken the time to evaluate the arguments in detail. My understanding is that the basic argument rests on something called measurement invariance.
As a simpler example: imagine that performance across various sports is strongly correlated (+ other evidence), leading you to think there’s a general ‘physical skills’ latent factor. Now, suppose I take a great basketball player and damage his arms enough that he’ll no longer be great at basketball. He could potentially still be relatively good at, say, soccer. That is, for this player I’ve broken the link between basketball performance and underlying latent athleticism: his basketball score no longer tells you what it would for anyone else.
Measurement noninvariance is when a measurement (e.g. an IQ test) doesn’t measure the same latent variable in the same way in two different populations. For example, an LLM (especially the earlier ones) that does great on the general knowledge part of an IQ test is less likely to be good at the fluid intelligence stuff than a human with the same score. As another example, this may be what leads to those with ADHD testing lower on IQ tests, and it supposedly also makes it hard to compare educational outcomes across countries or times (e.g. I’ve heard it said that PISA score fluctuations in the US aren’t really indications of the underlying factors).
The claim is that the Flynn effect has the same problem: the score gains over time reflect noninvariance across cohorts, not real increases in g.
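To make the idea concrete, here is a toy simulation, not anything taken from the linked papers. It assumes a simple linear model in which each subtest score is intercept + loading * g + noise. The function name, the loadings, and the 0.9 intercept shift are all made up for illustration. Both cohorts are given the same distribution of g, but one subtest gets easier for the second cohort, so the composite score rises without any change in g.

```python
import random
import statistics

def simulate_cohort(n, g_mean, intercepts, loadings, seed):
    """Return per-person composite scores for one cohort.

    Toy model (an assumption, not a fitted one):
    subtest score = intercept + loading * g + noise.
    """
    rng = random.Random(seed)
    composites = []
    for _ in range(n):
        g = rng.gauss(g_mean, 1.0)  # latent ability
        scores = [
            b + lam * g + rng.gauss(0.0, 0.5)
            for b, lam in zip(intercepts, loadings)
        ]
        composites.append(sum(scores) / len(scores))
    return composites

# Hypothetical loadings for three subtests.
loadings = [0.8, 0.7, 0.6]

# Same mean g in both cohorts; only the third subtest's intercept differs.
cohort_a = simulate_cohort(5000, 0.0, [0.0, 0.0, 0.0], loadings, seed=1)
cohort_b = simulate_cohort(5000, 0.0, [0.0, 0.0, 0.9], loadings, seed=2)

gap = statistics.mean(cohort_b) - statistics.mean(cohort_a)
print(f"composite gap with identical g: {gap:.2f}")
```

The gap should come out at roughly a third of the intercept shift (about 0.3 here), even though the two cohorts are equally able by construction. As I understand it, invariance tests are meant to catch exactly this kind of shift by checking whether intercepts and loadings can be held equal across groups.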
Links:
Cremieux: The demise of the Flynn effect
scidirect: Flynn effects are biased by differential item functioning over time: A test using overlapping items in Wechsler scales
scidirect: Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect