But neither of those is a particularly compelling reason for disagreement. Can anyone more familiar with the psychological/statistical territory shed some light?
Shalizi’s most basic point — that factor analysis will generate a general factor for any bunch of sufficiently strongly correlated variables — is correct.
Here’s a demo. The statistical analysis package R comes with some built-in datasets to play with. I skimmed through the list and picked out six monthly datasets (72 data points in each):
atmospheric CO2 concentrations, 1959-1964
female UK lung deaths, 1974-1979
international airline passengers, 1949-1954
sunspot counts, 1749-1754
average air temperatures at Nottingham Castle, 1920-1925
car drivers killed & seriously injured in Great Britain, 1969-1974
It’s pretty unlikely that there’s a single causal general factor that explains most of the variation in all six of these time series, especially as they’re from mostly non-overlapping time intervals. They aren’t even that well correlated with each other: the mean correlation between different time series is −0.10 with a std. dev. of 0.34. And yet, when I ask R’s canned factor analysis routine to calculate a general factor for these six time series, that general factor explains 1⁄3 of their variance!
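The original demo used R's built-in datasets and is not reproduced here. As a rough, stdlib-only Python sketch of the same phenomenon, the code below generates six hypothetical monthly series. Each series gets its own independently drawn trend, seasonal cycle and noise, so they share no cause by construction. The code then reports the share of standardized variance captured by the first principal factor of their correlation matrix. This is a stand-in, not a replication: it uses the largest eigenvalue of the correlation matrix (principal components) rather than a maximum-likelihood one-factor fit like R's `factanal`, so its numbers will not match the figure quoted above. It does make one thing explicit. A correlation matrix has trace p, so its largest eigenvalue is at least 1, and this "general factor" therefore always claims at least 1/p of the variance (1/6 here), even for completely uncorrelated variables.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def largest_eigenvalue(matrix, iterations=2000, seed=0):
    """Largest eigenvalue of a symmetric positive semidefinite matrix (power iteration)."""
    rng = random.Random(seed)
    p = len(matrix)
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient; v has unit length


def first_factor_share(series):
    """Share of total standardized variance captured by the first principal factor."""
    p = len(series)
    corr = [[pearson(series[i], series[j]) for j in range(p)] for i in range(p)]
    return largest_eigenvalue(corr) / p  # a correlation matrix has trace p


def hypothetical_series(rng, months=72):
    """Hypothetical monthly series with its own random trend, seasonal cycle and noise.
    Every parameter is drawn independently, so the series share no common cause."""
    slope = rng.uniform(-1.0, 1.0)
    amplitude = rng.uniform(0.0, 3.0)
    phase = rng.uniform(0.0, 2 * math.pi)
    return [slope * t / 12 + amplitude * math.sin(2 * math.pi * t / 12 + phase)
            + rng.gauss(0.0, 1.0) for t in range(months)]


rng = random.Random(42)
data = [hypothetical_series(rng) for _ in range(6)]
print(f"first-factor share of variance: {first_factor_share(data):.2f}")
```

Changing the seed or the number of series changes the share; the point is only that a sizeable first factor can appear without any common cause behind it.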
However, Shalizi’s blog post covers a lot more ground than just this basic point, and it’s difficult for me to work out exactly what he’s trying to say, which in turn makes it difficult to say how correct he is overall. What does Shalizi mean specifically by calling g a myth? Does he think it is very unlikely to exist, or just that factor analysis is not good evidence for it? Who does he think is in error about its nature? I can think of one researcher in particular who stands out as just not getting it, but beyond that I’m just not sure.
In your example, we have no reason to privilege the hypothesis that there is an underlying causal factor behind that data. In the case of g, wouldn’t its relationships to neurobiology be a reason to give a higher prior probability to the hypothesis that g is actually measuring something real? These results would seem surprising if g were merely a statistical “myth.”
The best evidence that g measures something real is that IQ tests are highly reliable, i.e. if you get your IQ or g assessed twice, there’s a very good correlation between your first score and your second score. Something has to generate the covariance between retestings; that g & IQ also correlate with neurobiological variables is just icing on the cake.
To answer your question directly, g’s neurobiological associations are further evidence that g measures something real, and I believe g does measure something real, though I am not sure what.
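The retest argument can be made concrete with a minimal sketch. It assumes the classical test theory model, in which an observed score is a stable trait plus independent measurement error drawn afresh at each sitting; the function name and parameter values are hypothetical. Under that model the retest correlation estimates the share of score variance that is stable across sittings, and if nothing stable is present it collapses toward zero. Note what the sketch does not show: a high retest correlation says that some stable quantity is being measured, not what that quantity is.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_retest(n_people=10_000, trait_sd=1.0, error_sd=0.5, seed=1):
    """Hypothetical two-sitting test: observed = stable trait + fresh random error.

    Returns (simulated test-retest correlation, theoretical reliability),
    where reliability = trait variance / (trait variance + error variance)."""
    rng = random.Random(seed)
    trait = [rng.gauss(0.0, trait_sd) for _ in range(n_people)]
    first = [t + rng.gauss(0.0, error_sd) for t in trait]
    second = [t + rng.gauss(0.0, error_sd) for t in trait]
    reliability = trait_sd ** 2 / (trait_sd ** 2 + error_sd ** 2)
    return pearson(first, second), reliability


r, expected = simulate_retest()
print(f"simulated retest r = {r:.3f}, theory = {expected:.3f}")

# With no stable trait, the two sittings share nothing and r falls to about zero.
r_none, _ = simulate_retest(trait_sd=0.0)
print(f"no stable trait: r = {r_none:.3f}")
```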
Shalizi is, somewhat confusingly, using the word “myth” to mean something like “g’s role as a genuine physiological causal agent is exaggerated because factor analysis sucks for causal inference”, rather than its normal meaning of “made up”. Working with Shalizi’s (not especially clear) meaning of the word “myth”, then, it’s not that surprising that g correlates with neurobiology, because it is measuring something — it’s just not been proven to represent a single causal agent.
Personally I would’ve preferred Shalizi to use some word other than “myth” (maybe “construct”) to avoid exactly this confusion: it sounds as if he’s denying that g measures anything, but I don’t believe that’s his intent, nor what he actually believes. (Though I think there’s a small but non-negligible chance I’m wrong about that.)
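The distinction between g measuring something and g being a single causal agent can be sketched with a toy model loosely based on Godfrey Thomson's "sampling" account of abilities; all names and parameter values here are hypothetical. Each simulated test sums its own random subset of many independent elementary abilities, so no ability is built in as a general cause. Even so, every pair of tests correlates positively and the first principal factor has uniformly positive loadings, the pattern usually read as g. The sketch shows only that a positive manifold does not by itself pin down the causal structure; it is not evidence that the mind actually works this way.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dominant_factor(matrix, iterations=2000):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix with positive
    entries, by power iteration from an all-positive start vector."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v


def sampling_model_scores(n_people=1000, n_elements=200, n_tests=6, per_test=60, seed=7):
    """Hypothetical sampling model: each test score sums its own random subset of many
    independent elementary abilities. No ability is designated as general; tests
    correlate only because their random subsets happen to overlap."""
    rng = random.Random(seed)
    subsets = [rng.sample(range(n_elements), per_test) for _ in range(n_tests)]
    scores = [[0.0] * n_people for _ in range(n_tests)]
    for person in range(n_people):
        elements = [rng.gauss(0.0, 1.0) for _ in range(n_elements)]
        for k, subset in enumerate(subsets):
            scores[k][person] = sum(elements[e] for e in subset)
    return scores


scores = sampling_model_scores()
p = len(scores)
corr = [[pearson(scores[i], scores[j]) for j in range(p)] for i in range(p)]
eigenvalue, loadings = dominant_factor(corr)
print(f"first-factor share of variance: {eigenvalue / p:.2f}")
# Principal-factor loadings are the eigenvector scaled by sqrt(eigenvalue).
print("loadings:", [round(x * math.sqrt(eigenvalue), 2) for x in loadings])
```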
From what I can gather, he’s saying all other evidence points to a large number of highly specialized mental functions instead of one general intelligence factor, and that psychologists are making a basic error by not understanding how to apply and interpret the statistical tests they’re using. It’s the latter which I find particularly unlikely (not impossible though).
You might be right. I’m not really competent to judge the first issue (causal structure of the mind), and the second issue (interpretation of factor analytic g) is vague enough that I could see myself going either way on it.
By the way, welcome to Less Wrong! Feel free to introduce yourself on that thread!
If you haven’t been reading through the Sequences already, there was a conversation last month about good, accessible introductory posts that has a bunch of links and links-to-links.
Thank you!
Belatedly: Economic development (including population growth?) is related to CO2, lung deaths, international airline passengers, average air temperatures (through global warming), and car accidents.