dynomight comments on Factors of mental and physical abilities—a statistical analysis

dynomight 18 Aug 2021 17:04 UTC
1 point
Thanks, I clarified the noise issue. Regarding factor analysis, could you check if I understand everything correctly? Here’s what I think is the situation:

We can write a factor analysis model (with a single factor) as

$x = w g + e$

where:
1. $x$ is observed data
2. $g \sim N (0, 1)$ is a random latent variable
3. $w \in R^{n}$ is some vector (a parameter)
4. $e \sim N (0, Σ)$ is a random noise variable
5. $Σ$ is the covariance of the noise (a parameter)
It always holds (assuming $g$ and $e$ are independent) that

$Cov [x] = w w^{T} + Σ .$
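
A short derivation of this identity, as a sketch using only the definitions above ($g$ and $e$ have mean zero, so $x$ does too, and $E [g^{2}] = 1$):

$Cov [x] = E [(w g + e)(w g + e)^{T}] = w E [g^{2}] w^{T} + w E [g e^{T}] + E [e g] w^{T} + E [e e^{T}] = w w^{T} + Σ ,$

where the two cross terms vanish because $g$ and $e$ are independent with mean zero.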

In the simplest variant of factor analysis (in the current post) we use $Σ = a I$ in which case you get that

$Cov [x] = w w^{T} + a I .$

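You can check if this model fits by (1) checking that $x$ is Normal and (2) checking if the covariance of $x$ can be decomposed as in the above equation. (The decomposition is equivalent to all singular values of $Cov [x]$ being the same, namely $a$, except for one larger value, $a + \|w\|^{2}$. Since the covariance is symmetric and positive semidefinite, its singular values are also its eigenvalues.)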
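
To make the singular value condition concrete, here is a minimal NumPy sketch. The helper name `isotropic_factor_cov` and the particular values of `w` and `a` are made up for illustration; it builds the model covariance directly rather than estimating it from data.

```python
import numpy as np

def isotropic_factor_cov(w, a):
    """Covariance implied by x = w g + e with noise covariance a * I."""
    w = np.asarray(w, dtype=float)
    return np.outer(w, w) + a * np.eye(w.size)

# Illustrative (hypothetical) parameter values.
w = np.array([1.0, 2.0, 0.5, -1.0])
a = 0.3

cov = isotropic_factor_cov(w, a)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals = np.linalg.eigvalsh(cov)

# All eigenvalues but the largest equal a; the largest equals a + ||w||^2.
print(eigvals)
print(a, a + w @ w)
```

With real data you would apply the same check to the sample covariance, where the small eigenvalues will only be approximately equal because of sampling error.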

The next slightly-less-simple variant of factor analysis (which I think you’re suggesting) would be to use $Σ = diag (a)$ where $a$ is a vector, in which case you get that

$Cov [x] = w w^{T} + diag (a) .$

You can again check if this model fits by (1) checking that $x$ is Normal and (2) checking if the covariance of $x$ can be decomposed as in the above equation. (The difference is, now this doesn’t reduce to some simple singular value condition.)
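
One thing that can still be checked in the diagonal case: every off-diagonal entry of $Cov [x]$ equals $w_i w_j$, so for distinct indices the products $Cov_{ij} Cov_{kl}$ and $Cov_{ik} Cov_{jl}$ must agree. This is only a necessary condition, not a full test of fit. Below is a rough sketch, where `max_tetrad_difference` and the parameter values are hypothetical names and numbers chosen for illustration.

```python
import itertools
import numpy as np

def max_tetrad_difference(cov):
    """Largest |c_ij * c_kl - c_ik * c_jl| over distinct indices i, j, k, l.

    Under a one-factor model with diagonal noise this is zero, because each
    off-diagonal entry c_ij equals w_i * w_j.
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    worst = 0.0
    for i, j, k, l in itertools.permutations(range(n), 4):
        gap = abs(cov[i, j] * cov[k, l] - cov[i, k] * cov[j, l])
        worst = max(worst, gap)
    return worst

# Illustrative (hypothetical) parameter values.
w = np.array([1.0, 2.0, 0.5, -1.0])
a_vec = np.array([0.3, 1.0, 0.1, 2.0])

# Model covariance with a separate noise variance for each coordinate.
cov_diag = np.outer(w, w) + np.diag(a_vec)

tetrad_gap = max_tetrad_difference(cov_diag)
eigvals_diag = np.linalg.eigvalsh(cov_diag)

print(tetrad_gap)     # zero up to rounding for an exact one-factor covariance
print(eigvals_diag)   # the smaller eigenvalues are no longer all equal
```

The loop over index quadruples grows as $n^{4}$, so this is only practical for a modest number of variables.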

Do I have all that right?
- Radford Neal 19 Aug 2021 2:24 UTC
  2 points
  Parent
  Assuming you’re using “C” to denote Covariance (“Cov” is more common), that seems right.
  It’s typical that the noise covariance is diagonal, since a general covariance matrix for the noise would render use of a latent variable unnecessary (the whole covariance matrix for x could be explained by the covariance matrix of the “noise”, which would actually include the signal as well). (Though it could be that some people use a non-diagonal covariance matrix that is subject to some other sort of constraint that makes the procedure meaningful.)
  Of course, it is very typical for people to use factor analysis models with more than one latent variable. There’s no a priori reason why “intelligence” couldn’t have a two-dimensional latent variable. In any real problem, we of course don’t expect any model that doesn’t produce a fully general covariance matrix to be exactly correct, but it’s scientifically interesting if a restricted model (e.g., just one latent variable) is close to being correct, since that points to possible underlying mechanisms.