a simple elegant intuition for the relationship between SVD and eigendecomposition that I haven’t heard before:
the eigendecomposition of A tells us which directions A stretches along without rotating. but sometimes we want to know all the directions things get stretched along, even if there is rotation.
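to see why eigendecomposition alone can miss stretching, here's a minimal numpy sketch (my own illustration): a shear matrix has both eigenvalues equal to 1, so its eigendecomposition suggests nothing gets stretched, yet its singular values show real stretching along rotated directions.

```python
import numpy as np

# shear matrix: both eigenvalues are 1, but it clearly stretches vectors
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigvals = np.linalg.eigvals(A)
singvals = np.linalg.svd(A, compute_uv=False)

print(eigvals)   # [1. 1.]        -> eigendecomposition sees no stretching
print(singvals)  # [1.618 0.618]  -> SVD reveals the actual stretch factors
```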
why does taking the eigendecomposition of AᵀA help us? suppose we rewrite A = RS, where S just scales (i.e., is a normal matrix) and R is just a rotation matrix. then AᵀA = SᵀRᵀRS, and the R's cancel out because the transpose of a rotation matrix is also its inverse.
intuitively, imagine thinking of A as first scaling in place and then rotating. then AᵀA would first scale, then rotate, then rotate back in the opposite direction, then scale again. so all the rotations cancel out, and the resulting eigenvalues of AᵀA are the squares of the scaling factors.
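here's a quick numerical check of that cancellation (a sketch under the assumption that S is symmetric positive semidefinite, so it genuinely "just scales" along some orthogonal directions):

```python
import numpy as np

theta = 0.7  # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# S scales by 3 and 0.5 along a pair of orthogonal directions
# (symmetric positive semidefinite, so it really does "just scale")
Q, _ = np.linalg.qr(np.random.randn(2, 2))  # random orthogonal basis
S = Q @ np.diag([3.0, 0.5]) @ Q.T

A = R @ S

# the rotations cancel: A^T A = S^T R^T R S = S^2
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))
print(eigvals)                     # ~ [0.25, 9.0]
print(np.sort([0.5**2, 3.0**2]))   # squares of the scaling factors
```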
This is almost right, but a normal matrix is not a matrix that "just scales": normal only means the matrix commutes with its transpose, and normal matrices can still rotate (every rotation matrix is normal, for instance).
SVD tells us there exists a factorization A = UΣVᵀ where U and V are orthogonal and Σ is a "scaling matrix" in the sense that it's diagonal. Therefore, using similar logic to yours, AᵀA = VΣUᵀUΣVᵀ = VΣ²Vᵀ, which means we rotate, scale by the singular values twice, then rotate back. That is why the eigenvalues of AᵀA are the squares of the singular values, and the eigenvectors are the right singular vectors.
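Here is that identity checked numerically, as a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

U, sigma, Vt = np.linalg.svd(A)             # A = U @ diag(sigma) @ Vt
eigvals, eigvecs = np.linalg.eigh(A.T @ A)  # A^T A is symmetric; eigh sorts ascending

# eigenvalues of A^T A are the squared singular values
print(np.allclose(eigvals[::-1], sigma**2))  # True

# eigenvectors of A^T A are the right singular vectors (up to sign)
V = Vt.T
print(np.allclose(np.abs(eigvecs[:, ::-1]), np.abs(V)))  # True
```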