Varimax rotation on the 'COEFF' matrix output from princomp gives strange output

I work at the Columbia University Earth Institute, and I need to troubleshoot the output I am getting when I conduct a varimax rotation on my PCA output using the 'princomp' and 'rotatefactors' commands:
My original data matrix "A" is a 59-by-34 matrix, where rows are observations and columns are variables. The variables are in different units and must be standardized. To do this I have used the following command: [COEFF, SCORE, latent] = princomp(zscore(A));
Based on the "rotatefactors" command instructions, to conduct a varimax rotation I then pass my loadings matrix ("COEFF") to the command, i.e.: RotatedLoadings = rotatefactors(COEFF).
My question: the output from the rotatefactors command does not look correct at all. The matrix returned to me is a 34-by-34 matrix of mostly zeroes; each column has one "1". This is not the case when I use the same data in the 'factoran' command, which gives me rotated loadings that (while I realize they should be slightly different) look far more reasonable. However, for this work I need to use PCA.
Can someone advise on this? Why is my rotated loadings matrix incorrect when I use the PCA/rotatefactors commands? What am I doing incorrectly?
Thank you in advance!

Accepted Answer

Kaitlin, I think this is an artifact of your using the maximal number of PCs. Varimax attempts to find a rotation of your PCs such that each one is strongly correlated with as few of the original variables as possible. But since you have 34 variables and 34 (orthogonal) PCs, that just means "each rotated PC is one of the original variables" -- which is to say, you've found the inverse of princomp. Usually you'd want to throw away unimportant PCs to reduce the dimensionality of the data. You may have reasons for keeping all 34, but there's not much point in doing PCA if you do.
You have 59 observations on 34 variables, and so the PC coef matrix is 34x34. Consider this simpler example with 3 variables:
Generate a data cloud
rng default
mu = [1 2 3];
T = randn(3); Sigma = T*T';
X = mvnrnd(mu,Sigma,10);
Get the PC coefs and verify that they're orthonormal
coefs = princomp(zscore(X))
coefs'*coefs
Make a biplot of the three original variables against the three PCs. Each vector represents one of the three original variables, each axis represents one of the three PCs. You can see that the vectors are perpendicular to each other (rotate the plot interactively to see it better). That's because the PCs are orthogonal.
biplot(coefs)
Rotate all three coefs, and verify that they're still orthonormal.
rotatedCoefs3 = rotatefactors(coefs(:,1:3),'Method','varimax')
rotatedCoefs3'*rotatedCoefs3
A biplot of the rotated PCs demonstrates that varimax has rotated the axes of the first biplot so that each PC lines up exactly with one of the original variables.
biplot(rotatedCoefs3)
If you do the same thing, but retain only two PCs, you'll see what you were expecting. Hope this helps.
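To complete the picture, the reduced version of the same steps might look like this (a sketch continuing the example above):
% Keep only the first two PCs, and rotate just those two columns
rotatedCoefs2 = rotatefactors(coefs(:,1:2),'Method','varimax')
rotatedCoefs2'*rotatedCoefs2   % columns are still orthonormal (2-by-2 identity)
biplot(rotatedCoefs2)          % the variable vectors are no longer forced onto the axes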

8 Comments

Peter, this is extremely helpful! The MATLAB instructions for the 'rotatefactors' command made it seem as if I had to give it the entire COEFF matrix to rotate, from which it would then produce an output with a reduced number of factors.
So, I suppose my follow-up question is: what tools are available in MATLAB for determining the number of principal components to retain, besides simply looking at the 'latent' (eigenvalues) output from 'princomp' and deciding from there (following the Kaiser criterion, I would like to retain only the factors with an eigenvalue greater than 1)? For example, in your example you retain 3 to demonstrate this to me, but it may be the case in this data that it is necessary to retain, say, 11. Is there a MATLAB command for that, or shall I continue to use the 'latent' output to decide?
Thank you,
Kaitlin
Kaitlin, choosing the number of PCs is an art, not a science. I can't tell you what to do because I don't know what you're up to. Often the choice is based on the percent of variance explained. Often the choice is required to be 2 or 3 because the goal is visualization. Looking at a plot of the variances (latent roots) is a good first step. The documentation shows how to do this. The "Kaiser criterion" is at best ad hoc, and often completely meaningless.
By the way, I guess I forgot to demonstrate how you ought to be calling rotatefactors: pass in coefs(:,1:numComponentsKept).
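For what it's worth, a minimal sketch of that workflow (the 80% threshold below is just an illustrative choice, not a recommendation):
[COEFF, SCORE, latent] = princomp(zscore(A));
explained = 100*latent/sum(latent);     % percent of variance per component
plot(cumsum(explained),'-o')            % look for an elbow / acceptable cumulative total
k = find(cumsum(explained) >= 80, 1);   % e.g., keep enough PCs for 80% of the variance
% or, if you do want the Kaiser criterion despite its limitations: k = sum(latent > 1);
rotatedCoefs = rotatefactors(COEFF(:,1:k),'Method','varimax');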
Peter, thank you for your response.
I am following up because I am still getting some odd (though much improved) loadings from the rotation that do not look right to me. I have searched all of the other "answers" in this forum to no avail, so I will have to follow up with you here.
In returning to your original answer, I have done as you suggested on my original data (matrix A, i.e. the 59-by-34 matrix). It may help to add here that my variables are socioeconomic variables and my observations are census tracts:
coefs = princomp(zscore(A));
rotatedCoefs3 = rotatefactors(coefs(:,1:3),'Method','varimax');
This did help! However, the loadings in the rotated factors are still very low (<0.4), and within each factor the loadings are similar to each other, i.e. there are not noticeably higher loadings within each factor from which to determine which variables hold the most weight for the factor.
To check, I put the same data into the "factoran" command (which automatically rotates using the varimax rotation) using the same number of factors (3). The resulting loadings looked more accurate. For example, the loading for the 34th variable in Factor 1 (Row 34, Column 1) from 'factoran' is 0.9, whereas in 'rotatefactors' it is 0.2.
I realize statistically you cannot directly compare factor analysis and principal component analysis results, but the discrepancies between the two methods mentioned above seem too large.
Any thoughts on why the loadings resulting from the 'rotatefactors' rotation method are so small?
Any advice is warmly welcomed.
Kaitlin
Kaitlin, two things:
First, in your original post, you pointed out that when you kept all the components, the rotated PCA loadings were all either (approximately) 1 or 0. Now you've said that after keeping only 3 components, the rotated loadings are "still very low". I can't reconcile those two statements -- 1 is as large as they can be.
Second, even though many people think of them as the same thing, PCA and FA (or at least the version of FA that FACTORAN implements) use completely unrelated algorithms that find optimal solutions with respect to completely different criteria. At its core, PCA attempts to explain variance, while FA attempts to explain covariance (and puts any unexplained variance into the "specific variances"). In practice, the two methods often give similar results, but it's really easy to cook up examples where that's not true.
It sounds like you have a prior expectation for what your loadings should look like, i.e., you are expecting to find a small number of components that collectively explain a large proportion of the variance, and each correlate strongly with only one of the original variables. Maybe you're right in expecting that. But in general there is no reason why PCA should do that, and it can't possibly do that if you have scaled your variables to unit variance with zscore. I'm rusty on FA, but I suspect that it won't come up with such a solution either. Perhaps I've just misunderstood your description.
You already saw that if you keep all the components, you get PCA loadings that are too specific to your original variables -- loadings all either 1 or 0 -- and that's not surprising. Now you've seen that if you keep only three components, they are too broadly loaded across your original variables, and that would not be surprising if your data can't reasonably be reduced to three dimensions. You've chosen to use PCA, so I think you need to first worry about picking a suitable number of components, based on how much variance is explained, to see if you can reduce the dimensionality of your data. Only then should you worry about rotation and about interpreting what your solution means in terms of your original variables.
Actually, if PCA and FA do in fact lead to very different results on your data, you might learn something useful by pursuing why that is. But they're your data, so I have to leave that up to you.
I think your second paragraph may get to the heart of resolving my issue...
But first, to answer your question about where my expectations for what the loadings should look like come from, I should have been clearer. My original post came from the fact that I was trying to reproduce outputs from another study. Thus, my decision to retain a certain number of factors, and my expectations for what the loadings should look like, were an artifact of trying to reproduce that other study's results, which used FA. I was testing the two MATLAB methods to see if I was using them correctly.
I was finding that I could replicate the factor loadings (coefficients) using FACTORAN, which was expected, but not using PRINCOMP, all else being equal (e.g. rotating both, retaining the same number of factors in both, etc.). To control for the effect of rotation, I compared both commands' outputs without rotation, but they still do not resemble each other. This is confusing because I assumed the two methods in MATLAB would produce at least similar loadings before rotation.
What seems to be the crux of the issue (from your paragraph 2 above) is that I expected both FACTORAN and PRINCOMP to reproduce the study's results, or at least give somewhat similar results to one another. But you are saying this is not necessarily the case. FACTORAN implements a completely different algorithm, and therefore will not necessarily produce the same factor loadings matrix as PRINCOMP.
This may get to the heart of the issue. But, if I may, I would appreciate some final clarification on your response to wrap this up:
(1) The FACTORAN "loadings" output and the PRINCOMP "coefs" output are both factor loadings matrices. In other words, both matrices are what a user would consult to see the correlations between individual variables and an entire factor, correct?
(2) However, even though they are both factor loading matrices, all else being equal, you are saying that the commands can still produce incongruent matrices because of the version of FA that FACTORAN implements in Matlab, or because of some characteristic in the raw data "that may be worth exploring", correct?
(3) But then, if I've understood you correctly up to this point, could you clarify how your statement, "At its core, PCA attempts to explain variance, while FA attempts to explain covariance," connects to the actual loadings (coefficients)? I understand what you are saying, but I do not understand exactly why the coefficient between a variable and a factor would be affected by a method that explains variance versus a method that explains covariance. Wouldn't that just affect how the variance is described? Furthermore, in the context of MATLAB and its algorithms, does that mean the two matrices give different information (i.e., #1 above is NOT correct)?
I would greatly appreciate your clarification on this point...
(4) In light of #1-3, can I analyze my loadings matrix from PRINCOMP in the same way I would analyze my loadings matrix from FACTORAN? If not, why? Hopefully your response, especially to #3 will clarify this.
Does my questioning make sense?
Thank you very much. Hopefully this conversation is helpful to other users, and will provide useful clarifications for others.
Warm cheers,
Kaitlin
There's a lot here, and it's complicated. Let me try to address at least some of what you ask.
* First of all, rotations.
A FA "solution" is unique only up to rotation -- for a fixed number of factors, any rigid rotation of one solution is another equally valid solution. The point of rotation is to come up with a solution whose factors can be explained in meaningful terms. Most often that means "each latent factor contributes heavily to one group of closely related measured variables." You seem to want to find a solution that is much more specific than that, where each latent factor contributes heavily to only one measured variable, and that doesn't seem realistic unless you have one factor for each variable, which is pointless (and overparameterized in the FA model anyway).
A "full" PCA solution for P variables has P components, some of which typically contribute more than the others to the overall variation in the data (and if you have too few observations, some may contribute nothing). The components are ordered by the amount of variation they contribute to the data. If you rotate those components, you break that ordering. What you typically do with that full solution is to throw away most components and keep a very few that you can (for example) visualize, but that still "explain" most of the variation in the data. At that point, many people who come from the FA world want to rotate that reduced solution to be able to interpret it. That's OK for interpretation, but with the caveat that rotating breaks the "variance ordering." Once you have selected the number of components, you are free to rotate them, and indeed any rotation of the reduced solution is just as good at explaining your data (given that number of components).
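You can verify that claim numerically. In the following sketch (using your standardized A and an illustrative k), the rotation matrix T returned by rotatefactors is applied to the scores as well; the total variance of the k retained components is unchanged, but it is redistributed among them:
k = 3;
[coefs, scores, latent] = princomp(zscore(A));
[rotCoefs, T] = rotatefactors(coefs(:,1:k),'Method','varimax');
rotScores = scores(:,1:k)*T;   % rotate the scores with the same matrix
sum(var(rotScores))            % equals sum(latent(1:k)) -- total variance preserved
var(rotScores)                 % but no longer sorted in decreasing order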
So, in general, you would need to rotate either the PCA or the FA solution to get similar results -- there is no reason to expect similar solutions without a rotation. The PROCRUSTES function can be helpful in that respect: it can rotate/reflect your PCA coefs to a best fit of your FA loadings.
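For example (assuming you have loadingsFA from FACTORAN and coefs from PRINCOMP, both with k columns; by default PROCRUSTES also allows scaling, which you can disable):
% Find the rotation/reflection of the PCA coefs that best matches the FA loadings
[d, alignedCoefs] = procrustes(loadingsFA, coefs(:,1:k), 'Scaling', false);
d   % dissimilarity: a small value means the two solutions agree up to rotation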
* Different algorithms and criteria.
Consider what PCA does: first it finds the linear combination of your measured variables that has the largest variance. That's principal component 1 -- column 1 of the coefs matrix. Then it finds the linear combination that is orthogonal to the first and has the next largest variance, and so on, until you get P components. They are all orthogonal, so you can think of PCA as nothing more than a rotation of the original coordinate axes. It's easiest to think about this with an elongated cloud of points in 2-D or 3-D. The point, however, is to notice what might happen: if you have one variable that has much more variance than all the others, your first PC will essentially be that one variable.
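Here's a small sketch of that effect (note that I deliberately do not standardize here):
rng default
X = randn(100,3);
X(:,1) = 10*X(:,1);     % give variable 1 much more variance than the others
coefs = princomp(X)     % no zscore, on purpose
% The first column of coefs is approximately [1;0;0]: PC1 is essentially
% just variable 1, because that one variable dominates the total variance.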
Now consider the FA model. It has the loadings matrix and the specific variances. It tries to fit the covariance matrix of your data with a matrix that looks like L*L' + Psi, where L is the (taller-than-wide) loadings matrix and Psi is the (diagonal) specific variances matrix. FA can find correlations among your variables, and then assign any left-over "individual" variation to Psi. So if you have one variable with much larger variance than the others, it may not appear in L at all, but only in Psi.
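You can see that structure directly by simulating from the FA model and fitting it back (the loadings and specific variances below are made up for illustration):
rng default
L0 = [0.9 0; 0.8 0; 0.7 0; 0 0.8; 0 0.7];   % a "true" 5-by-2 loadings matrix
Psi0 = 1 - sum(L0.^2, 2);                    % specific variances, so diagonals are 1
Sigma = L0*L0' + diag(Psi0);                 % the implied correlation matrix
X = mvnrnd(zeros(1,5), Sigma, 1000);
[L, Psi] = factoran(X, 2);                   % estimate loadings and specific variances
L*L' + diag(Psi)                             % compare this to corrcoef(X)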
In your case there is an added wrinkle: because you ran PCA on standardized data, it's impossible for one variable to have a much larger variance than the others.
So yes, both PCA and FA return a "loadings matrix", and in both the columns represent new variables constructed as linear combinations of the original variables. But how they contribute to the model can be quite different. The loadings in FA primarily describe the correlations among your original variables; the same is not necessarily true for the coefs from PCA.
What's going on in your data? I can't possibly answer that. Hope this helps, and best of luck.
Thanks, Peter. Your time was much appreciated.

