How to apply PCA correctly?
Hello
I'm currently struggling with PCA in MATLAB. Let's say we have a data matrix X and a response y (classification task). X consists of 12 rows and 4 columns. The rows are the data points, the columns are the predictors (features).
Now, I can do PCA with the following command:
[coeff, score] = pca(X);
As I understood from the MATLAB documentation, coeff contains the loadings and score contains the principal components in its columns. That means the first column of score contains the first principal component (associated with the highest variance) and the first column of coeff contains the loadings for the first principal component.
Is this correct?
But if this is correct, why is X * coeff then not equal to score?
Comments: 1
Sepp @Sepp
Your doubt can be clarified by this tutorial (even though it is in the context of another program), especially after the 5-minute mark: https://www.youtube.com/watch?v=eJ08Gdl5LH0
the cyclist
fabulous and generous explanation
Accepted Answer
More Answers (2)
Yaser Khojah
17 Apr 2019
2 votes
Dear the cyclist, thanks for showing this example. I have a question regarding the order of the columns of COEFF, since they differ from those of V. Is there any way to see the order of these columns? In other words, which variables does each column correspond to?
Comments: 8
the cyclist
17 Apr 2019
Edited: the cyclist
17 Apr 2019
Quoting from the first section of the documentation for the pca function.
"Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance."
You can see that
var(dataInPrincipalComponentSpace)
has values in descending order.
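For readers who want to check this themselves, here is a minimal sketch. The data matrix X is synthetic and made up for illustration, and score plays the role of dataInPrincipalComponentSpace. It assumes the Statistics and Machine Learning Toolbox (for pca) is available.

```matlab
% Sketch with synthetic data; X is an assumption, not the original poster's data.
rng(0);                                  % reproducible random numbers
X = randn(12, 4) * diag([5 3 2 1]);      % 12 observations, 4 variables
[coeff, score, latent] = pca(X);

% Variance of each column of score (the data in principal component space)
scoreVariance = var(score);

disp(scoreVariance);                     % values decrease from left to right
disp(latent');                           % the same variances, as returned by pca
```

The third output, latent, holds the principal component variances, so it should match var(score) up to floating-point error.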
Yaser Khojah
17 Apr 2019
I understand that, but I do not see how each PC is related to the columns of the original data (X). How can I know which variables from the original data have the strongest impact?
Nyssa Capman
5 Jan 2020
Edited: Nyssa Capman
11 Mar 2020
I believe each row of coeff corresponds to one of the variables, in the order in which they were input.
So, the first column has the coefficients for the 1st* PC, for each variable. The second column has the coefficients for the 2nd PC, for each variable, and so on.
This post is now several months old, and not really the original question, however I was also confused by this when getting started so I wanted to add this in case someone else is confused in the future and finds this post.
*[edited typo from '2nd' to '1st']
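A small sketch of that layout, using made-up data and hypothetical variable names (x1 to x4): rows of coeff are the original variables, columns are the principal components. Picking the largest absolute loading per column is only one rough way to judge which variable matters most for a component, and it is sensitive to how the variables are scaled.

```matlab
% Sketch with synthetic data; X and varNames are assumptions for illustration.
rng(1);
X = randn(12, 4) * diag([5 3 2 1]);       % 12 observations, 4 variables
varNames = {'x1', 'x2', 'x3', 'x4'};      % hypothetical names, in column order of X
[coeff, ~, ~, ~, explained] = pca(X);

% coeff(i, j) is the weight of variable i in principal component j
[~, dominantVar] = max(abs(coeff), [], 1);   % row index of the largest |loading| per column

for j = 1:size(coeff, 2)
    fprintf('PC%d (%.1f%% of variance): largest loading on %s\n', ...
        j, explained(j), varNames{dominantVar(j)});
end
```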
Image Analyst
5 Jan 2020
"So, the first column has the coefficients for the 2nd PC, for each variable. " ??? Huh? And this is supposed to reduce confusion?
Alex
31 Mar 2020
Hello,
I have some doubts about pca.
I have 2 variables with n observations each, and the coeff matrix is the following:
0.9999 -0.00944
0.0094 0.9999
As I understood, the first column contains the coefficients of the first principal component: 0.9999 is for the first variable in the initial matrix and 0.0094 for the second one.
But why does the linear combination coeff*variable not give the same result as the first column of score?
Thank you
the cyclist
31 Mar 2020
As you can see in my code above, it is
X * coeff
that should equal score, not
coeff * X
(where X is the de-meaned input to pca).
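For the two-variable case, the order of multiplication can be sketched as follows. The data here are made up for illustration, and Xc is a name introduced only for the de-meaned copy of X.

```matlab
% Sketch with synthetic two-variable data; not the data from the question.
rng(2);
n = 50;
X = [10 * randn(n, 1), randn(n, 1)];   % n observations of two variables
[coeff, score] = pca(X);

Xc = X - mean(X, 1);       % subtract column means (implicit expansion, R2016b or later)
pc1 = Xc * coeff(:, 1);    % (n-by-2) * (2-by-1) gives an n-by-1 column

maxAbsDiff = max(abs(pc1 - score(:, 1)));
disp(maxAbsDiff);          % close to zero, up to floating-point error
```

Note that coeff * X is not even dimensionally valid here unless n happens to be 2, since coeff is 2-by-2 and X is n-by-2.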
Yuan Luo
8 Nov 2020
Why does X need to be de-meaned, since pca centers the data by default?
the cyclist
26 Dec 2020
Sorry it took me a while to see this question.
If you do
[coeff,score] = pca(X);
it is true that pca() will internally de-mean the data. So, score is derived from de-meaned data.
But it does not mean that X itself [outside of pca()] has been de-meaned. So, if you are trying to re-create what happens inside pca(), you need to manually de-mean X first.
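Here is a minimal sketch of the difference, using made-up data with clearly nonzero column means. It relies on the sixth output of pca, mu, which holds the estimated column means that pca subtracts internally.

```matlab
% Sketch with synthetic data; X is an assumption for illustration.
rng(3);
X = randn(12, 4) + [10 20 30 40];      % columns with nonzero means
[coeff, score, ~, ~, ~, mu] = pca(X);

dRaw      = X * coeff - score;         % X used as-is, not de-meaned
dCentered = (X - mu) * coeff - score;  % X de-meaned with the means pca used

errRaw      = max(abs(dRaw(:)));
errCentered = max(abs(dCentered(:)));

fprintf('without de-meaning: %g\n', errRaw);       % large
fprintf('with de-meaning:    %g\n', errCentered);  % close to zero
```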
Greg Heath
13 Dec 2015
0 votes
Hope this helps.
Thank you for formally accepting my answer
Greg