- pca: https://www.mathworks.com/help/stats/pca.html
- stepwiselm: https://www.mathworks.com/help/stats/stepwiselm.html
- fitlm: https://www.mathworks.com/help/stats/fitlm.html
Using and Reading PCA Statistical Results
Using pca() with a 7627x17 matrix of variables (columns = variables, rows = observations), I am outputting the coeff and explained as the pca() help page details.
Question 1: Does the input matrix require the last column to be the response/dependent variable (like stepwiselm() and fitlm())? If no, how are these results being calculated without the response/dependent variable involved?
Question 2: Am I right in interpreting the results below as saying that the third predictor variable comprises ~0.995 of the influence on the first principal component, which explains 88.6% of the predictor variable variance? If that's correct, then how is this newfound info useful in explaining my response/dependent variable?
Somebody set me straight, please. Ultimately, my goal is to determine which original predictor variables explain the response variable best, and then quantify that. My thought was to take the most influential variables as determined by PCA and then build the model using stepwiselm() or fitlm().
coeff:
0.00129729002558727 0.289549385373420 0.956728609103309 -0.0212665364089570
-0.0953914970117933 0.952329812960381 -0.287992542335858 0.0187816335691187
0.995206207590380 0.0910468102726022 -0.0294065217015872 -0.0186854868976485
-0.000397082225762763 -0.00246282907057587 0.00447608503472252 -0.0110332572889799
0.00322274954398379 -0.0226606357739981 -0.00261825636192142 0.0575596465344526
-0.000101350632956080 -6.19964461638700e-06 -1.49483017598818e-05 -5.08226633593425e-05
0.00852263907323397 0.0165679978418983 0.00865470258457646 0.128426646955215
0.00104561882463731 -0.00110133731797852 0.00439471539229791 -0.000299508982820549
0.00121992923377658 -0.000898286153472998 0.00374785080734514 0.00364889452050408
0.00119226703173668 -0.00114898617310396 0.00212207512452650 0.00644443091048378
0.000748542020111456 -0.00201026790580145 -0.00109958052522231 0.00763259394674056
-0.000523073666889972 -0.00286022621479378 0.00406153294859059 0.00776832543962255
-0.000417086918391104 -0.00115684825783806 0.00631626569801059 0.00530098022903176
0.000918261002587542 0.00258292617487317 0.00785296732106662 0.00108255503976634
0.000864124551517556 0.00145714217693526 0.00281347075801307 -0.00354952384903065
-9.23014821423366e-05 0.000388571081226511 0.000173062207881905 -0.00110996732336648
0.0193283447001054 -0.0109382094403551 0.0244754518699291 0.989293153560380
explained:
88.6118904456048
8.77262546524695
2.07011007727990
0.437152292606406
0.0639436168558254
0.0223883103766897
0.0106035177147705
0.00808530454905423
0.00161881588630160
0.000820663786317213
0.000409453989136154
0.000267674592543745
7.93610602222986e-05
4.25607722914739e-06
5.80587204580441e-07
1.42789942911271e-07
2.09966934592139e-08
Answers (1)
Shivansh
May 26, 2024
Hi Balsip!
PCA (Principal Component Analysis) does not require the last column to be the response or dependent variable. PCA is an unsupervised learning technique used primarily for dimensionality reduction or to identify patterns in data based on the correlation between features.
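To make the "no response needed" point concrete, here is a minimal sketch that runs pca on a predictor matrix alone. The data are synthetic stand-ins (an assumption, not your 7627x17 matrix), and the variable names are only illustrative.

```matlab
% Synthetic stand-in for a predictor matrix (assumption: random data,
% columns deliberately placed on very different scales)
rng(0);                                  % make the example reproducible
X = randn(200, 5) .* [100 10 1 1 1];     % 200 observations, 5 predictors

% pca receives the predictors only; no response column is passed in
[coeff, score, latent, ~, explained] = pca(X);

% Each column of coeff is a unit-length direction in predictor space
colNorms = vecnorm(coeff);               % every entry is approximately 1

% explained holds the percentage of total predictor variance per component
totalExplained = sum(explained);         % approximately 100
```

Everything pca returns here is computed from the covariance structure of X itself, which is why it works without a dependent variable.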
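For the second part, your interpretation is largely correct: the third predictor dominates the first principal component (coefficient of about 0.995), and that component accounts for 88.6% of the total variance in the predictors. However, PCA does not relate predictors to a response variable; it only describes how the predictors vary among themselves, so a large coefficient does not by itself mean that the variable explains your response. To find which predictors best explain the response, fit a regression model such as "stepwiselm" or "fitlm", which take the response as an input; the variables that PCA flags as influential can serve as candidate predictors there.
You can refer to the documentation pages for pca, stepwiselm, and fitlm for more information.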
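As a rough illustration of how to read the coefficients and where the response comes in, the sketch below uses synthetic data (an assumption; X, y, and the scale factors are hypothetical). Two things it tries to show: each column of coeff has unit length, so the squared coefficients sum to 1 per component (for the posted value, 0.9952^2 is about 0.990); and by default pca centers the columns but does not rescale them, so a predictor with a very large variance can dominate the first component whether or not it matters for the response.

```matlab
% Hypothetical data: X holds the predictors, y the response (both synthetic)
rng(1);
n = 300;
X = randn(n, 4) .* [50 5 1 1];             % column 1 has by far the largest variance
y = 2*X(:,3) - X(:,4) + 0.1*randn(n, 1);   % response depends on low-variance columns

% Unscaled PCA: the first component mostly tracks the largest-variance column
[coeffRaw, ~, ~, ~, explainedRaw] = pca(X);
shareRaw = coeffRaw(:,1).^2;               % squared coefficients sum to 1

% Standardized PCA: every predictor enters on an equal footing
[coeffStd, ~, ~, ~, explainedStd] = pca(zscore(X));

% The response enters only at the regression step
mdl = fitlm(X, y);                         % second argument is the response
disp(mdl.Coefficients)

% Stepwise selection from a constant-only model up to all linear terms
mdlStep = stepwiselm(X, y, 'constant', 'Upper', 'linear', 'Verbose', 0);
disp(mdlStep.CoefficientNames)
```

In this toy setup the first component is dominated by column 1, yet the regression picks out columns 3 and 4, so it may be worth comparing the raw and standardized PCA results before using them to shortlist predictors.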
I hope it helps!