difference between pca and pcaFromStatToolbox

Question

Amir on 22 Feb 2016


Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/269483-difference-between-pca-and-pcafromstattoolbox

Commented: the cyclist on 23 Feb 2016

It might sound stupid, but I am confused by the results of pca and pcaFromStatToolbox. I just noticed that their outputs differ, and I am wondering why:

[coeff1,score1]=pcaFromStatToolbox(x)
[coeff2,score2]=pca(x)

So let's look at an example:

>> pcaData=rand(4,5)
pcaData =
      0.4638    0.7937    0.6250    0.1400    0.4149
      0.7046    0.5080    0.3831    0.8778    0.0977
      0.0153    0.8616    0.8466    0.7827    0.0962
      0.5929    0.9365    0.0800    0.0978    0.8779

----

>> [n,nn]=pcaFromStatToolbox(pcaData)
n =
     -0.2633    0.6508   -0.4266
     -0.1495   -0.3995    0.3599
      0.4276   -0.4884   -0.4736
      0.6094    0.4031    0.5820
     -0.5951   -0.1256    0.3542
nn =
     -0.1772   -0.2040   -0.2480
      0.3372    0.5222   -0.0219
      0.6069   -0.3321    0.1240
     -0.7668    0.0139    0.1458
------
>> [m,mm]=pca(pcaData)
netlab pca: using eig
netlab pca: sorting evec
m =
      0.3671
      0.1416
      0.0329
      0.0000
      0.0000
mm =
      0.2633   -0.6508   -0.4266   -0.1072    0.5601
      0.1495    0.3995    0.3599    0.3753    0.7400
     -0.4276    0.4884   -0.4736   -0.5079    0.3106
     -0.6094   -0.4031    0.5820   -0.2916    0.2056
      0.5951    0.1256    0.3542   -0.7104         0

Comments: 3

Amir on 23 Feb 2016


Good call! I noticed that the other pca comes from a toolbox that was on my path:

Estimation_of_Distribution_Algorithms/BNT/KPMstats/pca.m

and it says:

function [PCcoeff, PCvec] = pca(data, N)
%PCA  Principal Components Analysis
%
%  Description
%   PCCOEFF = PCA(DATA) computes the eigenvalues of the covariance
%  matrix of the dataset DATA and returns them as PCCOEFF.  These
%  coefficients give the variance of DATA along the corresponding
%  principal components.
%
%  PCCOEFF = PCA(DATA, N) returns the largest N eigenvalues.
%
%  [PCCOEFF, PCVEC] = PCA(DATA) returns the principal components as well
%  as the coefficients.  This is considerably more computationally
%  demanding than just computing the eigenvalues.
%
%  See also
%  EIGDEC, GTMINIT, PPCA
%
%  Copyright (c) Ian T Nabney (1996-2001)
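A minimal sketch of how to check which `pca` MATLAB resolves when more than one is on the path. `which` with the `-all` option is a standard MATLAB command; the variable name `pcaFiles` is only illustrative.

```matlab
% Collect every function named pca that is visible on the path.
% The first entry is the one MATLAB actually calls; later entries are shadowed.
pcaFiles = which('pca', '-all');

for k = 1:numel(pcaFiles)
    fprintf('%d: %s\n', k, pcaFiles{k});
end

% If the first entry is not the version you want, its folder could be moved
% to the end of the path for the current session (or dropped with rmpath):
% addpath(fileparts(pcaFiles{1}), '-end');
```

The reordering line is left commented out on purpose, since changing the path may affect other code that relies on the shadowing toolbox.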

the cyclist on 23 Feb 2016

Edited: the cyclist on 23 Feb 2016

I just did a quick search for KPMstats and MATLAB. I found this annotation:

"KPMstats is a directory of miscellaneous statistics functions written by Kevin Patrick Murphy and various other people (see individual file headers)."

Personally, I would need to dig in to get more confidence in Murphy et al. (who are surely fine fellows). I have a fair amount of experience with the MATLAB pca, and I am very confident in its output.


Answer 1

the cyclist on 23 Feb 2016


Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/269483-difference-between-pca-and-pcafromstattoolbox#answer_210858

Edited: the cyclist on 23 Feb 2016

Even without knowing the source of the other function, I can make a guess.

Notice that your input has 4 observations (4 rows) of 5 variables. Four points span at most a 4-dimensional subspace, so you can fully explain 100% of the variation with just 4 principal components. Furthermore, because MATLAB centers the variables (subtracts each column mean), one more degree of freedom is lost, so you can do it with 3 principal components.
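The rank argument can be checked numerically. This is a minimal sketch that uses fresh random data of the same shape as in the question (not the original numbers); the variable names are illustrative.

```matlab
X  = rand(4, 5);                              % 4 observations (rows), 5 variables (columns)
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % subtract each column mean

rankRaw      = rank(X);    % at most 4, the number of observations
rankCentered = rank(Xc);   % at most 3, because the centered rows sum to zero

fprintf('rank before centering: %d, after centering: %d\n', rankRaw, rankCentered);
```

The rank of the centered data is the number of principal components with nonzero variance, which is why only 3 coefficient vectors are needed here.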

Notice that MATLAB outputs 3 principal component coefficient vectors, whereas your other software outputs 5 vectors. That other software is clearly making a different assumption in the case where you only need 3 to fully span the space. My guess is that the 4th and 5th vectors (the ones that differ from MATLAB) are linear combinations of the first 3.
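One way to test this guess is to compare the two routes directly. The sketch below uses only core functions and makes two assumptions: that the other `pca` does what its header says (eigenvalues and eigenvectors of the covariance matrix), and that MATLAB's `pca` behaves like an SVD of the centered data that keeps only the components with nonzero variance. The data are random with the same shape as the example, not the original numbers.

```matlab
X  = rand(4, 5);                              % same shape as the example, different numbers
Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % center each variable

% Route 1: eigen-decomposition of the covariance matrix (as the netlab header describes).
[V, D]      = eig(cov(X));
[evals, ix] = sort(diag(D), 'descend');       % largest variance first
V           = V(:, ix);                       % 5 eigenvectors, one per variable

% Route 2: SVD of the centered data, keeping only directions with nonzero variance.
[~, ~, W] = svd(Xc, 'econ');
nKeep     = rank(Xc);                         % 3 for 4 centered observations
W         = W(:, 1:nKeep);

% The leading eigenvectors should match the SVD directions up to a sign flip.
signMismatch = max(max(abs(abs(V(:, 1:nKeep)) - abs(W))));

% The remaining eigenvectors belong to (numerically) zero eigenvalues;
% measure how far they are from being orthogonal to the first nKeep.
crossTerms = max(max(abs(V(:, 1:nKeep)' * V(:, nKeep+1:end))));

fprintf('max |coeff| mismatch: %.2e, max cross term: %.2e\n', signMismatch, crossTerms);
disp(evals');
```

In the output posted in the question, the first return value `m` looks like the eigenvalues (the last two are 0.0000), and the first three columns of `mm` match `n` up to sign. Because the covariance matrix is symmetric, its eigenvectors are mutually orthogonal, so the 4th and 5th columns would be expected to span the zero-variance directions rather than be linear combinations of the first 3; the `crossTerms` value is a quick check of that.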