PCA scaling and centering documentation wrong?

Ari Paul on 24 Mar 2015
Edited: the cyclist on 8 Aug 2022
The pca() documentation says that the raw data is automatically centered at the start of the process. If true, then pca(X) should be equal to pca(Y), where Y = centered data. But they're not (specific data below). Additionally, when I use either eig() or svd() to compute the principal components, I can only get them to match the pca output when I first manually center the data before using pca(). Ultimately my question is simply how do I correctly calculate the principal components of raw data? I.e. do I need to manually center and scale it first? Only manually center? Only manually scale?
Sample data:
X =
1.0000 -3.0000 -1.0000
2.0000 -2.0000 -0.5000
3.0000 -0.5000 0.2500
4.0000 2.0000 1.0000
5.0000 5.0000 2.5000
Centering X -> Y =
-2.0000 -3.3000 -1.4500
-1.0000 -2.3000 -0.9500
0 -0.8000 -0.2000
1.0000 1.7000 0.5500
2.0000 4.7000 2.0500
pca(X) =
-0.7360 -0.6037 -0.3062
-0.6688 0.7186 0.1907
-0.1049 -0.3452 0.9327
pca(Y) =
0.4058 0.8414 0.3569
0.9124 -0.3960 -0.1036
0.0542 0.3676 -0.9284
svd(Y) =
0.4058 0.9124 0.0542
0.8414 -0.3960 0.3676
0.3569 -0.1036 -0.9284
eig(cov(Y)) =
0.0542 0.9124 0.4058
0.3676 -0.3960 0.8414
-0.9284 -0.1036 0.3569
^ This is the same output, just in a different order.

Answers (2)

Sagar on 9 Aug 2015
You got it a little wrong. When you call pca(Y), pca centers the data again by default. So if you want to get the same values as pca(X), use the 'Centered', 'off' name-value pair option: PCA_of_Y = pca(Y, 'Centered', 'off'); Now it will definitely be equal to pca(X).
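A minimal sketch of that check, using the sample data from the question. It assumes the Statistics and Machine Learning Toolbox is installed and passes the logical value false for 'Centered'; the variable names (coeffX, coeffY, coeffYraw) are only illustrative.

```matlab
% Sample data from the question
X = [1.0 -3.0 -1.0;
     2.0 -2.0 -0.5;
     3.0 -0.5  0.25;
     4.0  2.0  1.0;
     5.0  5.0  2.5];

% Manually centered copy: subtract each column's mean
% (implicit expansion requires R2016b or later)
Y = X - mean(X);

coeffX    = pca(X);                     % pca centers X internally by default
coeffY    = pca(Y);                     % centering already-centered data changes nothing
coeffYraw = pca(Y, 'Centered', false);  % skip pca's own centering step

% All three coefficient matrices should agree up to round-off
disp(norm(coeffX - coeffY));
disp(norm(coeffX - coeffYraw));
```

If the two displayed norms are not close to zero, the input passed as Y was probably not the column-centered version of X.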

the cyclist on 26 Jun 2019
Edited: the cyclist on 8 Aug 2022
Answering a gazillion years after-the-fact, because I just turned this up in my own search.
X = [1.0000 -3.0000 -1.0000;
2.0000 -2.0000 -0.5000;
3.0000 -0.5000 0.2500;
4.0000 2.0000 1.0000;
5.0000 5.0000 2.5000];
Y = X - mean(X);
pca(X)
ans = 3×3
0.4058 0.9124 -0.0542
0.8414 -0.3960 -0.3676
0.3569 -0.1036 0.9284
pca(Y)
ans = 3×3
0.4058 0.9124 -0.0542
0.8414 -0.3960 -0.3676
0.3569 -0.1036 0.9284
Both give the same PCA results (as of when I answered this).
So, either something got fixed, or you made a mistake.
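For the svd() and eig() comparison raised in the question, the following is a rough sketch rather than a definitive recipe. It assumes the same sample data, relies on the fact that each principal direction is only defined up to sign, and sorts the eig() output explicitly because eig() does not promise descending order. The names V, W and order are illustrative.

```matlab
% Sample data from the question
X = [1.0 -3.0 -1.0;
     2.0 -2.0 -0.5;
     3.0 -0.5  0.25;
     4.0  2.0  1.0;
     5.0  5.0  2.5];
Y = X - mean(X);              % centered data (R2016b or later)

coeff = pca(X);               % columns are the principal component directions

% SVD route: right singular vectors of the centered data
[~, ~, V] = svd(Y, 'econ');

% Eigen route: cov removes the column means itself;
% reorder the eigenvectors by descending eigenvalue
[W, D] = eig(cov(X));
[~, order] = sort(diag(D), 'descend');
W = W(:, order);

% Columns may differ by a factor of -1, so compare absolute values
disp(norm(abs(coeff) - abs(V)));
disp(norm(abs(coeff) - abs(W)));
```

Once the ordering and sign ambiguity are accounted for, the three routes should agree to round-off. Note that pca() centers by default but does not rescale the columns unless asked to.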
