PCA operation and its inverse operation on a dataset

Question

GEEVARGHESE TITUS on 25 Feb 2017

https://kr.mathworks.com/matlabcentral/answers/326839-pca-operation-and-its-inverse-operation-on-a-dataset

Answered: Paras Gupta on 19 Jul 2024

I was trying the pca function based on the example in the MATLAB help:

load hald % The ingredients data has 13 observations for 4 variables.
coeff = pca(ingredients)
coeff =
     -0.0678   -0.6460    0.5673    0.5062
     -0.6785   -0.0200   -0.5440    0.4933
      0.0290    0.7553    0.4036    0.5156
      0.7309   -0.1085   -0.4684    0.4844

I have a few doubts:
1. Do we need to pre-process the raw observations, or can we use them as they are?
2. The code performs dimensionality reduction, so how can we get data with the original structure back (accepting that some error will be introduced)? The original data is 13x4 and coeff is 4x4. What else does the decoder need?

[coeff,score,latent,tsquared,explained,mu] = pca(ingredients)

Answer 1

Paras Gupta on 19 Jul 2024

https://kr.mathworks.com/matlabcentral/answers/326839-pca-operation-and-its-inverse-operation-on-a-dataset#answer_1487846

Hello,

It is generally a good practice to pre-process the raw data. Common pre-processing steps include:

Normalization/Standardization: PCA is sensitive to the scales of the variables. Standardizing the data ensures that each feature contributes equally to the analysis. Note that pca centers the data by default but does not rescale it. You can refer to the documentation on the "zscore" function - https://www.mathworks.com/help/stats/zscore.html
Handling Missing Values: If your data has missing values, you may need to handle them by removing incomplete records. You can refer to the documentation on the "rmmissing" function - https://www.mathworks.com/help/matlab/ref/rmmissing.html
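
The two pre-processing steps above can be sketched as follows. This is a minimal illustration on the hald data, not a required recipe; the variable names X, Z, muZ, sigmaZ, coeffZ and scoreZ are chosen here for the example.

```matlab
load hald                          % provides the 13x4 matrix "ingredients"

% Drop any rows that contain NaN (this dataset has none, so X equals ingredients)
X = rmmissing(ingredients);

% Standardize each column to zero mean and unit standard deviation.
% Keep muZ and sigmaZ: they are needed later to map results back to original units.
[Z, muZ, sigmaZ] = zscore(X);

% PCA on the standardized data
[coeffZ, scoreZ] = pca(Z);
```

Whether to standardize depends on the data: if all variables share the same units and their relative variances are meaningful, running pca on the raw data can be the better choice.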

If you do not want to remove missing entries from your data, you can use the alternating least squares (ALS) algorithm for PCA in MATLAB, which handles missing values better. You can refer to the following link on selecting the algorithm for PCA - https://www.mathworks.com/help/stats/pca.html#bth9ibe-Algorithm

[coeff,score,latent,tsquared,explained] = pca(ingredients,'algorithm','als');
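
The ingredients data has no missing values, so the call above does not show what ALS adds. The sketch below introduces two NaN entries artificially (an assumption made purely for illustration) and then uses the ALS result to estimate them. The names Xmiss, coeffA, scoreA, muA and Xfilled are illustrative, and rng(0) is only there to make the random ALS initialization repeatable.

```matlab
load hald
rng(0);                               % ALS starts from random values; fix the seed for repeatability

% Make a copy of the data with two entries marked as missing (illustrative only)
Xmiss = ingredients;
Xmiss(2, 3) = NaN;
Xmiss(7, 1) = NaN;

% PCA with the ALS algorithm, which estimates the components despite the NaNs
[coeffA, scoreA, ~, ~, ~, muA] = pca(Xmiss, 'Algorithm', 'als');

% Reconstruct a full matrix; the former NaN positions now hold estimated values
Xfilled = scoreA * coeffA' + repmat(muA, size(Xmiss, 1), 1);
```

The estimated entries are model-based guesses, so they should be checked against domain knowledge before being used downstream.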

When you perform PCA, you are transforming your data into a new coordinate system where the axes (principal components) are ordered by the amount of variance they explain in the data. This allows you to reduce the dimensionality by keeping only the first few principal components.
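
One common way to decide how many components to keep is to look at the cumulative percentage of variance explained. The sketch below uses the "explained" output of pca; the 95% threshold is an assumption chosen for illustration, not a rule.

```matlab
load hald
[~, ~, latent, ~, explained] = pca(ingredients);

% "explained" holds the percentage of total variance per component (sums to 100)
cumExplained = cumsum(explained);

% Smallest number of components whose cumulative explained variance reaches the threshold
threshold = 95;                                  % illustrative choice
numComponentsToKeep = find(cumExplained >= threshold, 1);

fprintf('Keeping %d of %d components\n', numComponentsToKeep, numel(explained));
```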

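To reconstruct the data back to its original structure, you need three things: the principal component scores 'score', the principal component coefficients 'coeff', and the estimated column means 'mu' (pca centers the data by default, so the means must be added back). If you keep all components, the reconstruction is exact up to floating-point error. If you reduce the dimensionality, some information is lost, which introduces reconstruction error.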

The following code shows how reconstruction of the original data can be done:

[coeff, score, latent, tsquared, explained, mu] = pca(ingredients);
% Select the number of principal components to keep
numComponentsToKeep = 2;
% Reduce dimensionality
reducedScore = score(:, 1:numComponentsToKeep);
% Reconstruct the data (rank-k approximation, where k is numComponentsToKeep)
reconstructedData = reducedScore * coeff(:, 1:numComponentsToKeep)' + repmat(mu, size(ingredients,1), 1);
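
To see how much error each choice of k introduces, the reconstruction can be repeated for every k and compared with the original data. This is a sketch; the relative Frobenius-norm error used here is one reasonable measure among several, and the names approx and relErr are illustrative.

```matlab
load hald
[coeff, score, ~, ~, ~, mu] = pca(ingredients);

n = size(ingredients, 1);
centered = ingredients - repmat(mu, n, 1);      % data with column means removed

for k = 1:size(coeff, 2)
    % Rank-k approximation from the first k scores and coefficients, plus the means
    approx = score(:, 1:k) * coeff(:, 1:k)' + repmat(mu, n, 1);

    % Error relative to the total variation around the mean
    relErr = norm(ingredients - approx, 'fro') / norm(centered, 'fro');
    fprintf('k = %d: relative reconstruction error = %.4g\n', k, relErr);
end
```

With all four components the error should be at the level of floating-point round-off, which confirms that score, coeff and mu together are enough to invert the transform.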

You can also refer to the documentation on the "pca" function for more information on the code above - https://www.mathworks.com/help/stats/pca.html
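
If the data were standardized with zscore before PCA, the inverse step also has to undo the scaling, so the decoder additionally needs the column means and standard deviations returned by zscore. A minimal sketch, assuming the same hald data; the names muZ, sigmaZ, Zhat and Xhat are illustrative.

```matlab
load hald
n = size(ingredients, 1);

% Standardize and keep the parameters needed for the inverse
[Z, muZ, sigmaZ] = zscore(ingredients);
[coeffZ, scoreZ, ~, ~, ~, muPca] = pca(Z);      % muPca is ~0 because Z is already centered

k = 2;                                           % number of components to keep (illustrative)

% Step 1: invert the PCA projection in standardized units
Zhat = scoreZ(:, 1:k) * coeffZ(:, 1:k)' + repmat(muPca, n, 1);

% Step 2: undo the standardization to return to the original units
Xhat = Zhat .* repmat(sigmaZ, n, 1) + repmat(muZ, n, 1);

% With all components kept, the same two steps recover the data exactly (up to round-off)
Xfull = (scoreZ * coeffZ' + repmat(muPca, n, 1)) .* repmat(sigmaZ, n, 1) + repmat(muZ, n, 1);
```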

Hope this helps.
