PCA operation and its inverse operation on a dataset

GEEVARGHESE TITUS on 25 Feb 2017
Answered: Paras Gupta on 19 Jul 2024
I was trying the pca function, following the example in the MATLAB help:
load hald % The ingredients data has 13 observations for 4 variables.
coeff = pca(ingredients)
coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
I have a few doubts:
1. Do we need to pre-process the raw observations, or can we use them as they are?
2. Based on the code, we are doing dimensionality reduction. How can we then get data with the original structure back (accepting that some error will be introduced)? The original data is 13x4 and coeff is 4x4. What else does the decoder need?
[coeff,score,latent,tsquared,explained,mu] = pca(ingredients)

Answers (1)

Paras Gupta on 19 Jul 2024
Hello,
It is generally good practice to pre-process the raw data before PCA, for example by removing observations with missing entries.
If you do not want to remove missing entries from your data, you can use the alternating least squares (ALS) algorithm for PCA in MATLAB, which handles missing values better. See the following link on selecting the algorithm for PCA: https://www.mathworks.com/help/stats/pca.html#bth9ibe-Algorithm
[coeff,score,latent,tsquared,explained] = pca(ingredients,'algorithm','als');
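As a rough illustration of the ALS option with incomplete data, the sketch below knocks out two entries of the hald data and lets ALS estimate them. The positions of the missing entries and the variable names (X, coeffAls, scoreAls, muAls, Xfilled) are hypothetical and chosen only for this example; ALS is iterative and, as far as I understand, starts from random initial values, so the estimates can vary slightly between runs.

```matlab
load hald                          % provides the 13x4 matrix 'ingredients'

X = ingredients;
X(2, 3) = NaN;                     % hypothetical missing entries, for illustration only
X(7, 1) = NaN;

rng('default');                    % fix the random seed so the ALS run is repeatable

% Request the sixth output (the estimated means) so the data can be rebuilt
[coeffAls, scoreAls, ~, ~, ~, muAls] = pca(X, 'Algorithm', 'als');

% Rebuild the full matrix; the NaN positions now hold ALS estimates
Xfilled = scoreAls * coeffAls' + repmat(muAls, size(X, 1), 1);

% Compare the true values with the estimates at the missing positions
missingMask = isnan(X);
disp([ingredients(missingMask), Xfilled(missingMask)])
```

The first column of the displayed matrix holds the values that were removed and the second holds what ALS filled in, which gives a feel for how far the estimates can be trusted on this dataset.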
When you perform PCA, you are transforming your data into a new coordinate system where the axes (principal components) are ordered by the amount of variance they explain in the data. This allows you to reduce the dimensionality by keeping only the first few principal components.
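One possible way to decide how many components to keep is to look at the cumulative percentage of variance in the 'explained' output. The sketch below assumes a 95% cutoff, which is an arbitrary choice for illustration and not a rule; the names cumExplained and threshold are likewise introduced only for this example.

```matlab
load hald                                   % provides the 13x4 matrix 'ingredients'
[coeff, score, latent, ~, explained] = pca(ingredients);

cumExplained = cumsum(explained);           % cumulative percentage of variance explained
threshold = 95;                             % hypothetical cutoff, in percent

% Smallest number of leading components whose cumulative share reaches the cutoff
numComponentsToKeep = find(cumExplained >= threshold, 1);

fprintf('Keeping %d of %d components (%.2f%% of variance)\n', ...
    numComponentsToKeep, numel(explained), cumExplained(numComponentsToKeep));
```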
To reconstruct the data back to its original structure, you need the principal component scores 'score', the principal component coefficients 'coeff', and the estimated variable means 'mu' (pca centers the data by default, so the means must be added back). However, if you reduce the dimensionality, some information will be lost, introducing reconstruction error.
The following code shows how the original data can be reconstructed:
[coeff, score, latent, tsquared, explained, mu] = pca(ingredients);
% Select the number of principal components to keep
numComponentsToKeep = 2;
% Reduce dimensionality
reducedScore = score(:, 1:numComponentsToKeep);
% Reconstruct the data (rank-k approximation, where k is numComponentsToKeep)
reconstructedData = reducedScore * coeff(:, 1:numComponentsToKeep)' + repmat(mu, size(ingredients,1), 1);
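To get a feel for how much error the reduction introduces, the reconstruction can be repeated for every possible number of components. This is a minimal sketch using the same hald data; relErr and Xk are names introduced here for illustration, and the Frobenius norm is just one reasonable choice of error measure.

```matlab
load hald                                   % provides the 13x4 matrix 'ingredients'
[coeff, score, ~, ~, ~, mu] = pca(ingredients);

n = size(ingredients, 1);                   % number of observations
p = size(coeff, 2);                         % number of principal components
relErr = zeros(p, 1);

for k = 1:p
    % Rank-k reconstruction: first k scores, first k coefficient columns, plus the means
    Xk = score(:, 1:k) * coeff(:, 1:k)' + repmat(mu, n, 1);
    % Relative reconstruction error in the Frobenius norm
    relErr(k) = norm(ingredients - Xk, 'fro') / norm(ingredients, 'fro');
end

disp(relErr')                               % error for k = 1, ..., p
```

With all components kept, the error should be zero up to floating-point rounding, so the "decoder" only needs the reduced scores, the matching columns of coeff, and mu.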
You can also refer to the documentation for the pca function for more information on the code above: https://www.mathworks.com/help/stats/pca.html
Hope this helps.
