Explained variance for a dataset containing quantitative and qualitative data

Hi everybody,
I'm working on datasets containing both quantitative and qualitative data. Given a subset of the data, I'm trying to determine how much of the variance of the original mixed dataset it explains. I understand that for purely numerical data I could use:
[~,~,~,~,explained] = pca(X(:,3:15));
explained
However, I'm bound to using mixed data. The subset of the original dataset is provided to me.
Is there any obvious solution I'm missing here? I might just be lacking expertise.
Thanks in advance!

Answers (1)

Vijeta, 2 May 2023
Hi Benjamin,
When dealing with mixed data, you can build on Multiple Correspondence Analysis (MCA) rather than plain PCA. MCA is a multivariate statistical technique for qualitative (categorical) variables: it works on the indicator matrix of the categories and extracts principal components from it, much as PCA does for quantitative variables. For a mixed dataset, the MCA treatment of the qualitative variables is combined with PCA on the quantitative ones.
We can normalize the quantitative data by standardization and encode the qualitative data as indicator (dummy) variables, which can then be analyzed with the pca function in MATLAB as an MCA-style step. We then combine the MCA result and the standardized quantitative data into X_mca_quant and run pca on the combined matrix. Finally, the fifth output of pca, explained, gives the percentage of variance explained by each component.
Note that this approach assumes the qualitative variables are categorical and do not have a natural ordering. If your qualitative variables are ordinal, you may want to convert them to numerical values and treat them as quantitative instead.
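The steps described above can be sketched roughly as follows. This is only a minimal illustration on made-up data, not a definitive implementation: the variables Xnum and g are hypothetical, and the weighting of the indicator columns by the square root of the category proportions is an assumption borrowed from FAMD-style scaling. For brevity it runs a single pca on the combined, scaled matrix, and it derives the explained percentages from the third output (latent), which is equivalent to the fifth output explained.

```matlab
% Hypothetical toy data: 2 quantitative columns and 1 qualitative column.
rng(0);                                   % reproducible example
n    = 60;
Xnum = [randn(n,1), 2*randn(n,1) + 1];    % quantitative variables
g    = categorical(randi(3, n, 1));       % qualitative variable, 3 levels

% Standardize the quantitative part (zero mean, unit variance per column).
Znum = zscore(Xnum);

% Indicator (dummy) coding of the qualitative part.
D    = dummyvar(g);                       % n-by-(number of levels), 0/1
p    = mean(D, 1);                        % proportion of each category
Zcat = D ./ sqrt(p);                      % FAMD-style weighting (assumption)

% PCA on the combined matrix; pca centers the columns by default.
X_mca_quant    = [Znum, Zcat];
[~, ~, latent] = pca(X_mca_quant);

% Percentage of total variance per component (same as the 'explained' output).
explained = 100 * latent / sum(latent);
disp(explained)
```

Because the indicator columns of one variable always sum to a constant, the last component carries (numerically) zero variance; this is expected and not a sign of an error.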
  Comments: 1
Benjamin Lender, 4 May 2023
Hello Vijeta,
thank you for your answer! I'm currently using FAMD, which brings together MCA and PCA, to reduce the data. However, I'm transferring the results back to the original variables rather than using the new dimensions. This is because of constraints of the application case.
At this point I can no longer use the "explained variance" output and am left with a subset of my original data, trying to determine what portion of the original variance of the mixed dataset is explained by that subset.
Can you help me out here?
Thanks!!
Ben
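
One possible way to quantify this, offered only as a sketch under stated assumptions and not as an established answer from this thread: encode and scale the full mixed dataset as for FAMD, then measure how much of its total variance can be reproduced by a least-squares projection onto the retained columns. The matrix Z and the indices in subsetCols below are hypothetical stand-ins for the encoded dataset and the provided subset.

```matlab
% Hypothetical encoded data matrix Z: columns stand for standardized
% quantitative variables and weighted indicator columns (toy data here).
rng(1);                                    % reproducible example
n = 80;
Z = randn(n, 3) * randn(3, 6) + 0.1 * randn(n, 6);  % 6 correlated columns
Z = Z - mean(Z, 1);                        % center each column

% Assumed indices of the columns that belong to the retained subset.
subsetCols = [1 4];
Zs = Z(:, subsetCols);

% Least-squares projection of every column of Z onto the span of Zs.
Zhat = Zs * (pinv(Zs) * Z);

% Share of the total variance (squared Frobenius norm) that the subset
% reproduces, in percent.
explainedBySubset = 100 * (1 - norm(Z - Zhat, 'fro')^2 / norm(Z, 'fro')^2);
disp(explainedBySubset)
```

With this measure, keeping a qualitative variable means keeping all of its indicator columns in subsetCols, and selecting every column yields 100 percent.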
