Hello everyone. I have generated a code in which I use a Gaussian correlation kernel to generate 1000 realizations of a stochastic process and then, perform PCA over the resulting process. The result is a matrix of 501*1000.
However, when I perform the PCA over this matrix, the results contradict the help at https://la.mathworks.com/help/stats/pca.html
In the info it says that if one inrtoduces a n*p matrix, coeff will be a p*p matrix and score an n*p. Here, I get different results, coeff is a p*n matrix and score a p*p; the weird thing is that the process is reconstructed propperly. Can anyone tell me what is happening?
Thanks.
Additionally, reading theory, coeffs should be standard normal random variables; if I plot the histograms, the resulting variables are normal but not standard. If someone could tell me why these are not standard I would be very thankfull.
The code in question:
close all
clear
clc
[X,Y] = meshgrid(0:0.002:1,0:0.002:1);
Z=exp((-1)*abs(X-Y));
tam=size(X, 1);
number_realizations=1000;
realizacion_mat=zeros(tam, number_realizations);
cov_mat=cov(Z);
[evec_mal, evalM_mal]=eig(cov_mat);
eval_mal=eig(evalM_mal);
num_eval=size(eval_mal,1);
for i=1:num_eval
eval(i)=eval_mal(num_eval-i+1);
evec(:,i)=evec_mal(:,num_eval-i+1);
end
figure
hold on
for j=1:number_realizations
realizacion=zeros(tam, 1);
for i=1:tam
v_a = normrnd(0,1);
realizacion=realizacion+sqrt(eval(i))*evec(:,i)*v_a;
end
realizacion_mat(:,j)=realizacion;
plot(realizacion)
clear('realizacion')
end
[coeff,score,latent,tsquared,explained,mu] = pca(realizacion_mat,'Centered',false);
reconstruction_process=score*coeff';
diference=reconstruction_process-realizacion_mat;
figure
plot(diference)
for i=1:5
figure
histogram(coeff(:,i), 20)
end

 채택된 답변

Jon
Jon 2019년 7월 9일
편집: Jon 2019년 7월 9일

1 개 추천

The first argument to pca should be n by p, where n is the number of observations. You are supplying it with a p by n matrix. As a result the arguments that are returned are not dimensioned as you expect. I do not see anything in the MATLAB documentation that discusses the distribution (standard normal) of the coefficients. Maybe this is something specific to your application. In any case, if you supply pca with an array, where each row is an observation, then you will be off to a good start.
I also suggest that in your code, you do not use the variable name eval, for eigenvalues. eval is a MATLAB function that evaluates an expression. You did not get any error message as MATLAB assumes you want to use eval as a variable name rather than as a function. It is at the least confusing to read the code if you know what the eval function does, and also if at some point further you actually wanted to use eval as a function you would have problems.

댓글 수: 5

Jaime  de la Mota
Jaime de la Mota 2019년 7월 9일
First of all, thank you very much for your answer. I will change the name of the eigenvalues to anything else.
However, I think that I am not understanding very well the message. The help says as follows: "Rows of X correspond to observations and columns correspond to variables". In my case, there are 501 rows and 1000 columns. I have 501 observations since there are 501 points in X and 501 points in Y. The number of random variables is equal to the number of realizations, in this case 1000; therefore, n=501 and p=1000; that is not my problem. My problem is that the help says "The coefficient matrix is p-by-p", which is not in my application, there is a 1000-by-501 or p-by-n. Also "Principal component scores are the representations of X in the principal component space" and in my case score is a 501-by501 not a 501-by-1000.
I thank you again for your answer, but I am afraid it is not what I am looking for.
Jon
Jon 2019년 7월 9일
편집: Jon 2019년 7월 9일
You need to have more observations than variables for the pca analysis to work as intended.In your case (as you describe it) you have 1000 variables and only 501 observations. So you have less observations than variables.
If you replace the first line of code in your snippet with, for example the following, then you will have n = 2001 observations and p = 1000 variables, so that n>=p and the dimensions of the output arguments will be as described in the MATLAB documentation.
[X,Y] = meshgrid(0:0.0005:1,0:0.0005:1);
I noticed that when I increased the grid resolution, so that n>p as in the above, this resulted in complex values in the "realizacion_mat". I'm not sure if this is a problem for you, maybe there are some other issues you need to check too, but, you should have n>=p
Indeed, With only 500 realizations the size of the matrices is the expected one. Thanks for your answer. However, if it is not much to ask, I would like to inquire a second thing. When I generated the original process, I did it as:
for j=1:number_realizations
realizacion=zeros(tam, 1);
for i=1:tam
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
v_a = normrnd(0,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
realizacion=realizacion+sqrt(eval(i))*evec(:,i)*v_a;
end
realizacion_mat(:,j)=realizacion
clear('realizacion')
end
So I would like to know why when I perform the KL expansion over
realizacion_mat
I don't obtain normrnd(0,1) variables when I plot the histograms of the coeffs.
Aditionally, this is the document that I am following https://arxiv.org/pdf/1509.07526.pdf
In page 10, just before section 6, it defines the random variables as standard.
Thanks again.
Jon
Jon 2019년 7월 9일
Hi I'm not familiar with the theoretical background for your problem, and have not used principle components analysis in this particular context, so I do not have an immediate answer regarding why they are not standard normal variables. I'm sorry, I do not have time to dig deeper, but I would guess that there is a scaling factor somewhere that is not consistent between the two implementations (MATLAB pca, and the reference that you are working from).
Jaime  de la Mota
Jaime de la Mota 2019년 7월 10일
Don't worry.
You have helped me enough.
Thanks again.

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

도움말 센터File Exchange에서 Dimensionality Reduction and Feature Extraction에 대해 자세히 알아보기

질문:

2019년 7월 9일

댓글:

2019년 7월 10일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by