How can I increase the calculation speed of eig()?

Hi, here is my problem. I want to do POD/PCA in MATLAB, but I don't want to use the built-in function princomp() because it has some limitations, so I am implementing it myself. In the process I need the eigenvectors and eigenvalues of a covariance matrix, which is symmetric. Since this covariance matrix is huge (78505×78505), computing them with eig() takes a very long time: 24 hours have now passed and it still hasn't finished. The CPU and memory usage can be seen in the figure. Is there another way to accelerate this calculation, or a different method that gets the eigenvectors and eigenvalues quickly? Thank you very much!

4 Comments

What other properties does the matrix have? For example is it positive definite? Is it sparse? Is it tridiagonal?
This covariance matrix is symmetric and positive definite. It's not sparse or tridiagonal.
Matt J on 31 Dec 2015
Edited: Matt J on 31 Dec 2015
It's not sparse or tridiagonal.
It strains credulity that you can't approximate some or many entries of the covariance matrix as zero. Where do 78505 variables, all so tightly correlated with each other, come from?
In any case, it seems unwise to attack a problem in such a brute force manner. Even though you clearly have a powerful computer with above average RAM, you are consuming much of that memory just holding the covariance matrix, with too little memory left over to do any processing.
The only likely path to useful advice from us is explaining more how this specific covariance matrix was derived. Then we can all start thinking about how the particulars of that process can be exploited.
Considering the last comments, I should introduce the problem more clearly. I ran a numerical simulation of a fluid field and obtained the pressure field at every time step: pressure values at 78505 positions, i.e. 78505 pressure values per snapshot, for 120 snapshots in total. I want to reduce the dimension from 120 to 10. I form the covariance matrix (78505×78505) by multiplying the transpose of the 120×78505 data matrix with the matrix itself. That is where the covariance matrix comes from.


Accepted Answer

Matt J on 2 Jan 2016
Edited: Matt J on 2 Jan 2016
Then I use the matrix (120*78505) to get the covariance matrix (78505*78505) by multiplying the matrix (120*78505) with its transposition. So this is how the covariance matrix comes from.
If the 120 x 78505 matrix is A and the covariance matrix is C = A.'*A, then C has only 120 potentially non-zero eigenvalues. You can obtain the non-trivial eigenvalues and the corresponding eigenvectors from the economy-size SVD of A.' as follows:
[U,S]=svd(A.',0);
The eigenvalues are given by
eigenvalues= diag(S).^2;
and the corresponding eigenvectors are the columns of U. Naturally, this is a much simpler and faster computation than the brute-force approach you were attempting.
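As a small-scale sanity check of this claim (sizes shrunk so that eig() on the covariance matrix is cheap; the 12×50 dimensions are purely illustrative stand-ins for 120×78505):

```matlab
% Compare eig() on C = A.'*A with the economy SVD of A.'
A = rand(12, 50);            % stand-in for the 120 x 78505 snapshot matrix
C = A.' * A;                 % 50 x 50 covariance matrix

[U, S] = svd(A.', 0);        % economy SVD: U is 50 x 12, S is 12 x 12
evalsSvd = diag(S).^2;       % the 12 potentially non-zero eigenvalues of C

evalsEig = sort(eig(C), 'descend');
% The leading 12 eigenvalues from eig() should match diag(S).^2 to
% roundoff; the remaining 38 should be numerically zero.
max(abs(evalsSvd - evalsEig(1:12)))
```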

6 Comments

Thank you for your help. I've learned about svd, and it is very helpful to me: svd is another effective way to perform PCA, and it costs much less time.
But I later found that during svd the CPU usage is only 3% and only a single thread is used. The same task can also be done with eigs(A,120), which uses 50% of the CPU and might be much faster.
Matt J on 4 Jan 2016
Edited: Matt J on 4 Jan 2016
In my post, A was a 120x78505 non-square matrix. So, eigs would not apply to that A. The svd method that I showed you runs in 2 sec. on my machine, which I think is a lot less powerful than yours.
>> A=rand(120,78505);
>> tic; [U,S]=svd(A.',0); toc
Elapsed time is 2.333444 seconds.
Do you really need more speed?
If you meant that A was your covariance matrix (I was calling that C), then I doubt eigs could offer any real benefit, firstly because C is not sparse according to you and secondly because calculating C in the first place requires a lot of time and RAM. Note that my method avoids explicit construction of the covariance matrix entirely.
There was a misunderstanding here. I need to perform the row reduction, not the column reduction, so it should be [u,s,v]=svd(A), not [u,s]=svd(A.'). I then processed another matrix A (120×96455), which took nearly 30 minutes with svd. Still, that is much faster than eig().
Try using
[u,s,v] = svd(A, 'econ')
This will only compute the first 120 singular vectors in u and v. Otherwise, v is of size 96455x96455 instead of the 96455x120 you actually need.
Matt J on 5 Jan 2016
Edited: Matt J on 5 Jan 2016
I need to conduct the row reduction not the column reduction. So it should be [u,s,v]=svd(A) not [u,s]=svd(A.').
There's no salient difference. If V,S,U is the decomposition of A, then the decomposition of A.' is U,S,V. In both cases, the eigenvectors of the covariance matrix are the columns of U (and V is not needed).
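This symmetry is easy to check numerically on a small example (illustrative sizes, not the original ones):

```matlab
A = rand(12, 50);
[U1, S1, V1] = svd(A.', 'econ');  % U1 (50x12): eigenvectors of A.'*A
[U2, S2, V2] = svd(A,  'econ');   % V2 (50x12): the same eigenvectors
% Singular values agree, and corresponding columns of U1 and V2 match
% up to sign (when the singular values are distinct).
norm(diag(S1) - diag(S2))
norm(abs(U1.' * V2) - eye(12))
```

Both norms should come out near machine precision for a generic random A.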


Additional Answers (3)

John D'Errico on 31 Dec 2015
help eigs

1 Comment

eigs() only returns a few leading eigenvectors and eigenvalues (6 by default), but I want all of the eigenvectors and eigenvalues, so eig() has to be used.
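For completeness: the default of 6 can be overridden via the second argument of eigs. For a dense 78505×78505 matrix this still requires building and storing the covariance matrix, so it does not remove the real bottleneck here. A sketch on a small symmetric matrix (sizes are illustrative):

```matlab
% Request the k largest-magnitude eigenpairs instead of the default 6.
n = 500;  k = 20;              % illustrative sizes
B = rand(n);
C = B.' * B;                   % small symmetric positive semidefinite matrix
[V, D] = eigs(C, k);           % k eigenvectors in V, eigenvalues on diag(D)
eigenvalues = diag(D);
```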


Walter Roberson
Walter Roberson on 31 Dec 2015


I do not know the efficiency order, but in this particular case of a symmetric positive definite matrix you can use the Cholesky decomposition (or possibly QR).
Ah, apparently in theory the Cholesky decomposition can be done in about (2/3)·N^3 operations: http://modis-atmos.gsfc.nasa.gov/new_reference/data/papers/Stamnes_et_al_A.1988.pdf
For your matrix that is still on the order of 3×10^14 operations, and that is Just Plain Going To Take Time.

1 Comment

Thank you for your answer. It's a huge task for me to understand, since my major is not mathematics, but I'll try my best to get to know it.


Christine Tobler
Christine Tobler on 1 Jan 2016


If you need all the eigenvalues and all the eigenvectors, there's no faster way than eig, I'm afraid. Here's an older thread about using the Distributed Computing Toolbox for a large dense eigenvalue problem, if it is available to you:
By the way, do you need all eigenvectors, or would all eigenvalues, with the option to compute selected eigenvectors later, be sufficient?

2 Comments

Thank you for your help. In fact, I only need all the eigenvalues together with the few leading eigenvectors corresponding to the largest eigenvalues. That is to say, I need all the eigenvalues but not all the eigenvectors. Considering this, is there any method to accelerate the calculation?
Use the Parallel Computing Toolbox. This requires a processor with multiple cores.
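For this particular covariance matrix (C = A.'*A with A of size 120×78505), all of the non-zero eigenvalues plus just the leading eigenvectors fall out of a single economy-size SVD, so no large eigendecomposition is needed at all (a sketch, using a random stand-in for the snapshot matrix):

```matlab
A = rand(120, 78505);           % stand-in for the snapshot matrix
[U, S, ~] = svd(A.', 'econ');   % U is 78505 x 120
eigenvalues = diag(S).^2;       % all 120 non-zero eigenvalues of A.'*A
U10 = U(:, 1:10);               % eigenvectors of the 10 largest eigenvalues
```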

