Parallel rank calculation for sparse matrices -- suggestions?

조회 수: 3 (최근 30일)
Barry
Barry 2022년 8월 30일
댓글: Bruno Luong 2022년 9월 6일
I need to calculate the rank of large (> 1 terabyte of non-zero elements) sparse matrices with MATLAB. Exploring the Parallel Toolbox, but can't seem to find anything that convinces me what is offers will be helpful. If I'm wrong, can someone point me in the right direction? Ideally, I'd just want to take my existing code that has "r=sprank(A)" in it and have that library call run in parallel, perhaps with some additional annotation as needed. Would using a GPUARRAY help here, for example? Doesn't seem like it, but perhaps I'm wrong. Hoping someone here can help me out. Thanks!
Barry Fagin
Professor of Computer Science
barry.fagin@afacademy.af.edu
  댓글 수: 9
Matt J
Matt J 2022년 9월 5일
A = sparse(tall,skinny)
How tall? How skinny? What currently rules out the approach already discussed below of using rank(full(A.'*A))?
Bruno Luong
Bruno Luong 2022년 9월 6일
"After doing a little digging, I now believe sprank() merely counts the number of non-zero columns. This is not what I require."
It does something more intelligent than that it match the row to column through Dulmange Mendelsohn permutation. In your case might be all the non-zero column can be matched. But that is exactly what structural rank means.

댓글을 달려면 로그인하십시오.

채택된 답변

Bruno Luong
Bruno Luong 2022년 8월 30일
이동: Bruno Luong 2022년 8월 30일
Something I don't get : sprank doc tells when you run witth threadpool it runs in parallel.
  댓글 수: 5
Bruno Luong
Bruno Luong 2022년 8월 31일
IMO the doc should clarify this.
But then I still have some doubt. If user runs multiple sprank in parallel, it will take less worker to work on one sprank. So at the end do we really speed up the whole thing?
Edric Ellis
Edric Ellis 2022년 9월 1일
Yes, the "extended capability" description could probably do with some refinement to make it clear exactly what works. (Today, there are a good number of MATLAB functions that cannot be run on a thread pool worker, and this "extended capability" really means simply that the function can run on a thread pool worker).
The performance situation is ... complicated, unfortunately. There is no single simple answer as to whether running N copies of a given function concurrently on workers is faster than running N copies sequentially on the client. It depends on all sorts of details of the implementation of that function - but primarily whether the function is already intrinsically multithreaded by MATLAB itself.

댓글을 달려면 로그인하십시오.

추가 답변 (2개)

Matt J
Matt J 2022년 8월 30일
Is the matrix square or is it tall and thin? If the later, then it may be an easier computation to compute rank(A.'*A).
  댓글 수: 9
Barry
Barry 2022년 9월 5일
편집: Barry 2022년 9월 5일
I like the A.'*A trick. Or one could ...wait for it ...
... use MATLAB's rref() command, which works on sparse matrices, and then count the nonzero rows. Sigh.
But that in turn begs the question of roundoff errors. Time to move to a new question, which I'll post shortly.
--BF
Matt J
Matt J 2022년 9월 5일
편집: Matt J 2022년 9월 5일
I would take caution before relying on rref. It has been often found to lack robustness, see e.g.,

댓글을 달려면 로그인하십시오.


Bruno Luong
Bruno Luong 2022년 9월 5일
For thin and tall sparse matrix A of size (m x n), m>>n and n in the order of 1000s, it might be possible to compute the rank using q-less qr, which is better than rank(A'*A) which has the drawback of square the condition number.
A=sprand(100000,10,0.1)*sprand(10,100,0.1);
R=qr(A,0);
rank(full(R))
ans = 10

카테고리

Help CenterFile Exchange에서 Linear Algebra에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by