Many small Eigenvalue Decompositions in parallel on GPU?

16 views (last 30 days)
ervinshiznit on 16 Aug 2015
Commented: kunx on 22 Jan 2022
I have some code that involves a couple billion 3x3 and 4x4 eigenvalue decompositions. I have run this code with parfor loops on the CPU, and the runtime is just barely bearable, but I'd really like to speed it up.
I have a GTX 780 available. I realize that a GPU is generally better suited to large matrix operations than to a large number of small matrix operations. I looked at pagefun, which looks like the best way MATLAB has to run many small matrix operations in parallel. However, the functions available for pagefun are all element-by-element operations, with a few exceptions such as mtimes, rdivide, and ldivide. Unfortunately, eig is not one of them.
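For reference, this is the kind of batched call pagefun does support (a small sketch, not part of my actual code):

```matlab
% pagefun applies a function across the pages of a 3-D gpuArray.
A = rand(3, 3, 1000, 'gpuArray');
B = rand(3, 3, 1000, 'gpuArray');
C = pagefun(@mtimes, A, B);   % 1000 independent 3x3 multiplies
% pagefun(@eig, A)            % errors: eig is not a supported function
```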
Is there any other way to run this code on the GPU?
2 comments
Matt J on 16 Aug 2015
Edited: Matt J on 16 Aug 2015
Are you sure you mean "several thousand"? My old machine from 2008 can do 10,000 such decompositions without breaking a sweat:
>> tic; for i=1:10000, eig(rand(4)); end; toc
Elapsed time is 0.196188 seconds.
ervinshiznit on 16 Aug 2015
Oops. I just said "several thousand" without actually looking at how many times I'm calling eig. Looking at it, it's actually 2,200,570,000 calls to eig.
I'll edit the original post.
Of course this code involves other calculations as well which contribute to the runtime, but the eig is the slowest portion.


Answers (3)

Brian Neiswander on 18 Aug 2015
The "pagefun" function does not currently support the function "eig". However, note that the "eig" function will accept GPU arrays generated with the "gpuArray" function:
X = rand(1e3,1e3);
G = gpuArray(X);
Y = eig(G);
Depending on your data, this can be faster than the non-GPU approach but it is not parallelized across the pages.
It is possible to implement your own CUDA kernel using the CUDAKernel object or a MEX function. This lets you apply custom functions using a distribution scheme of your choice. See the links below for more information:
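A minimal sketch of the CUDAKernel route — the kernel and file names here are hypothetical; you would write and compile the CUDA code yourself:

```matlab
% Hypothetical sketch only: 'batchEig3.cu'/'batchEig3.ptx' do not exist
% here -- you would write the kernel yourself and compile it with
% "nvcc -ptx batchEig3.cu". One GPU thread handles one 3x3 matrix.
N   = 1e6;
A   = single(rand(9, N));                      % N matrices, flattened
k   = parallel.gpu.CUDAKernel('batchEig3.ptx', 'batchEig3.cu');
k.ThreadBlockSize = 256;
k.GridSize        = ceil(N / 256);
lam = gpuArray.zeros(3, N, 'single');          % 3 eigenvalues per matrix
lam = feval(k, lam, gpuArray(A), N);
```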
2 comments
ervinshiznit on 19 Aug 2015
I already tried gpuArray. It's far too slow; the transfer times to and from the GPU kill me. It does provide a speedup for larger matrices, but not for 3x3 or 4x4.
CUDA kernels will not work for me because that would be a lot of development time that I do not have. Looks like I'm just stuck with the runtimes.
Birk Andreas on 16 Jul 2019
So, it's already 2019, and some MAGMA eigenvalue functions have now been implemented. However, there is still no eig for pagefun...
What is preventing progress?
Could you give an estimate of when it will be implemented?
It would really be very welcome!



Joss Knight on 21 Aug 2015
Edited: Joss Knight on 21 Aug 2015
Have you tried just concatenating your matrices in block-diagonal form and calling eig? You may then be limited by memory, but the eigenvalues and vectors of a block-diagonal system are just the union of the eigenvalues and vectors of the blocks:
N = 1000;
A = rand(3,3,N);
maskCell = mat2cell(ones(3,3,N),3,3,ones(N,1));
mask = logical(blkdiag(maskCell{:}));
Ablk = gpuArray.zeros(3*[N,N]);   % 3N-by-3N zero matrix on the GPU
Ablk(mask) = A(:);
[Vblk,Dblk] = eig(Ablk);          % Ablk is already a gpuArray
V = reshape(Vblk(mask), [3 3 N]);
D = reshape(Dblk(mask), [3 3 N]);
You should then find that A(:,:,i)*V(:,:,i) == V(:,:,i)*D(:,:,i), as required. Because of the way eigendecomposition works, I would expect the extra unnecessary zeros not to affect performance much; the system should converge straightforwardly and parallelize well.
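It is worth verifying that property numerically, since eig does not guarantee any particular eigenvalue ordering. A quick sanity check on one page, continuing from the snippet above (compare with a tolerance rather than exact equality, since the computation is in floating point):

```matlab
% Check the first block: A*V should equal V*D up to rounding error.
i  = 1;
Vi = gather(V(:,:,i));
Di = gather(D(:,:,i));
assert(norm(A(:,:,i)*Vi - Vi*Di) < 1e-10 * norm(A(:,:,i)))
```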
5 comments
Joss Knight on 24 Aug 2015
Also, I see that the GTX 780 has terrible double-precision performance: 166 GFLOPS, versus 3977 GFLOPS for single precision. Try running your code in single precision.
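For example (a sketch, assuming the matrices are already collected in one 3-D host array):

```matlab
A  = rand(3, 3, 1e6);              % host data, double by default
Ag = gpuArray(single(A));          % convert once, then one transfer
% ... do the batched work on Ag; gather() brings results back as single
```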
kunx on 22 Jan 2022
Thank you, your direction is very helpful.



James Tursa on 20 Aug 2015
If you just need the eigenvalues, you might look at this FEX submission by Bruno Luong:
Maybe you can extend it to 4x4 as well.
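The idea behind such explicit formulas is that a 3x3 characteristic cubic has a closed-form trigonometric solution. A sketch of that approach — not Bruno Luong's actual code, and assuming a real symmetric input:

```matlab
function lam = eig3x3sym(A)
% Eigenvalues of a real symmetric 3x3 matrix via the trigonometric
% solution of the characteristic cubic (no iteration, no LAPACK).
q = trace(A)/3;
B = A - q*eye(3);
p = sqrt(trace(B*B)/6);            % sum of squares of B's entries / 6
if p == 0
    lam = [q; q; q];               % A is already q*I
    return
end
r   = det(B/p)/2;
r   = max(min(r, 1), -1);          % clamp for numerical safety
phi = acos(r)/3;
lam = q + 2*p*cos(phi + 2*pi*[0;1;2]/3);
end
```

Because it is branch-light and loop-free, a formula like this vectorizes well across millions of matrices, which is exactly what iterative eig calls cannot do.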
4 comments
ervinshiznit on 21 Aug 2015
I know, but as I said in a comment on Brian's answer, the transfer times for 3x3 and 4x4 matrices to the GPU kill me. I was saying that maybe I should use an explicit formula on the CPU, not the GPU. But your answer of using a block-diagonal matrix might work out.
Joss Knight on 24 Aug 2015
Edited: Joss Knight on 24 Aug 2015
Why do you need to transfer the 3x3 and 4x4 matrices to the GPU independently? Just transfer them all as one 3-D array. You have to do that anyway to use pagefun.

