Hi all, I need to diagonalize a lot of matrices. The problem is similar to:
A = rand(5000, 5000, 500); % This snippet is just a demo; in the real program, A is built by actual logic.
EVs = zeros(5000, 500);
for idx = 1:500
EVs(:, idx) = eig(A(:,:,idx));
end
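A minimal sketch of the same loop written with parfor, using small demo sizes (n = 200, p = 20, chosen here only so it runs quickly). Indexing A(:, :, idx) and EVs(:, idx) with the loop variable makes both sliced variables, so parfor can hand individual pages to workers. If no parallel pool is available (for example, without Parallel Computing Toolbox), the loop simply runs serially on the client. Preallocating EVs as complex is an addition not in the original: eigenvalues of a nonsymmetric matrix such as rand(n) are in general complex.

```matlab
% Sketch: parfor version of the CPU loop, with small demo sizes.
n = 200;                        % matrix size (demo value, smaller than 5000)
p = 20;                         % number of matrices (demo value, smaller than 500)
A = rand(n, n, p);

% Eigenvalues of nonsymmetric matrices are generally complex.
EVs = complex(zeros(n, p));

parfor idx = 1:p
    % A is a sliced input and EVs a sliced output, so each worker
    % only receives the pages it needs.
    EVs(:, idx) = eig(A(:, :, idx));
end
```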
The plain for loop works fine on CPUs and scales easily with parfor and MDCS (MATLAB Distributed Computing Server). Since eig is faster on GPUs, I tried this:
A = rand(5000, 5000, 500); % This snippet is just a demo; in the real program, A is built by actual logic.
EVs = zeros(5000, 500, 'gpuArray');
for idx = 1:500
B = gpuArray(A(:, :, idx));
EVs(:, idx) = eig(B);
end
EVs = gather(EVs);
This does not give much better performance. Is there a way to avoid the gpuArray call in every iteration? Some kind of pagefun with eig would be the solution, I guess (unfortunately, pagefun does not support eig).
Best wishes Niklas
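Later MATLAB releases added a page-wise eigenvalue function, pageeig (introduced around R2023a as far as I know, so it is not available in the R2018a release used in this thread). The sketch below uses pageeig when it exists and otherwise falls back to the page loop. The symmetric demo stack and its sizes are assumptions for illustration, and whether pageeig accepts gpuArray input in a given release should be checked in that release's documentation.

```matlab
% Sketch: page-wise eigenvalues of an n-by-n-by-p array.
% ASSUMPTION: pageeig only exists in newer releases (around R2023a);
% older releases take the loop branch.
n = 50;                               % demo matrix size
p = 8;                                % demo number of pages
X = rand(n, n, p);
A = X + permute(X, [2 1 3]);          % symmetric pages, so eigenvalues are real

if ~isempty(which('pageeig'))
    D = pageeig(A);                   % one column of eigenvalues per page (n-by-1-by-p)
    EVs = reshape(D, n, p);
else
    EVs = zeros(n, p);
    for idx = 1:p
        EVs(:, idx) = eig(A(:, :, idx));
    end
end
```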

1 Comment

Matt J on 16 Jul 2019
Birk Andreas's comment moved here:
Please, MathWorks, implement eig for use with pagefun as soon as possible!!!


Accepted Answer

Matt J on 16 Oct 2018
Edited: Matt J on 16 Oct 2018


You need to build A directly on the GPU, for example:
EVs = zeros(5000, 500, 'gpuArray');
A=gpuArray.rand(5000,5000,500);
for idx = 1:500
B = A(:, :, idx);
EVs(:, idx) = eig(B);
end
EVs = gather(EVs);
For your real A, you will have to look at which operations you currently use to build A on the host, and whether each of them is also available for gpuArray data.
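For the demo size in the question, the full stack is 5000 x 5000 x 500 doubles, i.e. 5000 · 5000 · 500 · 8 bytes = 10^11 bytes (100 GB), far more than the 12 GB card mentioned later in this thread. So at that size A probably has to be created on the device in chunks. A sketch of that idea, assuming the pages can be generated independently: chunkSize is a hypothetical tuning parameter, rand stands in for the real construction of A, and the try/catch falls back to the CPU when no GPU (or no Parallel Computing Toolbox) is usable.

```matlab
% Sketch: build and diagonalize the stack chunk by chunk, so only one
% chunk has to fit in GPU memory. Sizes are small demo values.
n = 100;          % matrix size
p = 12;           % total number of matrices
chunkSize = 5;    % hypothetical tuning parameter: pages per chunk

% Double precision needs 8 bytes per element.
fprintf('Full stack: %.1f MB, one chunk: %.1f MB\n', ...
        n * n * p * 8 / 1e6, n * n * chunkSize * 8 / 1e6);

try
    useGPU = gpuDeviceCount > 0;   % errors if Parallel Computing Toolbox is absent
catch
    useGPU = false;
end

EVs = complex(zeros(n, p));
for first = 1:chunkSize:p
    last = min(first + chunkSize - 1, p);
    m = last - first + 1;
    if useGPU
        C = rand(n, n, m, 'gpuArray');   % created directly on the device
    else
        C = rand(n, n, m);               % CPU fallback so the sketch still runs
    end
    for k = 1:m
        ev = eig(C(:, :, k));
        if useGPU
            ev = gather(ev);             % copy back only the n eigenvalues
        end
        EVs(:, first + k - 1) = ev;
    end
end
```

Choosing chunkSize from the free device memory (for example, a fraction of what gpuDevice reports) is left as an assumption here.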

3 Comments

Niklas on 16 Oct 2018
Edited: Niklas on 16 Oct 2018
Hi Matt,
thanks, I guess that does the trick by avoiding unnecessary communication with the GPU. Unfortunately, the speedup is not as big as I expected:
Elapsed time is 118.457193 seconds. <- CPU time without parfor
Elapsed time is 107.845443 seconds. <- GPU on GeForce 1080Ti with 12GB
Code used:
mSize = 1000;
runs = 200;
A = rand(mSize, mSize, runs);
tic
EVC = zeros(mSize, runs);
for idx = 1:runs
EVC(:, idx) = eig(A(:,:,idx));
end
toc
tic
EVG = zeros(mSize, runs, 'gpuArray');
A = gpuArray(A);
for idx = 1:runs
G = A(:,:,idx);
EVG(:, idx) = eig(G);
end
EVG = gather(EVG);
toc
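When timing GPU code with tic/toc, note that gpuArray operations can run asynchronously, so MathWorks recommends calling wait(gpuDevice) before reading the timer (gputimeit handles this automatically). In the loop above, the final gather already forces synchronization, so the totals are meaningful. For a per-matrix comparison, a sketch like the following, with warm-up calls to exclude one-time initialization, may be more informative. The size n = 300 is an arbitrary demo value, and the GPU part is skipped when no GPU is usable.

```matlab
% Sketch: time a single eig call on CPU and (if available) GPU.
n = 300;                          % arbitrary demo size
M = rand(n);

eig(M);                           % warm-up, excluded from timing
t0 = tic;
eCPU = eig(M);
tCPU = toc(t0);

tGPU = NaN;                       % stays NaN when no GPU is usable
try
    dev = gpuDevice;              % errors without a supported GPU or toolbox
    G = gpuArray(M);
    eig(G);                       % warm-up: first GPU calls include setup overhead
    wait(dev);
    t0 = tic;
    eGPU = eig(G);
    wait(dev);                    % ensure the GPU has finished before reading the timer
    tGPU = toc(t0);
catch
end

fprintf('One %d-by-%d eig: CPU %.4f s, GPU %.4f s\n', n, n, tCPU, tGPU);
```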
Niklas on 16 Oct 2018
Edited: Niklas on 16 Oct 2018
With bigger matrices, the speedup is higher:
Elapsed time is 1006.431223 seconds. <- CPU
Elapsed time is 295.168202 seconds. <- GPU
Unfortunately, nvidia-smi shows low GPU utilization. Maybe I will write a CUDA snippet to deal with it.
Matt J on 16 Oct 2018
Yeah, I can't see that there would be a lot of parallelism in eigenvalue computation.



Release: R2018a

Asked: 16 Oct 2018

Last comment: 16 Jul 2019
