GPU backslash performance much slower than CPU

Meme Young

27 Dec 2020


test Gpu backslash (2).zip

I am doing a numerical power flow calculation by modifying functions from matpower, an open-source toolbox. Modifying its function newtonpf.m makes it possible to run the computation on the GPU. However, I found that the GPU is much slower than the CPU. When solving the built-in case3012wp of matpower, the linear system in newtonpf.m has these dimensions:

A: 5725 * 5725 sparse double, b: 5725 * 1 double.

On the CPU, A \ b in the first iteration of newtonpf() typically takes around 0.01 s on my MSI GL65 laptop (i7-10750H + RTX 2070 Super).

But if A and b are converted to GPU arrays, A \ b takes the following time, depending on the type of A:

full double, 0.8 sec

sparse double, 4 sec

full single, 0.1 sec

(sparse single is not supported)

So why is there such a difference in performance? I thought a GPU could do things much faster than a CPU.

Files are attached as follows. Atest is sparse and Agpu is a sparse gpu array. All are doubles.
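A minimal sketch of how the timings above could be measured, for anyone who wants to reproduce the comparison. The matrix is a synthetic, diagonally dominant stand-in of the same size, not the attached matpower system, so the absolute numbers will differ. It assumes R2019b or later (for canUseGPU) and skips the GPU part when no supported GPU is available.

```matlab
% Synthetic stand-in for the attached system (assumption: same size and
% sparse, but NOT the real matpower Jacobian).
rng(0);
n = 5725;
A = sprandn(n, n, 1e-3);
A = A + spdiags(full(sum(abs(A), 2)) + 1, 0, n, n);  % strictly diagonally dominant
b = randn(n, 1);

% CPU: sparse direct solve
tCpu = timeit(@() A \ b);
fprintf('CPU sparse double: %.4f s\n', tCpu);

if canUseGPU
    % Move the data to the GPU once, outside the timed region
    Agpu        = gpuArray(A);          % sparse double gpuArray
    bgpu        = gpuArray(b);
    AgpuFull    = gpuArray(full(A));    % full double gpuArray
    AgpuSingle  = single(AgpuFull);     % full single gpuArray
    bgpuSingle  = single(bgpu);

    % gputimeit waits for the GPU to finish before stopping the clock
    tSparse = gputimeit(@() Agpu \ bgpu);
    tFull   = gputimeit(@() AgpuFull \ bgpu);
    tSingle = gputimeit(@() AgpuSingle \ bgpuSingle);

    fprintf('GPU sparse double: %.4f s\n', tSparse);
    fprintf('GPU full double:   %.4f s\n', tFull);
    fprintf('GPU full single:   %.4f s\n', tSingle);
end
```

Using timeit and gputimeit rather than tic/toc avoids counting one-off start-up costs and makes sure asynchronous GPU work is included in the measurement.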

Comments: 9 (7 earlier comments not shown)

kant on 26 May 2022

I also have this problem with my MATLAB code. Has it been solved?

Matt J on 26 May 2022

Edited: Matt J on 26 May 2022

@kant It has been concluded that this is expected behavior, but see below.


Answers (1)

Matt J on 27 Dec 2020

This thread looks relevant. It appears that sparse mldivide on the GPU is not expected to be faster.

https://www.mathworks.com/matlabcentral/answers/500526-solution-of-large-sparse-matrix-systems-using-gpu-mldivide

Comments: 13 (11 earlier comments not shown)

Meme Young on 30 Dec 2020

What do you mean by a sparse solver algorithm, Mr Knight? Something like pcg()? I have tried it, and it is not as efficient as this approach: reordering with amd(), LU decomposition, and two backslash solves based on the factors, especially when dealing with the type of sparse matrix that I uploaded.
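The direct approach described in this comment (amd() reordering, LU decomposition, then two backslash solves) might look roughly like the CPU sketch below. The matrix is a synthetic stand-in rather than the uploaded system, and the variable names are illustrative only.

```matlab
% Synthetic stand-in for the uploaded system (assumption: same size and
% sparse, but NOT the real matpower Jacobian).
rng(0);
n = 5725;
A = sprandn(n, n, 1e-3);
A = A + spdiags(full(sum(abs(A), 2)) + 1, 0, n, n);  % strictly diagonally dominant
b = randn(n, 1);

% 1. Fill-reducing reordering (approximate minimum degree)
p = amd(A);
B = A(p, p);

% 2. Sparse LU with row pivoting: L*U = P*B
[L, U, P] = lu(B);

% 3. Two triangular solves: B*z = b(p)  =>  z = U \ (L \ (P*b(p)))
y = L \ (P * b(p));
z = U \ y;

% 4. Undo the column permutation: x(p) = z
x = zeros(n, 1);
x(p) = z;

fprintf('relative residual: %.2e\n', norm(A*x - b) / norm(b));
```

If the same matrix structure is solved repeatedly, as in successive Newton iterations, the permutation p only needs to be computed once as long as the sparsity pattern does not change.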

Joss Knight on 10 Jan 2021

Edited: Joss Knight on 10 Jan 2021

Yes, PCG, GMRES, CGS, LSQR, QMR, TFQMR, BICG, BICGSTAB. Try them all, play with tolerance, iterations and preconditioning - something is likely to work. I'm not an expert in this field but this is what the sparse community tends to do.
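As one illustration of the suggestion above, here is a minimal sketch that tries bicgstab with an incomplete-LU preconditioner on the CPU and, if a GPU is available, the same solver on gpuArray inputs without a preconditioner. The test matrix, tolerance, and iteration limit are assumptions chosen for the example, not values from this thread, and a real power flow Jacobian may need a different solver or preconditioner to converge.

```matlab
% Synthetic stand-in system (assumption: NOT the real matpower Jacobian).
rng(0);
n = 5725;
A = sprandn(n, n, 1e-3);
A = A + spdiags(full(sum(abs(A), 2)) + 1, 0, n, n);  % strictly diagonally dominant
b = randn(n, 1);

tol   = 1e-8;   % relative residual tolerance (illustrative)
maxit = 200;    % iteration limit (illustrative)

% Incomplete LU with zero fill-in as the preconditioner
[L, U] = ilu(A);

% CPU: preconditioned BiCGSTAB; flag == 0 means it converged to tol
[x, flag, relres, iter] = bicgstab(A, b, tol, maxit, L, U);
fprintf('CPU bicgstab + ILU: flag = %d, relres = %.2e, iter = %g\n', ...
        flag, relres, iter);

if canUseGPU
    % GPU: same solver on gpuArray data, here without a preconditioner
    [xg, flagG, relresG] = bicgstab(gpuArray(A), gpuArray(b), tol, maxit);
    fprintf('GPU bicgstab: flag = %d, relres = %.2e\n', ...
            gather(flagG), gather(relresG));
end
```

Swapping bicgstab for gmres or another solver from the list only changes the call itself; checking the returned flag and relative residual each time shows whether a given combination actually converged.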
