Using GPU on multiple nested loops

Jhinhwan Lee on 26 Aug 2019
Answered: Raunak Gupta on 30 Aug 2019
The following code is slow for large ncnt (typically >2000),
and I want to use my GPU for the outermost (iplane) loop.
Can you give me a hint? (I have an NVIDIA RTX 8000.)
nxs=101;
nys=101;
nzs=101;
ncnt=100;
xnmin=-1.0;
xnmax= 1.0;
ynmin=-1.0;
ynmax= 1.0;
znmin=-1.0;
znmax= 1.0;
coefftmp=complex(rand(1,ncnt));
igalltmp=rand(3,ncnt);
vktmp=rand(1,3);
wfrtmp=complex(zeros(nxs,nys,nzs));
tic
for iplane=1:ncnt % GPU loop
    ee=exp(2.*pi*complex(0.,1.));
    vkg=vktmp+double(igalltmp(:,iplane)');
    ekx=ee^vkg(1);
    eky=ee^vkg(2);
    ekz=ee^vkg(3);
    coefft=coefftmp(iplane);
    for iz=1:nzs
        z=znmin+(znmax-znmin)*double(iz-1)/double(nzs-1);
        ekzz=ekz^z;
        for iy=1:nys
            y=ynmin+(ynmax-ynmin)*double(iy-1)/double(nys-1);
            ekyy=eky^y;
            for ix=1:nxs
                x=xnmin+(xnmax-xnmin)*double(ix-1)/double(nxs-1);
                ekxx=ekx^x;
                wfrtmp(ix,iy,iz)=wfrtmp(ix,iy,iz)+coefft*ekxx*ekyy*ekzz;
            end
        end
    end
end
wfr(ispin,:,:,:)=wfrtmp/sqrt(Vcell); % ispin and Vcell are not defined in this snippet
toc
Walter Roberson on 26 Aug 2019
If you want to use GPU, you are going to have to rewrite your code to be vectorized.
I suggest you consider
zvec = linspace(znmin, znmax, nzs);
yvec = linspace(ynmin, ynmax, nys);
xvec = linspace(xnmin, xnmax, nxs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);
before any looping. After that you can do things like
coeff .* ekz.^Z .* eky.^Y .* ekx.^X
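A minimal sketch of how the suggestion above might replace the three inner loops for a single plane wave. The values of ekx, eky, ekz and coefft are made-up placeholders (in the question they are derived from vktmp, igalltmp and coefftmp), and the grid is kept small so that the comparison against the loop version stays quick.

```matlab
% Small grid for illustration (the question uses 101 points per axis).
nxs = 5; nys = 4; nzs = 3;
xnmin = -1.0; xnmax = 1.0;
ynmin = -1.0; ynmax = 1.0;
znmin = -1.0; znmax = 1.0;

% Placeholder per-plane values (assumed, not taken from the original data).
ekx = exp(2i*pi*0.15);
eky = exp(2i*pi*0.25);
ekz = exp(2i*pi*0.35);
coefft = 0.7;

% Build the coordinate grids once, before any loop over planes.
xvec = linspace(xnmin, xnmax, nxs);
yvec = linspace(ynmin, ynmax, nys);
zvec = linspace(znmin, znmax, nzs);
[X, Y, Z] = ndgrid(xvec, yvec, zvec);   % each grid is nxs-by-nys-by-nzs

% Vectorized contribution of one plane wave.
wfvec = coefft .* ekz.^Z .* eky.^Y .* ekx.^X;

% Reference: the triple loop from the question for the same plane.
wfloop = complex(zeros(nxs, nys, nzs));
for iz = 1:nzs
    for iy = 1:nys
        for ix = 1:nxs
            wfloop(ix,iy,iz) = coefft * ekx^xvec(ix) * eky^yvec(iy) * ekz^zvec(iz);
        end
    end
end

% Largest difference between the two versions (rounding error only).
maxdiff = max(abs(wfvec(:) - wfloop(:)));
```

Because ndgrid (rather than meshgrid) is used, X varies along the first dimension, which matches the wfrtmp(ix,iy,iz) indexing in the question.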
Jhinhwan Lee on 26 Aug 2019
Thanks! I did something basically the same and it is more than ten times faster now.
I also found that putting X, Y and Z inside the argument of the exp function works slightly better. For integer kx = 1, 2, 3, ..., ekx=exp(2.*pi*complex(0.,1.)*kx)=1, so ekx^x==1 no matter what x is, whereas exp(2.*pi*complex(0.,1.)*kx*x) depends on x (unless kx=0), as expected.
coeff*exp(2.*pi*complex(0.,1.)*(kx*X+ky*Y+kz*Z))
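The expression above could be used inside the loop over planes roughly as follows. This is only a sketch: the inputs are random placeholders with the same shapes as in the question, the grid is deliberately small, and the last few lines illustrate the integer-kx observation with made-up values kx = 2 and x = 0.3.

```matlab
nxs = 5; nys = 4; nzs = 3;
ncnt = 6;

% Placeholder inputs with the same shapes as in the question.
coefftmp = complex(rand(1, ncnt));
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

[X, Y, Z] = ndgrid(linspace(-1, 1, nxs), ...
                   linspace(-1, 1, nys), ...
                   linspace(-1, 1, nzs));

% Accumulate all plane waves; only the loop over planes remains.
wfrtmp = complex(zeros(nxs, nys, nzs));
for iplane = 1:ncnt
    vkg   = vktmp + igalltmp(:, iplane)';            % 1-by-3 vector (kx, ky, kz)
    phase = 2*pi*(vkg(1)*X + vkg(2)*Y + vkg(3)*Z);   % phase on the whole grid
    wfrtmp = wfrtmp + coefftmp(iplane) * exp(1i*phase);
end

% Why the phase form matters for integer kx (illustrative values):
kx = 2; x = 0.3;
viaPower = exp(2i*pi*kx)^x;   % base is numerically 1, so x has almost no effect
viaPhase = exp(2i*pi*kx*x);   % depends on x as intended
```

The two forms differ because raising a complex number to a real power uses the principal value of its logarithm, so any whole number of turns in the phase is lost before x is applied.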


Answers (1)

Raunak Gupta on 30 Aug 2019
Hi,
To speed up the code, first vectorize the three loops inside the main loop, since their iterations are independent of each other. As mentioned in the comments, you can use linspace and ndgrid to do the exponentiation for all three variables at once.
That step only vectorizes the code. To actually use the GPU, create the initial arrays with gpuArray, which may also speed up the code significantly. The functions you have used in the code are supported for gpuArray; if you want to use any other function, check the list of functions that support gpuArray in the documentation.
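A minimal sketch of how the vectorized loop might be moved to the GPU, assuming Parallel Computing Toolbox and a supported device are present; it falls back to the CPU otherwise. The input values and the small grid size are placeholders, and the phase-based exp form from the comments is used for the per-plane term.

```matlab
% Use the GPU only if the toolbox and a device are available.
useGPU = false;
try
    useGPU = gpuDeviceCount > 0;
catch
    % gpuDeviceCount is unavailable without the toolbox; stay on the CPU.
end

nxs = 11; nys = 11; nzs = 11; ncnt = 20;

% Placeholder inputs with the same shapes as in the question.
coefftmp = complex(rand(1, ncnt));
igalltmp = rand(3, ncnt);
vktmp    = rand(1, 3);

xvec = linspace(-1, 1, nxs);
yvec = linspace(-1, 1, nys);
zvec = linspace(-1, 1, nzs);
if useGPU
    % Creating the axis vectors on the GPU keeps everything built from them there.
    xvec = gpuArray(xvec);
    yvec = gpuArray(yvec);
    zvec = gpuArray(zvec);
end
[X, Y, Z] = ndgrid(xvec, yvec, zvec);

% Accumulator of the same type as X (gpuArray or ordinary array).
wfrtmp = zeros(size(X), 'like', X);
for iplane = 1:ncnt
    vkg = vktmp + igalltmp(:, iplane)';
    wfrtmp = wfrtmp + coefftmp(iplane) * exp(2i*pi*(vkg(1)*X + vkg(2)*Y + vkg(3)*Z));
end

% Bring the result back to host memory if it was computed on the GPU.
if useGPU
    wfr = gather(wfrtmp);
else
    wfr = wfrtmp;
end
```

Only the small per-plane scalars cross between host and device inside the loop; the large grids stay on the GPU until the single gather at the end. When timing such code with tic and toc, it may be necessary to call wait(gpuDevice) before toc, since GPU operations can run asynchronously.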
