Indexing after knnsearch with GPU is slow

xu fan on 3 Jan 2022
Commented: Joss Knight on 4 Jan 2022
I am trying to speed up my model using a GPU. Part of the code, running on the CPU, looks like this:
m = 500;
n = 500;
dx = 2;
x = 1 : n; %col
y = 1 : m; %row
[xx yy] = meshgrid(x,y);
%
A = ones(m,n);
A(1,:) = 0;
%
indwatershed = find(A==1); % watershed pixels
indchannel = find(A==0); % channel pixels
%
xchannel = xx(indchannel);
ychannel = yy(indchannel);
PC = [xchannel ychannel];
%
xwatershed = xx(indwatershed);
ywatershed = yy(indwatershed);
PW = [xwatershed ywatershed];
%
tic
[loc, mdxy] = knnsearch(PC,PW); % find the nearest channel pixel to each watershed pixel
toc
%
tic
Y1 = ychannel(loc); % indexing
toc
The time cost for knnsearch and indexing is:
Elapsed time is 0.098330 seconds.
Elapsed time is 0.001529 seconds.
Then I moved this code to the GPU as follows:
m = 500;
n = 500;
dx = 2;
x = 1 : n; %col
y = 1 : m; %row
% all the parameters are transferred to the GPU
m = gpuArray(m);
n = gpuArray(n);
dx = gpuArray(dx);
x = gpuArray(x);
y = gpuArray(y);
%
[xx yy] = meshgrid(x,y);
%
A = ones(m,n,'gpuArray');
A(1,:) = 0;
%
indwatershed = find(A==1); % watershed pixels
indchannel = find(A==0); % channel pixels
%
xchannel = xx(indchannel);
ychannel = yy(indchannel);
PC = [xchannel ychannel];
%
xwatershed = xx(indwatershed);
ywatershed = yy(indwatershed);
PW = [xwatershed ywatershed];
%
tic
[loc, mdxy] = knnsearch(PC,PW); % find the nearest channel pixel to each watershed pixel
toc
%
tic
Y1 = ychannel(loc); % indexing
toc
And the time cost for knnsearch and indexing is:
Elapsed time is 0.005452 seconds.
Elapsed time is 0.145393 seconds.
This means that knnsearch is much faster on the GPU than on the CPU, but the subsequent indexing is much slower.
Then I added a wait() call between knnsearch and the indexing:
dev = gpuDevice; % new lines
m = 500;
n = 500;
dx = 2;
x = 1 : n; %col
y = 1 : m; %row
%
m = gpuArray(m);
n = gpuArray(n);
dx = gpuArray(dx);
x = gpuArray(x);
y = gpuArray(y);
%
[xx yy] = meshgrid(x,y);
%
A = ones(m,n,'gpuArray');
A(1,:) = 0;
%
indwatershed = find(A==1); % watershed pixels
indchannel = find(A==0); % channel pixels
%
xchannel = xx(indchannel);
ychannel = yy(indchannel);
PC = [xchannel ychannel];
%
xwatershed = xx(indwatershed);
ywatershed = yy(indwatershed);
PW = [xwatershed ywatershed];
%
tic
[loc, mdxy] = knnsearch(PC,PW); % find the nearest channel pixel to each watershed pixel
toc
%
tic
wait(dev)
toc
%
tic
Y1 = ychannel(loc); % indexing
toc
I get:
Elapsed time is 0.007852 seconds.
Elapsed time is 0.146666 seconds.
Elapsed time is 0.000470 seconds.
The wait() call took a long time! But all the arrays and parameters are on the GPU. How can this happen? I would greatly appreciate it if anyone could help resolve this problem.

Accepted Answer

Joss Knight on 3 Jan 2022
wait just asks the GPU to finish executing any pending operations, in this case the call to knnsearch. Your previous timing code was invalid because you did not do this; instead, the cost of the call to knnsearch was bundled in with the indexing call, since the variable loc had to finish being computed before that line of code could be executed.
In general, use gputimeit for accurately timing GPU code.
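As a rough sketch of what that could look like for the code in the question, the two steps can be timed separately by wrapping each in a function handle. The variable names tKnn and tIdx are made up for this illustration, and the sketch assumes a supported GPU plus Parallel Computing Toolbox and Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: time knnsearch and the indexing step separately with gputimeit.
% Setup mirrors the question; tKnn and tIdx are illustrative names.
m = 500;
n = 500;
[xx, yy] = meshgrid(gpuArray(1:n), gpuArray(1:m));
A = ones(m, n, 'gpuArray');
A(1,:) = 0;
indwatershed = find(A == 1);             % watershed pixels
indchannel   = find(A == 0);             % channel pixels
PC = [xx(indchannel), yy(indchannel)];
PW = [xx(indwatershed), yy(indwatershed)];
ychannel = yy(indchannel);

% gputimeit waits for the GPU to finish before it stops the clock
tKnn = gputimeit(@() knnsearch(PC, PW));

loc  = knnsearch(PC, PW);                % compute loc once for the indexing test
tIdx = gputimeit(@() ychannel(loc));

fprintf('knnsearch: %.6f s, indexing: %.6f s\n', tKnn, tIdx);
```

With the work attributed correctly, the knnsearch time should absorb the delay that previously showed up on the indexing line.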
2 Comments
xu fan on 4 Jan 2022
Hello Joss. Thank you for your answer. I have tested gputimeit:
f = @() knnsearch(PC,PW);
t = gputimeit(f)
and it returns 0.1260s.
I have also checked the explanation of wait(gpudev) in the manual:
"wait(gpudev) blocks execution in MATLAB® until the GPU device identified by the GPUDevice object gpudev completes its calculations. This can be used before calls to toc when timing GPU code that does not gather results back to the workspace. When gathering results from a GPU, MATLAB automatically waits until all GPU calculations are complete, so you do not need to explicitly call wait in that situation."
I don't quite understand how GPU computation works. When I run the code with a breakpoint at the wait() line, I can see the result "loc" (i.e., the output of knnsearch) in the workspace. But when I subsequently use "loc" for indexing, the program seems to freeze for about 0.1 s, which I think is the wait() time. My question is: what does this extra waiting time represent? Is it spent on communication between the GPU and the local workspace, and can it be avoided? Many thanks!
Joss Knight on 4 Jan 2022
GPU operations run asynchronously where possible. This means that the computation runs in the background while MATLAB continues to process the next line of code. If you attempt to display the contents of an array that is pending evaluation (in the debugger or the workspace for instance) then MATLAB will automatically complete execution in order to provide that data - in other words, it will call wait for you. In general this behaviour should be entirely hidden from the user - it's an internal optimisation. However, it does cause potential confusion when you attempt to time your code.
This page of the documentation gives some tips on how to time your code correctly.
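For anyone who prefers tic/toc, one way to account for the asynchronous behaviour described above is to call wait on the device before each toc, so the timer only stops once the GPU has actually finished. This is a minimal sketch rather than the exact code from the question: the random PC and PW arrays are stand-in data, tKnn and tIdx are illustrative names, and a supported GPU with the relevant toolboxes is assumed.

```matlab
% Sketch only: tic/toc timing with explicit synchronisation.
% PC and PW are random stand-in data, not the arrays from the question.
dev = gpuDevice;
PC = rand(500, 2, 'gpuArray');           % stand-in "channel" points
PW = rand(249500, 2, 'gpuArray');        % stand-in "watershed" points
ychannel = PC(:, 2);

wait(dev);                               % make sure earlier GPU work is done
tic
loc = knnsearch(PC, PW);
wait(dev);                               % block until knnsearch has finished
tKnn = toc;

tic
Y1 = ychannel(loc);                      % indexing on the GPU
wait(dev);                               % block until the indexing has finished
tIdx = toc;

fprintf('knnsearch: %.6f s, indexing: %.6f s\n', tKnn, tIdx);
```

The wait calls do not add work of their own; they only move the waiting onto the line that caused it.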

Release

R2021b