GPU knnsearch performs slower than CPU for large matrices?

Ishan Phadke on 17 Mar 2024
Edited: Ishan Phadke on 21 Mar 2024
I am currently running knnsearch on the CPU with a large number of query points (10 million by 3). For each of the 10 million rows, I want the index of the nearest row in a 2,500-by-3 matrix.
Without the GPU, knnsearch takes roughly 11 seconds, but with the GPU it takes about 45 seconds. When I lowered the number of query points to 1 million, the GPU took roughly 1.5 seconds and the CPU only 1.3 seconds.
I expected the GPU to make knnsearch faster, but that doesn't seem to be the case. Am I implementing this correctly? Any other advice on how to get indices from knnsearch faster, with or without the GPU, is greatly appreciated!
Example code below:
Note: the "X" and "Y" in my actual implementation are not generated by randn, but they have the same dimensions. I don't think that changes the interpretation of what I am trying to do, but I thought I'd mention it.
rng default

%% 10 million queries
% CPU
X = randn(2500,3);         % reference points
Y = randn(10000000,3);     % query points
c = @() knnsearch(X,Y);
tcpu = timeit(c);
% GPU (arrays are copied to the device before timing starts)
gx = gpuArray(X);
gy = gpuArray(Y);
g = @() knnsearch(gx,gy);
tgpu = gputimeit(g);

%% 1 million queries
% CPU
X = randn(2500,3);
Y = randn(1000000,3);
c2 = @() knnsearch(X,Y);
tcpu2 = timeit(c2);
% GPU
gx = gpuArray(X);
gy = gpuArray(Y);
g2 = @() knnsearch(gx,gy);
tgpu2 = gputimeit(g2);

Answers (1)

Damian Pietrus on 19 Mar 2024
Hello Ishan,
There is some overhead when moving variables between the CPU and the GPU. For shorter-running calculations, this transfer overhead can account for a sizable share of your overall compute time. As a comparison, try measuring both the overall GPU time and the time for the GPU computation alone. This will give you an idea of how much of the total time is spent on data transfer.
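As a rough illustration of that comparison, the sketch below times the whole round trip (copy to the GPU, search, copy the result back) with tic/toc, and then times only the search on data that is already on the GPU with gputimeit. The 1 million query size, the warm-up call, and the variable names tTotal, tCompute and tTransfer are assumptions made for this example, not part of the original code.

```matlab
rng default
X = randn(2500,3);        % reference points
Y = randn(1000000,3);     % query points (1 million keeps the run short)

dev = gpuDevice;

% Warm-up call so one-time GPU initialization is not counted below
knnsearch(gpuArray(X), gpuArray(Y(1:10,:)));
wait(dev);

% (a) Overall time: copy inputs to the GPU, search, copy indices back
tic
gx  = gpuArray(X);
gy  = gpuArray(Y);
idx = gather(knnsearch(gx, gy));
wait(dev);                % make sure all queued GPU work has finished
tTotal = toc;

% (b) Computation only: inputs are already on the GPU
tCompute = gputimeit(@() knnsearch(gx, gy));

% Rough estimate of the time spent on transfers
tTransfer = tTotal - tCompute;
fprintf('total %.3f s, compute %.3f s, transfer approx. %.3f s\n', ...
    tTotal, tCompute, tTransfer);
```

A single tic/toc measurement is noisy, so it may help to repeat part (a) a few times and compare the median against tCompute.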
  Comments: 1
Ishan Phadke on 21 Mar 2024
Edited: Ishan Phadke on 21 Mar 2024
How would I measure just the time for the GPU computation instead of the overall GPU time? Would this mean using something like tic/toc and calling wait() so the GPU completes before toc, or is there a better way to do this?
I also have a related follow-up question, but if this is better as a separate question, let me know. Does the GPU perform better for calculations with "more square" matrices? By "more square" I simply mean that the two dimensions of an m-by-n matrix are closer to each other. I tried running knnsearch while looping through chunks of the large "gy" array, and this ran faster than the non-looped version on the GPU. It may just be a fluke, but it was surprising to me, so I thought I'd ask.
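For reference, the chunked loop mentioned above looks roughly like the sketch below. It is a minimal illustration rather than the exact code that was run: chunkSize is a hypothetical value that would need tuning, and the query size is reduced to 1 million to keep the example short.

```matlab
rng default
X = randn(2500,3);        % reference points
Y = randn(1000000,3);     % query points

chunkSize = 100000;       % hypothetical chunk size, not a tuned value
gx  = gpuArray(X);        % the small reference set stays on the GPU
nQ  = size(Y,1);
idx = zeros(nQ,1);        % nearest-neighbor indices, collected on the CPU

for s = 1:chunkSize:nQ
    e = min(s + chunkSize - 1, nQ);          % last row of this chunk
    gyChunk  = gpuArray(Y(s:e,:));           % copy one chunk of queries
    idx(s:e) = gather(knnsearch(gx, gyChunk));
end
```

Each chunk is searched against the same gx, so the indices in idx refer to rows of X exactly as in the non-looped call.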

Release

R2023a
