gpuArray performance on 'xcorr' function

Question

SangMin, 31 December 2019

https://kr.mathworks.com/matlabcentral/answers/498559-gpuarray-performance-on-xcorr-function

Latest comment: SangMin, 2 January 2020

Accepted answer: Walter Roberson

Hi,

I am trying to improve the performance of my script, which uses the 'xcorr' function heavily.

I found that 'xcorr' supports gpuArray, so I tried it. However, the performance does not seem good.

I ran three simple tests:

% Test 1: double-precision gpuArray
t = 0:0.001:10-0.001;
x = cos(2*pi*10*t) + randn(size(t));
X = gpuArray(x);
tic
[r,lags] = xcorr(X,X,'normalized');
r = gather(r);
toc
% Elapsed time is 0.017178 seconds.

% Test 2: plain double array on the CPU
t = 0:0.001:10-0.001;
x = cos(2*pi*10*t) + randn(size(t));
tic
[r,lags] = xcorr(x,x,'normalized');
r = gather(r);   % no-op for a CPU array
toc
% Elapsed time is 0.004627 seconds.

% Test 3: single-precision gpuArray
t = 0:0.001:10-0.001;
x = cos(2*pi*10*t) + randn(size(t));
X = gpuArray(single(x));
tic
[r,lags] = xcorr(X,X,'normalized');
r = gather(r);
toc
% Elapsed time is 0.015555 seconds.

The plain (CPU) array is much faster than the gpuArray.

To test my GPU, I used gpuBench.

For 'single' data, the GPU is much faster!

What should I do to improve the performance of the 'xcorr' function?

(I have several thousand arrays, and each array has 10k elements.)

Answer 1

Walter Roberson, 31 December 2019

https://kr.mathworks.com/matlabcentral/answers/498559-gpuarray-performance-on-xcorr-function#answer_408273

Edited: Walter Roberson, 31 December 2019

To increase the performance of xcorr double precision on GPU, you should obtain a different GPU. The GeForce GTX 1060 you have runs its double precision at 1/32 of the single precision rate, which is the slowest kind of double precision that NVIDIA offers.

https://devtalk.nvidia.com/default/topic/995849/does-the-gtx1060-support-double-precision-/
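
If you want to see the single/double gap on your own card rather than rely on the spec sheet, a rough check like the one below may help. It is only a sketch: the matrix size n is an arbitrary choice, and a compute-bound matrix multiply is used as the probe, so the ratio it prints will not necessarily match what xcorr sees. It needs Parallel Computing Toolbox and a supported GPU.

```matlab
% Rough estimate of how much slower double precision is than single
% precision on the currently selected GPU.
g = gpuDevice;                        % currently selected GPU
fprintf('GPU: %s\n', g.Name);

n  = 2048;                            % matrix size (arbitrary, an assumption)
As = rand(n, 'single', 'gpuArray');   % single-precision data created on the GPU
Ad = rand(n, 'double', 'gpuArray');   % double-precision data created on the GPU

% gputimeit synchronizes with the GPU before it stops the clock.
tSingle = gputimeit(@() As * As);
tDouble = gputimeit(@() Ad * Ad);

fprintf('single: %.4f s, double: %.4f s, ratio: %.1fx\n', ...
        tSingle, tDouble, tDouble / tSingle);
```

A ratio far above 2 would be consistent with a card whose double-precision units are heavily cut down.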

Walter Roberson, 1 January 2020

When I do those timing tests, the times I see are quite variable.

Times are a little less variable if I construct a better test -- but double precision GPU is still the slowest.

% Test signal in four variants: double/single, on the CPU and on the GPU
t = 0:0.001:10-0.001;
Xd = cos(2*pi*10*t) + randn(size(t));
Xs = single(Xd);
Xgd = gpuArray(Xd);
Xgs = gpuArray(single(Xd));

% Repeat each measurement N times to show run-to-run variability
N = 100;
td = zeros(N,1);
ts = zeros(N,1);
tgd = zeros(N,1);
tgs = zeros(N,1);
fd = @() xc_cpu(Xd);
fs = @() xc_cpu(Xs);
fgd = @() xc_gpu(Xgd);
fgs = @() xc_gpu(Xgs);

% timeit for the CPU versions, gputimeit for the GPU versions
for K = 1 : N; td(K) = timeit(fd, 0); end
for K = 1 : N; ts(K) = timeit(fs, 0); end
for K = 1 : N; tgd(K) = gputimeit(fgd, 0); end
for K = 1 : N; tgs(K) = gputimeit(fgs, 0); end
plot([td, ts, tgd, tgs]);
legend({'double (CPU)', 'single (CPU)', 'double (GPU)', 'single (GPU)'});

function [r, lags] = xc_cpu(X)
    [r, lags] = xcorr(X, X, 'normalized');
end

function [r, lags] = xc_gpu(X)
    [r, lags] = xcorr(X, X, 'normalized');
    r = gather(r);   % include the device-to-host copy in the timing
end

However!! If you change the upper bound from 10-0.001 to 100-0.001 (100,000 samples instead of 10,000), you will get quite a different graph, with double precision on the CPU becoming the slowest and single precision on the GPU becoming the fastest. This suggests that, for arrays of the size you were using, transfer and synchronization times were overwhelming the GPU gain.
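
To look for that crossover on a particular machine, a size sweep along the following lines might be useful. This is a sketch under a few assumptions: the lengths in lens are illustrative (the first matches the 10k-element arrays in the question), only single precision is timed, and unlike the test above the GPU timing deliberately includes the host-to-device copy as well as the gather. It needs Signal Processing Toolbox and Parallel Computing Toolbox.

```matlab
% Time a normalized autocorrelation on the CPU and on the GPU for
% growing signal lengths, to see where (or whether) the GPU wins.
lens = [1e4, 1e5, 1e6];               % samples per signal (illustrative)
tCpu = zeros(size(lens));
tGpu = zeros(size(lens));

% Correlate on whatever device v lives on, then bring the result back.
roundTrip = @(v) gather(xcorr(v, v, 'normalized'));

for k = 1:numel(lens)
    x = single(randn(1, lens(k)));    % host data, single precision
    tCpu(k) = timeit(@() xcorr(x, x, 'normalized'));
    % gpuArray(x) inside the timed call adds the host-to-device copy.
    tGpu(k) = gputimeit(@() roundTrip(gpuArray(x)));
end

disp(table(lens(:), tCpu(:), tGpu(:), ...
     'VariableNames', {'Samples', 'CPU_s', 'GPU_s'}));
```

If the GPU column only drops below the CPU column at the larger lengths, that points to fixed per-call overhead rather than raw throughput as the limit for 10k-element arrays.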

SangMin, 2 January 2020

Thanks for the benchmark script. As you mentioned, when I change the upper bound, the GPU's performance gets better (or the CPU's gets worse). However, in my case the CPU still shows the best results. Probably my CPU is just much better than my GPU.

Thanks!
