Effective GPU Bandwidth Nvidia Quadro 6000
Hello, I would like to use GPU acceleration to speed up the computation of fft2 in my code. The GPU I'm using is an NVIDIA Quadro 6000 with a theoretical bandwidth of 144 GB/s. However, the effective bandwidth is almost 100 times lower, which makes using the GPU hardly worthwhile:
Test : 2048 x 2048
Elapsed CPU time is : 0.109062 sec
Elapsed GPU time is : 0.007661 sec
Elapsed GPU time with CPU transfer is : 0.079723 sec
Speed up : 14.236 without memory transfer
1.36801 with memory transfer
Test : 4096 x 4096
Elapsed CPU time is : 0.356208 sec
Elapsed GPU time is : 0.026819 sec
Elapsed GPU time with CPU transfer is : 0.29406 sec
Speed up : 13.2819 without memory transfer
1.21134 with memory transfer
Test : 8192 x 8192
Elapsed CPU time is : 1.30381 sec
Elapsed GPU time is : 0.121605 sec
Elapsed GPU time with CPU transfer is : 1.17194 sec
Speed up : 10.7217 without memory transfer
1.11252 with memory transfer
If I compute the effective bandwidth (see the benchmark below), it's about 1.45 GB/s.
Could this be due to the version of MATLAB I'm using (R2011a), or is such poor performance to be expected?
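As a rough cross-check of that figure, the timings above can be turned into an implied transfer rate. The sketch below assumes (the post does not say) that each test sends a real double-precision matrix to the GPU and gathers a complex double-precision result, and that the difference between the two GPU timings is spent entirely on those copies.

```matlab
% Implied host<->device transfer rate from the timings quoted above.
% ASSUMPTIONS (not stated in the post): real double input (8 bytes/element),
% complex double output (16 bytes/element), and the extra time in the
% "with CPU transfer" figure is pure copy time.
n            = [2048 4096 8192];              % matrix side lengths
tGpu         = [0.007661 0.026819 0.121605];  % GPU time, data already on device (s)
tGpuTransfer = [0.079723 0.29406  1.17194];   % GPU time including transfers (s)

bytesMoved   = n.^2 * (8 + 16);               % bytes sent + bytes gathered
transferTime = tGpuTransfer - tGpu;           % time attributed to the copies (s)
impliedGBs   = bytesMoved ./ transferTime / 1e9;

% fprintf consumes the 3x3 matrix column by column: n, n, rate per test
fprintf('%5d x %-5d : %.2f GB/s\n', [n; n; impliedGBs]);
```

Under those assumptions the three tests come out at roughly 1.4 to 1.5 GB/s, in line with the 1.45 GB/s measured by the benchmark.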
Benchmark used to measure the bandwidth:
% Array sizes to test, in bytes (uint8 data): 4 KB to 64 MB
sizes = power(2, 12:26);
repeats = 10;
D = gpuDevice
sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii = 1:numel(sizes)
    data = randi([0 255], sizes(ii), 1, 'uint8');
    for rr = 1:repeats
        % Host -> device copy; keep the best time over all repeats
        timer = tic();
        gdata = gpuArray(data);
        sendTimes(ii) = min(sendTimes(ii), toc(timer));
        % Device -> host copy
        timer = tic();
        data2 = gather(gdata);
        gatherTimes(ii) = min(gatherTimes(ii), toc(timer));
    end
end
% Bandwidth in GB/s = bytes transferred / seconds / 1e9
sendBandwidth = (sizes./sendTimes)/1e9
[maxSendBandwidth, maxSendIdx] = max(sendBandwidth);
fprintf('Peak send speed is %g GB/s\n', maxSendBandwidth)
gatherBandwidth = (sizes./gatherTimes)/1e9
[maxGatherBandwidth, maxGatherIdx] = max(gatherBandwidth);
fprintf('Peak gather speed is %g GB/s\n', maxGatherBandwidth)
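The post does not include the script that produced the fft2 timings. A minimal sketch of how such a comparison might be measured is shown here; the variable names, the matrix size, the repeat count, and the use of real double input are all assumptions, and `wait(gpu)` may behave differently or be unavailable in releases as old as R2011a.

```matlab
% Sketch: time fft2 on the CPU, on the GPU alone, and on the GPU including
% host<->device copies. Requires Parallel Computing Toolbox and a supported GPU.
% ASSUMPTIONS: real double input; n and repeats are illustrative values.
n       = 2048;
repeats = 5;
gpu     = gpuDevice;
A       = rand(n);                 % host data

tCpu = inf; tGpu = inf; tGpuTransfer = inf;
for rr = 1:repeats
    timer = tic();
    B = fft2(A);                   % CPU reference
    tCpu = min(tCpu, toc(timer));

    gA = gpuArray(A);              % copy in, outside the timed region
    wait(gpu);
    timer = tic();
    gB = fft2(gA);
    wait(gpu);                     % GPU calls can return early; wait before toc
    tGpu = min(tGpu, toc(timer));

    timer = tic();
    B2 = gather(fft2(gpuArray(A)));   % copy in + fft2 + copy out
    tGpuTransfer = min(tGpuTransfer, toc(timer));
end
fprintf('Speed up : %g without memory transfer\n', tCpu / tGpu);
fprintf('           %g with memory transfer\n',    tCpu / tGpuTransfer);
```

Keeping the data on the device across several operations, rather than gathering after each one, is what lets the "without memory transfer" speed-up carry over to a whole computation.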
Answers (2)
Edric Ellis
March 19, 2013
Your experiment is measuring the transfer bandwidth across the PCI bus, not the device global memory bandwidth; the 144 GB/s figure refers to the latter. The PCI bus bandwidth is discussed in this entry on Loren's blog: http://blogs.mathworks.com/loren/#1fa09fa2-c99c-4bb0-8b11-eb805fdd7040
We have made various performance improvements to the gpuArray code since R2011a, so it would be best for you to upgrade if you can.
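To see the distinction in practice, the device global memory bandwidth can be estimated with an operation that never leaves the GPU. This is only a sketch under stated assumptions: it reuses the sizes and repeat count from the question's benchmark, the name `memTimes` is made up for illustration, and it assumes that `gdata + 1` reads each byte once and writes it once.

```matlab
% Sketch: estimate device (global) memory bandwidth with a memory-bound
% operation that runs entirely on the GPU. Requires a supported GPU.
% ASSUMPTION: "gdata + 1" reads every byte once and writes it once, so
% 2*numel bytes move through device memory per call.
sizes    = power(2, 12:26);
repeats  = 10;
gpu      = gpuDevice;
memTimes = inf(size(sizes));
for ii = 1:numel(sizes)
    gdata = gpuArray(randi([0 255], sizes(ii), 1, 'uint8'));
    wait(gpu);                     % finish the copy before timing starts
    for rr = 1:repeats
        timer = tic();
        gout = gdata + 1;          % no PCI traffic involved
        wait(gpu);                 % make sure the kernel has finished
        memTimes(ii) = min(memTimes(ii), toc(timer));
    end
end
memBandwidth = 2 * (sizes ./ memTimes) / 1e9;   % read + write, GB/s
fprintf('Peak on-device read+write speed is %g GB/s\n', max(memBandwidth));
```

This on-device number, not the gpuArray/gather number, is the one to compare against the 144 GB/s on the card's specification sheet.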
Domenico
March 19, 2013
Comments: 1
Edric Ellis
March 19, 2013
Those figures were published using R2012b, and they show that 8 GB/s is not achieved; however, they do show a decent improvement over your measured speed. It's hard to predict exactly how much of the difference is due to the software and how much is due to the different hardware.
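For context on the 8 GB/s figure: assuming it refers to the theoretical peak of a PCI Express 2.0 x16 link (an assumption, since the thread does not say which bus is meant), the number follows from the link parameters, and the 1.45 GB/s measured in the question can be expressed as a fraction of it.

```matlab
% ASSUMPTION: PCIe 2.0 x16 link: 16 lanes, 5 GT/s per lane, 8b/10b encoding.
lanes          = 16;
bitsPerSecLane = 5e9;        % raw bits per second on each lane
encoding       = 8/10;       % 8 payload bits per 10 bits on the wire
peakGBs        = lanes * bitsPerSecLane * encoding / 8 / 1e9;   % per direction

measuredGBs    = 1.45;       % figure reported in the question
fprintf('Theoretical peak : %g GB/s\n', peakGBs);
fprintf('Measured         : %g GB/s (%.0f%% of peak)\n', ...
        measuredGBs, 100 * measuredGBs / peakGBs);
```

Real transfers fall short of this peak because of protocol overhead and the way host memory is allocated, so a measured rate well below 8 GB/s is not in itself a sign of a fault.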