GPU performance with short vectors
Hello - I see GPU computation underperforming when used for vector manipulation with short lengths.
>> a = rand(1000000, 100,'gpuArray');
>> b= gather(a);
>> tic; for i=1:100 ; eval('q = zeros(1000000,1);for i = 1:100; q = b(:,i)+q;end') ; end;toc
Elapsed time is 45.489811 seconds.
>> tic; for i=1:100 ; eval('q = zeros(1000000,1);for i = 1:100; q = a(:,i)+q;end') ; end;toc
Elapsed time is 0.875140 seconds.
When I run the same test with short vectors, the GPU underperforms:
>> a = rand(200, 100,'gpuArray');
>> b = gather(a);
>> tic; for i=1:100 ; eval('q = zeros(200,1);for i = 1:100; q = b(:,i)+q;end') ; end;toc
Elapsed time is 0.021727 seconds.
>> tic; for i=1:100 ; eval('q = zeros(200,1);for i = 1:100; q = a(:,i)+q;end') ; end;toc
Elapsed time is 0.833865 seconds.
Any insight will be appreciated.
Thank you.
Accepted Answer
Joss Knight
20 Apr 2016
Edited: Joss Knight
20 Apr 2016
Computation in a GPU core is significantly slower than in a modern CPU core. It makes up for that by having a lot of them - thousands. If you don't give it thousands of things to do at once, you're never going to beat the CPU.
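To see where the crossover falls on a particular machine, a rough sketch like the one below times a single vector addition on the CPU and on the GPU at a few lengths. The sizes in `lengths` and the variable names are illustrative assumptions, not values from the post, and the numbers will depend on the hardware. It assumes Parallel Computing Toolbox and a supported GPU.

```matlab
% Sketch: time one vector addition on CPU and GPU at several lengths.
% The lengths are illustrative assumptions, not taken from the question.
lengths = [200, 1e4, 1e6];
for n = lengths
    xCpu = rand(n, 1);
    yCpu = rand(n, 1);
    xGpu = gpuArray(xCpu);   % copy the same data to the GPU
    yGpu = gpuArray(yCpu);

    tCpu = timeit(@() xCpu + yCpu);       % CPU timing
    tGpu = gputimeit(@() xGpu + yGpu);    % waits for the GPU to finish
    fprintf('n = %8d: CPU %.3g s, GPU %.3g s\n', n, tCpu, tGpu);
end
```

Using `gputimeit` rather than `tic`/`toc` matters here because GPU operations can run asynchronously, so a plain `toc` may return before the work has completed.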
In your simple computation above you are unnecessarily using a loop. This may have been for illustrative purposes, but if it reflects your actual code, you will gain back your performance by removing the loop, i.e.
q = sum(a, 2);
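As a quick check that the single reduction gives the same result as the column-by-column loop, something like the following could be run. It is a minimal sketch assuming the 200-by-100 array from the question; `qLoop`, `qSum`, `maxDiff`, and `tSum` are names introduced here for illustration.

```matlab
% Sketch: compare the looped accumulation with a single sum along dimension 2.
a = rand(200, 100, 'gpuArray');

% Loop version from the question: one small GPU operation per column.
qLoop = zeros(200, 1, 'gpuArray');
for k = 1:size(a, 2)
    qLoop = qLoop + a(:, k);
end

% Vectorized version: one reduction across the columns.
qSum = sum(a, 2);

% The two results should agree up to floating-point rounding.
maxDiff = gather(max(abs(qLoop - qSum)));
fprintf('max difference: %.3g\n', maxDiff);

% Time the vectorized call; gputimeit waits for the GPU to finish.
tSum = gputimeit(@() sum(a, 2));
fprintf('sum(a, 2): %.3g s\n', tSum);
```

The loop issues 100 tiny operations of 200 elements each, while `sum(a, 2)` hands the GPU all 20,000 elements in one call, which is closer to the "thousands of things at once" the answer describes.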