3D gpuArray vs cells of 2D gpuArrays major speed difference!

Question

0 votes

Can anybody explain why these two pieces of code have drastically different runtimes?

I have a shared setup routine

clear all
y = gpuArray.rand(1000, 1000, 'single');
W = cell(1, 5);
WFull = gpuArray.zeros(1000, 1000, 5);
for j = 1:5
   W{j} = gpuArray.rand(1000, 1000, 'single');
   WFull(:,:,j) = W{j};
end

Version 1 (finishes in 1.4 seconds on my machine)

z = gpuArray.zeros(1000, 1000, 5);
tic
for i = 1:1000
   for j = 1:size(W)
      z(:,:,j) = W{j}*y;
   end
end
toc

vs. Version 2 (finishes in 39 seconds on my machine, about 27x slower)

z = gpuArray.zeros(1000, 1000, 5);
tic
for i = 1:1000
   for j = 1:size(WFull, 3)
      z(:,:,j) = WFull(:,:,j)*y;
   end
end
toc

Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?


Answer 1

Matt J, 24 May 2013

Edited: Matt J, 24 May 2013

2 votes

Do you think that slicing large 3D gpuArrays is just really slow compared to looking up cell array values?

Yes, it is faster to look up a cell than to pull a slice out of a 3D array, and that's true for normal arrays as well, as long as there are only a few slices/cells. Of course, for a fair comparison you should also include the time needed to allocate memory for each W{j}.
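To check the cell-versus-slice claim on ordinary (CPU) arrays, a minimal sketch along these lines could be used. The names A, C, tSlice and tCell are made up for illustration, and the measured ratio will depend on the MATLAB release and the machine.

```matlab
% Sketch: compare pulling one 1000x1000 slice out of a 3D array with
% looking up the same data in a cell array. CPU only, no GPU needed.
A = rand(1000, 1000, 5, 'single');      % 3D array, one page per matrix
C = squeeze(num2cell(A, [1 2]));        % 5x1 cell, each cell holds one page

% timeit calls each handle repeatedly and returns a typical time in seconds.
tSlice = timeit(@() A(:,:,3));          % indexing builds a new 1000x1000 array
tCell  = timeit(@() C{3});              % cell lookup hands back the stored array

fprintf('slice: %.3g s, cell lookup: %.3g s\n', tSlice, tCell);
```

The cell lookup may be fast enough that timeit warns about limited accuracy, so the result is best read as an order-of-magnitude comparison.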

Another reason is that you have a bug in your for-loop over W{j}. size(W) returns the vector [1 5], and the colon operator uses only its first element, so the loop does 1 iteration instead of 5:

   >> for j=1:size(W), j, end 
j =
       1

This is biasing the comparison to some degree.
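The loop-bound issue can be reproduced without a GPU. The following is a small sketch, with the counter names nBuggy and nFixed invented for illustration; it counts how many times each form of the loop runs for a 1-by-5 cell array like the W in the question.

```matlab
W = cell(1, 5);                 % same shape as W in the question

nBuggy = 0;
for j = 1:size(W)               % size(W) is [1 5]; the colon uses only the leading 1
    nBuggy = nBuggy + 1;
end

nFixed = 0;
for j = 1:numel(W)              % numel(W) is the scalar 5
    nFixed = nFixed + 1;
end

fprintf('1:size(W) runs %d time(s), 1:numel(W) runs %d time(s)\n', nBuggy, nFixed);
```

Using numel(W), or size(W, 2) for a row cell array, gives a scalar loop bound, so the result no longer depends on how the colon operator treats a size vector.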

Comments: 2

Dan Ryan, 24 May 2013

I caught a couple of other issues: I had left 'single' off the gpuArray creation for some items and had it present for others. I also changed size(W) to size(W, 2), and now the comparison is much closer.

Here is the new code:

% Version 1: cell array of 2D gpuArrays
clear all
y = gpuArray.rand(1000, 1000, 'single');
z = gpuArray.zeros(1000, 1000, 5, 'single');
W = cell(1, 5);
for j = 1:5
   W{j} = gpuArray.rand(1000, 1000, 'single');
end
tic
for i = 1:500
   for j = 1:size(W, 2)
      z(:,:,j) = W{j}*y;
   end
end
toc

% Version 2: slices of a single 3D gpuArray
clear all
y = gpuArray.rand(1000, 1000, 'single');
z = gpuArray.zeros(1000, 1000, 5, 'single');
WMat = gpuArray.rand(1000, 1000, 5, 'single');
tic
for i = 1:500
   for j = 1:size(WMat, 3)
      z(:,:,j) = WMat(:,:,j)*y;
   end
end
toc

What is really strange to me is that the execution time is very nonlinear in the number of outer-loop iterations (the upper limit of i). There must be some sort of memory flush going on when that number gets large, but I'm not sure why.

i = 100 -> runtimes are 0.10 and 0.14 seconds

i = 200 -> runtimes are 0.73 and 1.98 seconds

i = 500 -> runtimes are 10.3 and 11.7 seconds (notice the large jump for version 1!)

i = 1000 -> runtimes are 26.3 and 28.0 seconds!

Any idea what causes this highly nonlinear trend? I don't see why GPU memory would come into play, since I am just overwriting existing values and performing exactly the same computations in every iteration!

Dan Ryan, 30 May 2013

James Lebak from MathWorks helped me out with a really good tip: call wait(gpuDevice) before the toc command when timing GPU code.

Now the timings increase linearly with the number of loop iterations, and the two implementations give very similar results. Good to know!
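For reference, here is a minimal sketch of that timing pattern. gpuArray operations can return before the GPU has finished, so a toc without a preceding wait may measure little more than the time to queue the work, which is presumably why the unsynchronized timings looked nonlinear. The names dev, nIter, tQueued and tSynced are illustrative assumptions, and the code needs Parallel Computing Toolbox and a supported GPU.

```matlab
% Sketch: time gpuArray work with explicit synchronization.
dev  = gpuDevice;                                 % currently selected GPU
y    = gpuArray.rand(1000, 1000, 'single');
WMat = gpuArray.rand(1000, 1000, 5, 'single');
z    = gpuArray.zeros(1000, 1000, 5, 'single');
nIter = 100;                                      % hypothetical iteration count

wait(dev);                                        % drain any previously queued work
tic
for i = 1:nIter
    for j = 1:size(WMat, 3)
        z(:,:,j) = WMat(:,:,j) * y;
    end
end
tQueued = toc;                                    % may cover only the time to queue the work
wait(dev);                                        % block until the GPU has finished
tSynced = toc;                                    % elapsed time including GPU execution

fprintf('without wait: %.3f s, with wait: %.3f s\n', tQueued, tSynced);
```

In releases that provide it, gputimeit is an alternative that handles the synchronization itself; the loop would then need to be wrapped in a function handle.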
