Using gpuArrays to speed up a simulation (utilizing an NVIDIA GPU)

Erez on 17 Jan 2018
Commented: Jan on 17 Jan 2018
I have a MATLAB simulation which updates an array:
Array = zeros(1,1000);
as follows:
for j = 1:100000
Array = Array + rand(1,1000);
end
My question is the following: the loop is sequential, so its iterations cannot be parallelized, but the different slots of the array are updated independently of each other. So, naturally, MATLAB performs array operations such as this in parallel using all the cores of the CPU.
I wish to get the same calculation performed on my NVIDIA GPU, in order to speed it up (utilizing the larger number of cores there).
The problem is that naively doing
tic
Array=gpuArray(zeros(1,1000));
for j=1:100000
Array=Array+gpuArray(rand(1,1000));
end
toc
results in the calculation time being 8 times longer!
What am I doing wrong?

Accepted Answer

Jan on 17 Jan 2018
rand(1, 1000) is created on the CPU and then copied to the graphics board. This communication is slow. It is better to create the random values directly on the GPU: https://www.mathworks.com/help/distcomp/examples/generating-random-numbers-on-a-gpu.html
Nevertheless, the code is not meaningful with random numbers. It might be useful to show us the real problem.
Comments: 2
Erez on 17 Jan 2018
Edited: Erez on 17 Jan 2018
Thank you. After reading your link, it is still not clear to me how to generate a simple array of 1000 uniformly distributed random numbers (in [0,1]) for each iteration of the loop above. Can you please demonstrate what exactly the adapted code would look like? My purpose at the moment is to understand the basics.
Jan on 17 Jan 2018
@Erez: Sorry if I ask, but did you read the link? There you find:
Typically these numbers are generated using the functions rand,
randi, and randn. Parallel Computing Toolbox™ provides three
corresponding functions for generating random numbers directly
on a GPU: gpuArray.rand, gpuArray.randi, and gpuArray.randn.
Try:
tic
Array = gpuArray(zeros(1,1000));
for j = 1:100000
Array = Array + gpuArray.rand(1,1000);
end
toc
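A variant of the loop above, as a minimal sketch only: it assumes Parallel Computing Toolbox and a supported NVIDIA GPU, and uses the 'gpuArray' option of zeros and rand, which current MATLAB releases document for creating data directly on the GPU. GPU calls run asynchronously, so the sketch calls wait(gpuDevice) before toc; otherwise the timer may stop before the GPU has finished. The names nIter, n, tGPU and result are illustrative only.

```matlab
% Sketch: accumulate random vectors entirely on the GPU.
% Assumes Parallel Computing Toolbox and a supported GPU are available.
nIter = 100000;                              % number of loop iterations (illustrative name)
n     = 1000;                                % length of the array (illustrative name)

Array = zeros(1, n, 'gpuArray');             % allocate directly on the GPU, no host copy
tic
for j = 1:nIter
    Array = Array + rand(1, n, 'gpuArray');  % random numbers are generated on the GPU
end
wait(gpuDevice);                             % block until all queued GPU work has finished
tGPU = toc;

result = gather(Array);                      % copy the final result back to host memory
fprintf('GPU loop: %.3f s\n', tGPU);
```

Only the final gather transfers data between GPU and host, so the per-iteration copy that slowed down the original version is gone. Whether this beats the CPU loop for vectors of only 1000 elements still has to be measured on the machine in question.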
Concerning your original question: "So, naturally Matlab performs array operations such as this in parallel using all the cores of the CPU." This is not true under all circumstances. You can check this with the Task Manager: adding a [1 x 1000] vector to another is a very cheap job. With AVX code, several doubles are added in each instruction. Starting a thread on each core of the CPU would be far too expensive. Therefore I assume that the loop is processed on one core only:
for j = 1:100000
Array = Array + rand(1, 1000);
end
This might be different for rand(16, 10000). For the sum() command, for example, the limit is 88999 elements: summing a vector with 88999 elements uses one core only, while summing 89000 elements runs on 2 cores - and is slower on some machines. This limit is a rough guess only, and how much time starting a thread needs depends on the CPU.
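One way to check such a threshold on a given machine is to time sum() once with a single computational thread and once with the default thread limit. The following is only a sketch: the sizes are taken from the paragraph above plus one larger case, the variable names are illustrative, and the timings will differ between CPUs and MATLAB releases.

```matlab
% Sketch: time sum() with one thread and with the default thread limit.
sizes    = [88999, 89000, 1e6];          % vector lengths to test (illustrative choice)
nDefault = maxNumCompThreads;            % remember the current thread limit
tSingle  = zeros(size(sizes));
tMulti   = zeros(size(sizes));

for k = 1:numel(sizes)
    x = rand(1, sizes(k));
    maxNumCompThreads(1);                % restrict MATLAB to one computational thread
    tSingle(k) = timeit(@() sum(x));
    maxNumCompThreads(nDefault);         % restore the previous limit
    tMulti(k)  = timeit(@() sum(x));
end

disp(table(sizes(:), tSingle(:), tMulti(:), ...
    'VariableNames', {'N', 'OneThread_s', 'DefaultThreads_s'}));
```

If the two timing columns are nearly equal for a given size, that suggests the operation is not using more than one core at that size.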
Note that your operation Array + rand(1, 1000) spends most of its time creating the random numbers, which is much more expensive than the cheap addition. Therefore the huge number of GPU cores might not give a substantial boost.
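The split between the two costs can be measured separately on the CPU with timeit. This is a rough sketch with illustrative variable names; for operations this small, timeit may warn that the function runs too fast to be measured accurately, so the numbers are indicative only.

```matlab
% Sketch: compare the cost of generating random numbers with the cost of adding.
A = zeros(1, 1000);
R = rand(1, 1000);                        % pre-generated, so the addition is timed alone

tRand = timeit(@() rand(1, 1000));        % time to create 1000 random numbers
tAdd  = timeit(@() A + R);                % time to add two [1 x 1000] vectors

fprintf('rand: %.2e s, add: %.2e s\n', tRand, tAdd);
```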
