How to output a vector that is the sum of each slice or page from a 3D array on GPU

Setup: quad-core i7, GeForce GTX 960 with the CUDA driver, MATLAB R2014b
The process I am trying to accomplish:
  1. Send a 3D array to the GPU
  2. Distribute each 'page' (:,:,i) for summing on the GPU
  3. Return the output vector to the CPU
% My stab at it:
array = ones(3,3,5,'gpuArray'); % create a 3x3x5 array of ones on GPU
array = pagefun(@sum,array); % for each page, sum all elements
array = gather(array); % return resulting 1x1x5 array to CPU
% Desired output: array = 1x1x5 vector of 9's
% This throws an error that pagefun does not like the summation function.
On the CPU, a similar process works fine. It also works in a PARFOR or FOR loop on the GPU, but that does not vectorize the computation for speed. Would a CUDA kernel compiled to a PTX file be better suited to this? Is there a better way to do it? Is this more suitable for a cluster than for a GPU?
Help appreciated, Will

Accepted Answer

Edric Ellis on 16 May 2016
There's a much simpler solution that involves only a single summation, like so:
array = ones(3, 3, 5, 'gpuArray');
result = sum(reshape(array, [], size(array, 3)));
result = gather(result);
In this case, result is a 1x5 row vector rather than the 1x1x5 array your solution produces, but you can fix this if you wish like so:
result = reshape(result, 1, 1, []);
That way should be considerably faster on the GPU (I'm assuming your real data is much larger) because reshape can be done without manipulating the data (unlike rot90), and there's only a single call to sum.
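To make the reshape trick easier to check, here is a minimal sketch that runs on the CPU with non-constant data, so that any mix-up between pages would show in the result. The test array `A` and the reference loop are illustrative additions, not part of the original answer; the same lines should apply unchanged to a gpuArray.

```matlab
% Hypothetical 2x3x4 test array: page k holds the values 6*(k-1)+1 .. 6*k.
A = reshape(1:24, 2, 3, 4);

% Flatten each page into one column: the result is 6x4, column k = page k.
cols = reshape(A, [], size(A, 3));

% Sum down the columns. Passing the dimension explicitly is a defensive
% choice made here, not something the original answer does.
s = sum(cols, 1);                 % 1x4 row vector: [21 57 93 129]

% Reference result from an explicit loop over pages.
ref = zeros(1, size(A, 3));
for k = 1:size(A, 3)
    page = A(:, :, k);
    ref(k) = sum(page(:));
end

assert(isequal(s, ref));
```

One edge case: if each page holds a single element, reshape produces a row vector, and sum without a dimension argument would collapse it to a scalar. Writing `sum(cols, 1)` avoids that.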

More Answers (1)

Will Kinsman on 13 May 2016
Edited: Will Kinsman on 13 May 2016
Solution:
array = ones(3,3,5,'gpuArray'); % create a 3x3x5 array of ones on GPU
array = sum(array); % sum all column vectors to create 1x3x5
array = rot90(array); % rotate each page by 90 degrees to create 3x1x5
array = sum(array); % sum all column vectors to create 1x1x5
array = gather(array); % return resulting 1x1x5 array to CPU
Unfortunately, I asked the question using the sum function as a simpler proxy for working out how to perform corrcoef on each page. However, the question in the title has been answered, so I will mark this as closed.
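The two-pass approach above can probably also be written without the rotation, by telling sum which dimension to reduce. This is a sketch of that variant rather than the poster's code; the variable names `colSums`, `pageSums`, and `result` are illustrative.

```matlab
array = ones(3, 3, 5, 'gpuArray');   % 3x3x5 array of ones on the GPU

colSums  = sum(array, 1);            % sum down the rows of each page: 1x3x5
pageSums = sum(colSums, 2);          % sum across the columns: 1x1x5

result = gather(pageSums);           % 1x1x5 array on the CPU, every entry 9
```

Because the second call reduces along dimension 2 directly, no intermediate rot90 is needed and the output already has the 1x1x5 shape.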
  1 Comment
Will Kinsman on 14 May 2016
I also got my corrcoef function working, by decomposing the functions that are not supported on the GPU into functions that are.
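The poster's actual corrcoef code is not shown, so the following is only one way such a decomposition might look. It assumes each page `X(:,:,k)` is n-by-p with observations in rows (the convention corrcoef uses), and the data and all variable names are hypothetical. It relies on mean, sum, bsxfun, permute, and pagefun with @mtimes, which accept gpuArray inputs.

```matlab
% Hypothetical data: 5 pages, each 100 observations of 4 variables.
X = rand(100, 4, 5, 'gpuArray');
n = size(X, 1);

% Center every column of every page.
Xc = bsxfun(@minus, X, mean(X, 1));

% Per-page covariance: Xc' * Xc / (n - 1), giving a 4x4x5 array.
% permute swaps rows and columns within each page (a page-wise transpose).
C = pagefun(@mtimes, permute(Xc, [2 1 3]), Xc) / (n - 1);

% Per-page standard deviations (1x4x5) and their outer products (4x4x5).
s = sqrt(sum(Xc.^2, 1) / (n - 1));
D = bsxfun(@times, permute(s, [2 1 3]), s);

% Correlation matrices, one per page, returned to the CPU.
R = gather(C ./ D);

% Spot check against corrcoef on the CPU for the first page.
R1 = corrcoef(gather(X(:, :, 1)));
assert(max(max(abs(R(:, :, 1) - R1))) < 1e-10);
```

bsxfun is used instead of implicit expansion so that the sketch stays compatible with the R2014b release mentioned in the question.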
