MATLAB Answers

Reduction variables on the GPU II and arrayfun: cannot assign to parent function variable?

Hello,
This is (hopefully) a simple reduction variable question for performing parallel GPU operations onto a single value. I have read the tutorial on stencil processing and frankly do not understand why this does not work.
A simple example is below (not intended to be actually used; it is a stand-in for more complicated operations). Here, I am taking a vector, using a GPU arrayfun to get the difference between neighbors, and then trying to sum those differences into a single variable. Since the difference operation is order independent, and the result is summed into a single variable, I figured a combination of arrayfun + a reduction variable using nested functions would be the best way to start.
function v = reductionVariableLoopTest()
    x = gpuArray.rand(100,1);
    v = gpuArray.zeros(1);
    function d = difFun(ind)
        d = x(ind+1) - x(ind);
    end
    function sumFun(ind)
        v = v + difFun(ind);
    end
    vect = gpuArray.colon(1,length(x)-1);
    arrayfun(@sumFun,vect);
end
However, this gives the error: Assignment of parent function variable(s): 'v' by 'sumFun' is not allowed.
Now, I know I could get around this by simply using
y = arrayfun(@difFun,vect);
v = sum(y);
but this misses the whole point of using a reduction variable. The order-independent, on-GPU difFun should be extremely fast, and the use of the shared variable v should be both fast and memory friendly.
Any thoughts?
Cheers,
-Dan

Accepted Answer

Joss Knight 4 Dec 2018
No, you can only read from uplevel variables, and then only one element at a time. You cannot write to them. That is not the intention of GPU arrayfun.
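As a rough sketch of what that restriction still permits, the nested function below only reads one element of the parent variable x per call and returns its result as an output, so the reduction happens outside arrayfun. It reuses the names from the question; the function name reductionViaArrayfun is made up for illustration, and the code assumes a supported GPU with Parallel Computing Toolbox.

```matlab
function v = reductionViaArrayfun()
    % Hypothetical name; the body reuses the variables from the question.
    x = gpuArray.rand(100,1);

    % Nested function: reads single elements of the uplevel variable x
    % and returns the result instead of writing to a parent variable.
    function d = difFun(ind)
        d = x(ind+1) - x(ind);
    end

    vect = gpuArray.colon(1, numel(x)-1);

    % One output element per index; the result stays on the GPU.
    d = arrayfun(@difFun, vect);

    % The reduction is done by the built-in sum, not inside arrayfun.
    v = sum(d);
end
```

The intermediate array d costs one extra element per index, which is the trade-off for keeping every arrayfun call free of writes to shared state.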
Generally we encourage you to vectorize your code and use MATLAB's own element-wise, reduction and accumulation operations. They are well optimized. If you need to go further than that then you're into the realm of writing your own kernels in CUDA C++. Alternatively, GPU Coder can be used to attempt to combine loops to form kernels that combine various parallel algorithms.
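For the toy example in the question, the vectorized route could look something like the sketch below, which uses only the built-in diff and sum on a gpuArray. It is an illustration for this specific difference-and-sum case, not a general replacement for more complicated per-element operations, and it assumes a supported GPU.

```matlab
% Create the data directly on the GPU.
x = rand(100, 1, 'gpuArray');

% diff computes the neighbor differences and sum reduces them;
% both run on the GPU without an explicit loop or arrayfun.
v = sum(diff(x));

% For this particular example the sum telescopes, so up to
% floating-point rounding v equals x(end) - x(1).
vCheck = x(end) - x(1);
```

Call gather(v) only when the value is needed on the host, so intermediate results stay in GPU memory.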
