arrayfun on GPU with each call working from common block of data

Views: 3 (last 30 days)
Jonathan
Jonathan on 8 Aug 2015
Answered: Edric Ellis on 12 Aug 2015
I am using arrayfun to perform a convolution on a set of images with different kernels. Instead of looping over each kernel, I use arrayfun. This works fine on the CPU, especially since the anonymous function can access variables from the parent function (e.g. the set of images is the same for every call, but arrayfun runs over an array of kernels).
The problem with doing the same thing with arrayfun on the GPU is:
  1. arrayfun on the GPU doesn't support anonymous functions that access parent variables, so how do I pass the same static block of data to each evaluation of arrayfun?
  2. It also doesn't seem to support passing in a single struct or cell containing some common block of data.
Again, this works on a CPU, but how do I do this on a GPU in MATLAB? In the code below, images and filts are 3-D matrices in the parent function, and arrayfun executes over idx = 1:n.
Ac = arrayfun(@(i) convFilts(images, filts, i), idx, 'UniformOutput', false);
function A = convFilts(images, filts, idx)
% Convolve the full image stack with the idx-th kernel.
A = convn(images, filts(:,:,idx), 'valid');
end

Answers (2)

Matt J
Matt J on 8 Aug 2015
Edited: Matt J on 8 Aug 2015
GPU parallelization is only effective when there are no large data sets shared by the parallel threads. Each of the many multiprocessors on a graphics card has only a very small amount of cache-like memory for shared data. That is why you are hitting obstacles with what you are trying to do on the GPU: the toolbox doesn't expect you to use it for that kind of parallelization.
  2 Comments
Jonathan
Jonathan on 8 Aug 2015
Edited: Jonathan on 8 Aug 2015
That is not correct. GPUs are very effective at operating on large blocks of data. However, the routine/kernel must be written such that each thread operates on a portion of the larger problem.
I am not proposing to do anything here that violates that model. Really, it is no different than each individual call to "convn". I am only saying I wish to dispatch a set of convn calls simultaneously that have different small kernels.
In fact, there is a Matlab function, pagefun, which does a similar operation. However, it supports only a very limited set of functions (such as rot90, but not convn).
I could write this myself very efficiently as a CUDA kernel, but I am hoping that Mathworks has done this already, since that is what I pay for.
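To illustrate the pagefun point above: for the operations it does support, pagefun applies one small operation per page of a 3-D gpuArray, which is exactly the "many small problems over a common block" dispatch pattern. A minimal sketch with batched matrix multiplication (sizes here are arbitrary illustration values, not from the original question):

```matlab
% Batched per-page matrix multiply on the GPU via pagefun.
% Sizes are made up for illustration.
A = rand(4, 4, 100, 'gpuArray');   % 100 pages of 4x4 matrices
B = rand(4, 4, 100, 'gpuArray');
C = pagefun(@mtimes, A, B);        % one 4x4 * 4x4 multiply per page
```

As noted, convn is not among the functions pagefun supports, which is why this doesn't directly solve the batched-convolution problem.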
Matt J
Matt J on 9 Aug 2015
Edited: Matt J on 9 Aug 2015
GPUs are very effective at operating on large blocks of data. However, the routine/kernel must be written such that each thread operates on a portion of the larger problem.
I guess what I was really trying to say is that I don't believe gpuArray.arrayfun is smart enough to partition fixed data, shared by your anonymous function, among threads. It only knows how to divide up your idx data element-wise. The way arrayfun would need to partition the shared data for effective acceleration is, I imagine, too specific to the operation you're trying to perform for arrayfun to accommodate it in a generic way.
I could write this myself very efficiently as a CUDA kernel, but I am hoping that Mathworks has done this already, since that is what I pay for.
You could try writing a CUDA kernel object. The toolbox at least gives you a way to focus your coding effort on the kernel only, and not the stubs needed to shuttle data back and forth to the card.
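A minimal sketch of that CUDAKernel workflow, assuming a kernel source file conv_kernel.cu compiled to PTX with nvcc (the file names, kernel signature, and the variables rows, cols, and nFilts are all hypothetical):

```matlab
% Compile offline first (hypothetical file name):
%   nvcc -ptx conv_kernel.cu
% Load the compiled kernel; the toolbox handles host<->device transfer.
k = parallel.gpu.CUDAKernel('conv_kernel.ptx', 'conv_kernel.cu');
k.ThreadBlockSize = [16, 16, 1];
k.GridSize = [ceil(rows/16), ceil(cols/16), nFilts];  % one z-slice per filter

out = zeros(rows, cols, nFilts, 'gpuArray');
out = feval(k, out, gpuArray(images), gpuArray(filts), rows, cols, nFilts);
result = gather(out);  % bring the result back to the CPU if needed
```

The custom kernel is what lets every thread read the shared images block directly from device global memory while each z-slice of the grid applies a different filter.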



Edric Ellis
Edric Ellis on 12 Aug 2015
arrayfun on the GPU cannot access the parent workspace through anonymous functions, but handles to nested functions can access the workspace of the function they are nested in. There's a detailed example in the documentation.
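A minimal sketch of that nested-function pattern (the function and variable names here are illustrative, not from the documentation example; note that the body passed to GPU arrayfun must stick to scalar element-wise operations, though it may index into up-level variables with scalar indices):

```matlab
function out = scaleColumns(X, scales)
% X is a gpuArray matrix; scales has one entry per column of X.
% Build a per-element column index to pass through arrayfun.
colIdx = gpuArray(repmat(1:size(X, 2), size(X, 1), 1));
out = arrayfun(@scaleOne, X, colIdx);

    function y = scaleOne(v, c)
        % The nested function reads 'scales' from the parent
        % workspace; scalar indexing into an up-level variable
        % is supported inside GPU arrayfun.
        y = v * scales(c);
    end
end
```

For the original question, the shared images and filts blocks would play the role of 'scales': captured once in the parent workspace and indexed from inside the nested function, rather than being passed to every evaluation.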
