How do I optimize this code to run efficiently on the GPU?

조회 수: 3 (최근 30일)
Jan
Jan 2013년 11월 27일
편집: Joss Knight 2013년 11월 28일
Dear Matlab user,
I try making optimization of function listed below for GPU computing. I try many version of GPU algorithm but look for me that always is GPU slower. I really appreciate any suggestion or help.
%%Declaration of variables
K=4;
C11n = rand(K,508032);
[x1, x2] = size(C11n);
C22b = zeros(x1,x1*(x2/2),'double');
C2 = zeros(K,K,x2/2,'double');
E=eye(x1);
A=reshape(C11n,K,2,x2/2);
AT=permute(A,[2 1 3]);
%%CPU code
tic
for k=1:x2/2
C2(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
%%GPU code
% Declaration of variables
C22 = gpuArray(zeros(K,K,x2/2,'double'));
E=gpuArray(eye(x1));
A=gpuArray(reshape(C11n,K,2,x2/2));
tic
for k=1:x2/2
C22(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
with best regards
Jan

답변 (2개)

Ashish Uthama
Ashish Uthama 2013년 11월 27일
A quick 'air' code using pagefun:
tic
M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2));
M = pagefun(@mtimes, M, M);
C22 = repmat(E,[1 1 x2/2])-M;
toc
I would be curious to know if this works for you, and what times you get on your hardware.
  댓글 수: 1
Jan
Jan 2013년 11월 27일
편집: Jan 2013년 11월 27일
Thank you for idea, I will try and let you know...
I apologise, but first time I wrote bad code, I forgot for inversion of matrix, now is code corrected.
BTW, pagefun, help me, it is 10x times speed up (M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2)); ). Now I need figure out how do it quick inversion on every page of 3D matrix. I will inform you.

댓글을 달려면 로그인하십시오.


Joss Knight
Joss Knight 2013년 11월 28일
편집: Joss Knight 2013년 11월 28일
Are your matrices always 4x2? This results in AT*A being 2x2, so you can just calculate your inverses manually:
function Ainv = batch2x2inv(A)
% Grab each matrix element as a vector
a = A(1,1,:);
b = A(1,2,:);
c = A(2,1,:);
d = A(2,2,:);
% Compute determinants
det = a.*d - b.*c;
% Construct inverse
Ainv = bsxfun(@rdivide, [d -b; -c a], det);
end
...and the relevant chunk of your code also uses pagefun as Ashish suggests:
AT = pagefun(@transpose, A);
ATA = pagefun(@mtimes, AT, A);
invATA = batch2x2inv(ATA);
pinvA = pagefun(@mtimes, invATA, AT);
residual = pagefun(@mtimes, A, pinvA);
C22 = bsxfun(@minus, E, residual);
Your code now runs 6x faster than the CPU on my machine.

카테고리

Help CenterFile Exchange에서 Parallel Computing Fundamentals에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by