Speeding up code which integrates a 2D gpuArray matrix using Simpson's Rule

조회 수: 1 (최근 30일)
I have a (real) 2D gpuArray, which I am using as part of a larger code, and now am trying to also integrate the array using the Composite Simpson Rule inside my main loop (several 10000 iterations at least). A MWE looks like the following:
%%%%%%%%%%%%%%%%%% MAIN CODE %%%%%%%%%%%%%%%%%%
Ny = 501; % Dimensions of matrix M
Nx = 501; %
dx = 0.1; % Grid spacings
dy = 0.2; %
M = rand(Ny, Nx, 'gpuArray'); % Initialise a matrix
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D(M, dx, dy, Nx, Ny); % Now integrate M
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%% INTEGRATOR %%%%%%%%%%%%%%%%%%
function I = simpsons_integration_2D(F, dx, dy, Nx, Ny)
% Integrate the 2D function F with Nx columns and Ny rows, and grid spacings
% dx and dy using Simpson's rule.
% Integrate along x direction (vertically) --> IX is a vector afterwards
sX = sum( F(:,1:2:Nx-2) + 4*F(:,2:2:(Nx-1)) + F(:,3:2:Nx) , 2);
IX = dx/3 * sX;
% Integrate along y direction --> I is a scalar afterwards
sY = sum( IX(1:2:Ny-2) + 4*IX(2:2:(Ny-1)) + IX(3:2:Ny) , 1);
I = dy/3 * sY;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The operation of performing the integration is around 850 µs, which is currently a significant part of my code. This was measured using
f = @() simpsons_integration_2D(M, dx, dy, Nx, Ny);
t = gputimeit(f)
Is there a way to reduce the execution time for integrating the gpuArray matrix?
(The graphics card is the Nvidia Quadro P4000)

채택된 답변

Jan
Jan 2021년 2월 10일
편집: Jan 2021년 2월 10일
These are faster at least on CPU - I do not have graphics hardware in my laptop which supports GPU arrays:
function I = simpsons_integration_2D_v2(F, dx, dy, Nx, Ny)
% Avoid summation of large intermediate arrays:
sX = F(:,1) + 2 * sum(F(:, 3:2:(Nx-2)), 2) + ...
4 * sum(F(:, 2:2:(Nx-1)), 2) + F(:,Nx);
sY = sX(1) + 2 * sum(sX(3:2:(Ny-2))) + ...
4 * sum(sX(2:2:(Ny-1))) + sX(Ny);
I = dx * dy / 9 * sY;
end
function I = simpsons_integration_2D_v3(F, dx, dy, Nx, Ny)
% Perform summation as dot product:
v = [1, repmat([4, 2], 1, (Nx - 3) / 2), 4, 1];
sX = F * v';
if Nx ~= Ny
v = [1, repmat([4, 2], 1, (Ny - 3) / 2), 4, 1];
end
sY = v * sX;
I = dx * dy / 9 * sY;
end
Compared timeit output on my mobil i5:
0.00111591015 original
0.00066621015 without large intermediate array
0.00012221015 as matrix multiplication
So this is faster on my old 2 core i5 than on your GPU. Maybe this gives a speedup on your GPU also. I assume, in version v3 the vector v must be created on the GPU also.
  댓글 수: 3
Jan
Jan 2021년 2월 11일
편집: Jan 2021년 2월 11일
On the CPU it is slightly faster to perform it in one step:
s = vY * F * vX.';
I = s * dx * dy / 9;
But this might be an the effect of the JIT acceleration.
Another approach, which is 50% slower than the v3 version on the CPU:
vX = [1, repmat([4, 2], 1, (Nx - 3) / 2), 4, 1].';
vY = [1, repmat([4, 2], 1, (Ny - 3) / 2), 4, 1];
vXY = vY.' * vX.';
function I = simpsons_integration_2D_v5(F, dx, dy, vXY)
s = F(:).' * vXY(:);
I = s * dx * dy / 9;
end
Thomas Barrett
Thomas Barrett 2021년 2월 11일
편집: Thomas Barrett 2021년 2월 11일
Hi @Jan. Thank you. Along with the help of rahnema1 over on SE (https://stackoverflow.com/q/66140278/12921500), we arrived at a quick method by expanding the computation into a dot product. Since none of Nx, Ny, dx or dy are changing in the main loop, much of this can be precalculated into a single vector Simp_coeff as follows:
function Simp_Coeffs = generate_coeffs_for_SimpsonIntegration(dx, dy, Nx, Ny)
% Generate coefficients for integrating a 2D function with Nx columns and Ny rows,
% and grid spacings dx and dy, using Simpson's rule.
% If Nx and Ny are odd the integral can be calculated directly using Simpson's
% composite rule. If Nx/Ny are even, the composite Simpson's rule is used for the
% first (n-3) = even strips, and the Simpson's 3/8 rule is used for the final three strips.
% Credit for this goes here: https://scicomp.stackexchange.com/a/25669/38005
% and here: https://stackoverflow.com/q/66140278/12921500
if rem(Nx,2) % If number of columns is odd
SimpVec1 = (dx/3)*[1, repmat([4, 2], 1, (Nx-3)/2), 4, 1];
else % If number of columns is even
SimpVec1 = dx*[1/3, repmat([4, 2]/3, 1, (Nx-6)/2), 4/3, [17/3 9 9 3]/8 ];
end
if rem(Ny,2) % If number of rows is odd
SimpVec2 = (dy/3)*[1, repmat([4, 2], 1, (Ny-3)/2), 4, 1];
else % If number of rows is even
SimpVec2 = dy*[1/3, repmat([4, 2]/3, 1, (Ny-6)/2), 4/3, [17/3 9 9 3]/8 ];
end
% Convert from matrix to vector, ready to perform dot product later
Simp_Coeffs = reshape(SimpVec1 .* SimpVec2.', 1, []);
end
where I have now accounted for the possibility of an even number of points being supplied (Simpson's composite rule only works for odd number of points usually).
The integrator function now looks like:
function I = simpsons_integration_2D_dotProduct(F, Simp_coeffs)
% Perform integration using Simpson's composite rule as dot product
I = Simp_coeffs * F(:);
end
and the main code looks like:
Simp_Coeffs = generate_coeffs_for_SimpsonIntegration(dx, dy, Nx, Ny);
Simp_Coeffs = gpuArray(Simp_Coeffs); % Move coefficients to GPU
%%% Main Loop %%%
for k = 1:10000
% M = function1(M) % Apply some other functions to M
% ... etc ...
I = simpsons_integration_2D_dotProduct(F, Simp_coeffs); % Now integrate M
end
%%%%%%%%%%%%%%%%%
This reduces the time taken to 77 µs down from 230 µs on my GPU (again using gputimeit, averaged over 1000 runs), which is more than a factor of x11 speedup over the 850 µs of my code in the OP.
Thank you very much, I have learned a lot with this exercise.

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Mathematics에 대해 자세히 알아보기

제품


릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by