Is it possible to speed up this loop or avoid using it?
Below is an example code with 4 nested loops. How can I speed up this procedure? Thanks for any help and ideas.
clc;clear all
%%inputs
dx=1; dy=1; xo=5;
f1=5;
f2=10;
f3=15;
kx=1:50;
ky=1:50;
k=rand(50,50);
z=rand(50,50);
%%%
%%%% how can I improve this example below for running faster ?
tic
for j=1:50
    for i=1:50
        total=0;
        for m=1:50
            beta= (m-1)*dy;
            for n=1:50
                alpha= (n-1)*dx;
                f4 =(1-exp(-(xo+ 2.*pi.*k(j,i)).*z(m,n)));
                f5 = exp(-2.*pi.*(kx(i).*alpha + ky(j).*beta));
                total = total+ f1.*f2.*f3.*f4.*f5;
            end
        end
        L(j,i)=total;
    end
end
toc
Comments: 3
Jan on 29 May 2019 (edited 29 May 2019)
If speed matters, the first thing you should do is remove the clear all. The editor marks the line "L(j,i) = total" in orange and explains that growing an array iteratively wastes time. So pre-allocate before the loop:
L = zeros(50, 50);
But this does not influence the runtime substantially in this case.
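For readers unfamiliar with the pattern, here is a minimal sketch of where the pre-allocation goes. The stored value i + j is only a placeholder (an assumption for illustration); in the question's code the variable total would be stored instead.

```matlab
N = 50;                    % grid size taken from the question
L = zeros(N, N);           % pre-allocate once, before the loops
for j = 1:N
    for i = 1:N
        L(j, i) = i + j;   % placeholder value; the real code stores "total" here
    end
end
```

Because L already has its final size, no iteration has to enlarge the array.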
Accepted Answer
Matt J on 29 May 2019 (edited 29 May 2019)
dx=1; dy=1; xo=5;
f1=5;
f2=10;
f3=15;
kx=1:50;
ky=1:50;
k=rand(50,50);
z=rand(50,50);
%%%
ky=ky(:);
z=reshape(z,[1,1,size(z)]);
alpha=reshape((0:49)*dx,1,1,1,[]);
beta=reshape((0:49)*dy,1,1,[]);
f4 = (1-exp(-(xo+ 2.*pi.*k).*z));
f5 = exp(-2.*pi.*(kx.*alpha + ky.*beta));
L=(f1.*f2.*f3).*sum( sum( f4.*f5, 3) ,4);
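One way to convince yourself that the 4-D formulation computes the same thing as the four loops is to compare both on a reduced problem. The sketch below assumes a small size N = 6 (my choice, so the reference loops finish instantly) and a fixed random seed; it needs implicit expansion, so R2016b or later.

```matlab
N = 6;                                 % assumed reduced size, not from the thread
dx = 1; dy = 1; xo = 5;
f1 = 5; f2 = 10; f3 = 15;
kx = 1:N; ky = 1:N;
rng(0);                                % fixed seed for reproducibility
k = rand(N, N); z = rand(N, N);

% Reference: the original four nested loops
Lref = zeros(N, N);
for j = 1:N
    for i = 1:N
        total = 0;
        for m = 1:N
            for n = 1:N
                f4 = 1 - exp(-(xo + 2*pi*k(j,i)) * z(m,n));
                f5 = exp(-2*pi*(kx(i)*(n-1)*dx + ky(j)*(m-1)*dy));
                total = total + f1*f2*f3*f4*f5;
            end
        end
        Lref(j, i) = total;
    end
end

% Vectorized: dims 1,2 hold (j,i); dims 3,4 hold (m,n)
z4    = reshape(z, [1, 1, N, N]);
alpha = reshape((0:N-1)*dx, 1, 1, 1, []);
beta  = reshape((0:N-1)*dy, 1, 1, []);
f4 = 1 - exp(-(xo + 2*pi*k) .* z4);
f5 = exp(-2*pi*(kx .* alpha + ky(:) .* beta));
Lvec = (f1*f2*f3) * sum(sum(f4 .* f5, 3), 4);

% Largest relative deviation between the two results
relErr = max(abs(Lvec(:) - Lref(:)) ./ abs(Lref(:)));
```

Keep in mind that at the full size each 50x50x50x50 double array holds 6,250,000 elements, i.e. 50 MB, so this approach trades memory for speed.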
Comments: 8
Jan on 29 May 2019 (edited 29 May 2019)
+1. I could not measure a benefit, but splitting the exp part should give a little more speed (sum over the dimensions [3,4] in one call needs R2018b or later):
z = reshape(z, [1, 1, 50, 50]);
alpha = reshape((0:49) * dx, 1, 1, 1, 50);
beta = reshape((0:49) * dy, 1, 1, 50);
f4 = 1 - exp(-(xo + 2 * pi * k) .* z);
f5 = exp(-2 * pi * kx .* alpha) .* exp(-2 * pi * ky(:) .* beta);
L = (f1 * f2 * f3) * sum(f4 .* f5, [3,4]);
Or for Matlab < R2016b, without implicit expansion:
z = reshape(z, [1, 1, 50, 50]);
alpha = reshape((0:49) * dx, 1, 1, 1, 50);
beta = reshape((0:49) * dy, 1, 1, 50);
% Every product of differently sized arrays goes through bsxfun
f4 = 1 - exp(-bsxfun(@times, xo + 2 * pi * k, z));
f5 = bsxfun(@times, exp(-2 * pi * bsxfun(@times, kx, alpha)), ...
    exp(-2 * pi * bsxfun(@times, ky(:), beta)));
L = (f1 * f2 * f3) * sum(sum(f4 .* f5, 3), 4);
Check whether the bsxfun version is faster or slower, even in Matlab >= R2016b. My best timing was 0.056 seconds, about 1100 times faster than the original version!
Matt, nice!
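To compare the two ways of building f5 on your own machine, something like the following sketch could be used. It relies on timeit, which runs a function handle several times and returns a typical time; the handle names viaExpand and viaBsxfun are made up for this example, and the first handle needs R2016b or later.

```matlab
dx = 1; dy = 1;
kx = 1:50; ky = 1:50;
alpha = reshape((0:49) * dx, 1, 1, 1, 50);
beta  = reshape((0:49) * dy, 1, 1, 50);

% Two ways to build f5, wrapped in handles so that timeit can call them
viaExpand = @() exp(-2*pi*kx .* alpha) .* exp(-2*pi*ky(:) .* beta);
viaBsxfun = @() bsxfun(@times, exp(-2*pi*bsxfun(@times, kx, alpha)), ...
                               exp(-2*pi*bsxfun(@times, ky(:), beta)));

tExpand = timeit(viaExpand);
tBsxfun = timeit(viaBsxfun);
fprintf('implicit expansion: %.4f s, bsxfun: %.4f s\n', tExpand, tBsxfun);
```

The numbers depend on the Matlab release and the hardware, so no expected output is given here.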
More Answers (1)
Jan on 29 May 2019 (edited 29 May 2019)
Start with vectorizing the innermost loop:
dx = 1; dy = 1; xo = 5;
f1 = 5;
f2 = 10;
f3 = 15;
kx = 1:50;
ky = 1:50;
k = rand(50,50);
z = rand(50,50);
tic
L = zeros(50, 50); % Pre-allocate
for j = 1:50
    for i = 1:50
        total = 0;
        for m = 1:50
            beta = (m-1) * dy;
            % for n = 1:50
            alpha = ((1:50)-1)*dx;
            f4 = (1-exp(-(xo + 2.*pi.*k(j,i)) .* z(m, 1:50)));
            f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
            total = total + sum(f4 .* f5);
            % end
        end
        L(j, i) = f1 * f2 * f3 * total;
    end
end
toc
This already reduces the runtime from 62 to 2 seconds. Do you see how it works? I just moved the index vector from for n = 1:50 into the code by replacing every "n" with "1:50". I also moved the constant f1 * f2 * f3 out of the inner loops; it could actually be multiplied outside all loops. In addition, a sum() is needed around f4 .* f5.
Now do the same for the for m loop:
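Stripped of the details of this problem, the step looks like the sketch below. The data x and the scalar c are hypothetical stand-ins for z(m, 1:50) and the loop-invariant factor.

```matlab
x = rand(1, 50);      % hypothetical data, stands in for z(m, 1:50)
c = 0.3;              % hypothetical scalar that does not depend on n

% Loop version: one element per iteration
totalLoop = 0;
for n = 1:50
    totalLoop = totalLoop + (1 - exp(-c * x(n)));
end

% Vectorized version: replace n by 1:50 and wrap the result in sum()
totalVec = sum(1 - exp(-c * x(1:50)));
```

Both variables hold the same value up to rounding; the vectorized line calls exp() once on a vector instead of 50 times on scalars.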
tic
L = zeros(50, 50); % Pre-allocate
for j = 1:50
    for i = 1:50
        total = 0;
        % for m = 1:50
        beta = ((1:50).'-1) * dy;
        % for n = 1:50
        alpha = ((1:50)-1)*dx;
        f4 = (1-exp(-(xo + 2.*pi.*k(j,i)) .* z(1:50, 1:50)));
        f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        total = total + sum(sum(f4 .* f5));
        % end
        % end
        L(j, i) = f1 * f2 * f3 * total;
    end
end
toc
Now the index vector for beta must be transposed, and auto-expanding is applied in f5: the row vector alpha plus the column vector beta gives a 50x50 matrix.
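A small sketch of what auto-expanding does with a row and a column vector, using short vectors (my choice) so the result is easy to inspect. The bsxfun line is the equivalent for releases before R2016b.

```matlab
alpha = 0:3;                        % 1x4 row vector
beta  = (0:2).';                    % 3x1 column vector

S  = alpha + beta;                  % auto-expanding (R2016b or later): 3x4 matrix
Sb = bsxfun(@plus, alpha, beta);    % same result in older releases

% S(m, n) equals alpha(n) + beta(m)
disp(S)
```

This is why beta has to be a column: with two row vectors the sum would stay a 1x50 vector and the m and n indices would no longer be independent.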
This needs 0.22 seconds. With some simplifications and moving the definitions of alpha and beta before the loops:
L = zeros(50, 50); % Pre-allocate
alpha = (0:49) * dx;
beta = (0:49).' * dy;
for j = 1:50
    for i = 1:50
        f4 = (1 - exp(-(xo + 2 * pi * k(j,i)) .* z));
        f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        % For Matlab < R2016b without auto-expanding:
        %f5 = exp(-2 * pi * (bsxfun(@plus, kx(i) * alpha, ky(j) .* beta)));
        L(j, i) = sum(sum(f4 .* f5));
    end
end
L = f1 * f2 * f3 * L;
0.17 seconds.
Now it's time to use some maths: exp(a + b) = exp(a) .* exp(b). Because the exp() function is very expensive, it is much cheaper to evaluate it for the two vectors instead of the matrix:
L = zeros(50, 50); % Pre-allocate
alpha = (0:49) * dx;
beta = (0:49).' * dy;
for j = 1:50
    for i = 1:50
        f4 = (1 - exp(-(xo + 2 * pi * k(j,i)) .* z));
        % f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        f5 = exp(-2 * pi * kx(i) * alpha) .* exp(-2 * pi * ky(j) .* beta);
        L(j, i) = sum(sum(f4 .* f5));
    end
end
L = f1 * f2 * f3 * L;
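As a side check, the split of the exponential can be verified numerically for one pair of wavenumbers. The values kxi = 3 and kyj = 7 are arbitrary samples picked for this sketch; the code needs auto-expanding, so R2016b or later.

```matlab
dx = 1; dy = 1;
alpha = (0:49) * dx;       % 1x50
beta  = (0:49).' * dy;     % 50x1
kxi = 3; kyj = 7;          % arbitrary sample values of kx(i) and ky(j)

f5_full  = exp(-2*pi*(kxi*alpha + kyj*beta));             % exp() of a 50x50 matrix
f5_split = exp(-2*pi*kxi*alpha) .* exp(-2*pi*kyj*beta);   % exp() of two 50-element vectors

% Largest absolute difference between the two 50x50 results
maxDiff = max(abs(f5_full(:) - f5_split(:)));
```

The split version evaluates exp() on 100 elements instead of 2500 and leaves the rest to a cheap elementwise multiplication.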
With the split exp(), the loop needs 0.14 seconds. But the 2nd part of the f5 calculation does not depend on the inner loop, so move it outside:
L = zeros(50, 50); % Pre-allocate
alpha = (0:49) * dx;
beta = (0:49).' * dy;
for j = 1:50
    f5_j = exp(-2 * pi * ky(j) .* beta);
    for i = 1:50
        f4 = (1 - exp(-(xo + 2 * pi * k(j,i)) .* z));
        % f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        f5 = exp(-2 * pi * kx(i) * alpha) .* f5_j;
        L(j, i) = sum(sum(f4 .* f5));
    end
end
L = f1 * f2 * f3 * L;
With f5_j hoisted out of the inner loop, this needs 0.12 seconds. I had hoped it would be faster.
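One possible further step, which is not from the thread and which I have not timed: since f5 is the product of a column factor and a row factor, the double sum over f4 .* f5 is a bilinear form and can be written with matrix products. The names U and V are introduced only for this sketch.

```matlab
dx = 1; dy = 1; xo = 5;
f1 = 5; f2 = 10; f3 = 15;
kx = 1:50; ky = 1:50;
k = rand(50, 50); z = rand(50, 50);
alpha = (0:49) * dx;                 % 1x50
beta  = (0:49).' * dy;               % 50x1

% Precompute both exponential factors once (plain outer products)
U = exp(-2 * pi * beta * ky);        % 50x50, column j is exp(-2*pi*ky(j)*beta)
V = exp(-2 * pi * kx(:) * alpha);    % 50x50, row i is exp(-2*pi*kx(i)*alpha)

L = zeros(50, 50);                   % Pre-allocate
for j = 1:50
    for i = 1:50
        f4 = 1 - exp(-(xo + 2 * pi * k(j,i)) .* z);
        % Sum over m and n of U(m,j) * f4(m,n) * V(i,n), as a bilinear form
        L(j, i) = U(:, j).' * f4 * V(i, :).';
    end
end
L = f1 * f2 * f3 * L;
```

The exp() call for f4 still runs on a 50x50 matrix in every iteration, so whether this helps in practice would have to be measured.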
The next performance boost is to vectorize the outer loops as well: see Matt J's answer.