Is it possible to speed up this loop or avoid using it?

Question

e_oksum, 28 May 2019

https://kr.mathworks.com/matlabcentral/answers/464423-is-it-possible-to-speed-up-this-loop-or-avoid-from-using

Edited: Jan, 29 May 2019

Below is example code with four nested loops. How can I speed up this procedure? Thanks for any help and ideas.

clc;clear all
%%inputs
dx=1; dy=1; xo=5;
f1=5;
f2=10;
f3=15;
kx=1:50;
ky=1:50;
k=rand(50,50);
z=rand(50,50);
%%%
%%%% how can I improve this example below for running faster ? 
tic
for j=1:50
    for i=1:50
        total=0;
        for m=1:50
            beta= (m-1)*dy;   
            for n=1:50
                alpha= (n-1)*dx;
                f4 =(1-exp(-(xo+ 2.*pi.*k(j,i)).*z(m,n)));
                f5 = exp(-2.*pi.*(kx(i).*alpha + ky(j).*beta));
                total = total+ f1.*f2.*f3.*f4.*f5;
            end
        end
        L(j,i)=total;
    end
end
toc

Comments: 3

e_oksum, 29 May 2019

Yes, I have, but it is connected to many other functions, so providing it here would not be practical.

The for loops I gave are only an example. The inputs to the loops change during an iterative procedure, so reducing the run time of these loops matters to me.

Jan, 29 May 2019

Edited: Jan, 29 May 2019

If speed matters, the first thing to do is remove the clear all. The editor marks the line "L(j,i) = total" with an orange line and explains that growing an array iteratively wastes time. So pre-allocate before the loop:

L = zeros(50, 50);

But this does not influence the runtime substantially in this case.
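To illustrate the pre-allocation advice, here is a minimal sketch that fills a 50-by-50 result both ways. The names Lgrow and Lpre are made up for this illustration, and i + j is only a placeholder for the real summation.

```matlab
N = 50;

% Growing the array: MATLAB has to enlarge Lgrow while the loop runs.
Lgrow = [];
for j = 1:N
    for i = 1:N
        Lgrow(j, i) = i + j;   % placeholder for "total"
    end
end

% Pre-allocated: the memory is reserved once before the loop.
Lpre = zeros(N, N);
for j = 1:N
    for i = 1:N
        Lpre(j, i) = i + j;
    end
end

sameResult = isequal(Lgrow, Lpre);   % both variants produce the same matrix
```

Both variants give the same values; only the memory handling differs, which is why the gain is small for a 50-by-50 output.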

Answer 1

Matt J, 29 May 2019

https://kr.mathworks.com/matlabcentral/answers/464423-is-it-possible-to-speed-up-this-loop-or-avoid-from-using#answer_377026

Edited: Matt J, 29 May 2019

dx=1; dy=1; xo=5;
f1=5;
f2=10;
f3=15;
kx=1:50;
ky=1:50;
k=rand(50,50);
z=rand(50,50);
%%%
ky=ky(:);
z=reshape(z,[1,1,size(z)]);
alpha=reshape((0:49)*dx,1,1,1,[]);
beta=reshape((0:49)*dy,1,1,[]);
f4 = (1-exp(-(xo+ 2.*pi.*k).*z));
f5 = exp(-2.*pi.*(kx.*alpha + ky.*beta));
L=(f1.*f2.*f3).*sum(  sum( f4.*f5, 3)  ,4);
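One way to check a vectorized version like this against the original loops is to run both on a small grid. The sketch below assumes N = 6 instead of 50 (a made-up size that keeps the loops fast) and the names Lloop, Lvec, z4 and relErr are introduced only for the comparison. It relies on implicit expansion, so it needs R2016b or newer.

```matlab
N  = 6;                       % small test size (assumption; the question uses 50)
dx = 1; dy = 1; xo = 5;
f1 = 5; f2 = 10; f3 = 15;
kx = 1:N;  ky = 1:N;
k  = rand(N, N);
z  = rand(N, N);

% Reference: the four nested loops from the question
Lloop = zeros(N, N);
for j = 1:N
    for i = 1:N
        total = 0;
        for m = 1:N
            for n = 1:N
                f4 = 1 - exp(-(xo + 2*pi*k(j,i)) * z(m,n));
                f5 = exp(-2*pi*(kx(i)*(n-1)*dx + ky(j)*(m-1)*dy));
                total = total + f1*f2*f3*f4*f5;
            end
        end
        Lloop(j,i) = total;
    end
end

% Vectorized: dimensions 1..4 correspond to j, i, m, n
z4    = reshape(z, [1, 1, N, N]);
alpha = reshape((0:N-1)*dx, 1, 1, 1, []);
beta  = reshape((0:N-1)*dy, 1, 1, []);
f4    = 1 - exp(-(xo + 2*pi*k) .* z4);
f5    = exp(-2*pi*(kx .* alpha + ky(:) .* beta));
Lvec  = (f1*f2*f3) * sum(sum(f4 .* f5, 3), 4);

% Largest deviation relative to the largest entry
relErr = max(abs(Lloop(:) - Lvec(:))) / max(abs(Lloop(:)));
```

The two results should agree up to rounding error, since only the order of summation differs.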

Comments: 8

e_oksum, 29 May 2019

Many thanks to Matt J and Juan, it works. Nice!

Jan, 29 May 2019

Edited: Jan, 29 May 2019

+1. I could not measure a benefit, but splitting the exp part should give a little bit more speed:

z     = reshape(z, [1, 1, 50, 50]);
alpha = reshape((0:49) * dx, 1, 1, 1, 50);
beta  = reshape((0:49) * dy, 1, 1, 50);
f4    = 1 - exp(-(xo + 2 * pi * k) .* z);
f5    = exp(-2 * pi * kx .* alpha) .* exp(-2 * pi * ky(:) .* beta);
L     = (f1 * f2 * f3) * sum(f4 .* f5, [3,4]);

Or, for MATLAB versions before R2016b (no implicit expansion), with bsxfun throughout:

z     = reshape(z, [1, 1, 50, 50]);
alpha = reshape((0:49) * dx, 1, 1, 1, 50);
beta  = reshape((0:49) * dy, 1, 1, 50);
f4    = 1 - exp(-bsxfun(@times, xo + 2 * pi * k, z));
f5    = bsxfun(@times, exp(-2 * pi * bsxfun(@times, kx, alpha)), ...
                       exp(-2 * pi * bsxfun(@times, ky(:), beta)));
L     = (f1 * f2 * f3) * sum(sum(f4 .* f5, 3), 4);

Check whether the bsxfun version is faster or slower, even in MATLAB >= R2016b. My best timing was 0.056 seconds, about 1100 times faster than the original version!
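A possible way to run that comparison is timeit, which calls a function handle repeatedly and returns a typical run time. The sketch below times only the construction of f5. The handle names fImplicit and fBsxfun are made up here, and the inner products still use implicit expansion, so it assumes R2016b or newer.

```matlab
dx = 1; dy = 1;
kx = 1:50;  ky = (1:50).';
alpha = reshape((0:49) * dx, 1, 1, 1, 50);
beta  = reshape((0:49) * dy, 1, 1, 50);

% Two ways to combine the two exp() factors into the 4-D array f5
fImplicit = @() exp(-2*pi*kx .* alpha) .* exp(-2*pi*ky .* beta);
fBsxfun   = @() bsxfun(@times, exp(-2*pi*kx .* alpha), exp(-2*pi*ky .* beta));

tImplicit = timeit(fImplicit);
tBsxfun   = timeit(fBsxfun);
fprintf('implicit: %.4g s, bsxfun: %.4g s\n', tImplicit, tBsxfun);
```

The timings depend on the machine and release, so no expected numbers are given here.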

Matt, nice!

Answer 2

Jan, 29 May 2019

https://kr.mathworks.com/matlabcentral/answers/464423-is-it-possible-to-speed-up-this-loop-or-avoid-from-using#answer_377040

Edited: Jan, 29 May 2019

Start with vectorizing the innermost loop:

dx = 1; dy = 1; xo = 5;
f1 = 5;
f2 = 10;
f3 = 15;
kx = 1:50;
ky = 1:50;
k  = rand(50,50);
z = rand(50,50);
tic
L = zeros(50, 50);  % Pre-allocate
for j = 1:50
    for i = 1:50
        total = 0;
        for m = 1:50
            beta = (m-1) * dy;   
            % for n = 1:50
                alpha = ((1:50)-1)*dx;
                f4    = (1-exp(-(xo + 2.*pi.*k(j,i)) .* z(m, 1:50)));
                f5    = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
                total = total + sum(f4 .* f5);
            % end
        end
        L(j, i) = f1 * f2 * f3 * total;
    end
end
toc

This already reduces the runtime from 62 to 2 seconds. Do you see how it works? I moved the index vector of for n = 1:50 into the code by replacing every "n" with "1:50". I also moved the constant f1 * f2 * f3 out of the inner loops; it could just as well be multiplied in after all loops. In addition, a sum() is needed around f4 .* f5.
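Stripped of the details, the replacement pattern looks like the following sketch. The data x and the constant c are made up for the illustration and are not part of the original problem.

```matlab
x = rand(1, 50);   % example data (assumption)
c = 3;             % example constant (assumption)

% Loop form: accumulate one element at a time
total = 0;
for n = 1:50
    total = total + exp(-c * (n-1)) * x(n);
end

% Vectorized form: replace n by the index vector 1:50 and wrap the
% elementwise product in sum()
idx = 1:50;
totalVec = sum(exp(-c * (idx-1)) .* x(idx));
```

Both forms compute the same sum; the vectorized one evaluates exp() once on a whole vector instead of 50 times on scalars.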

Now do the same for the for m loop:

tic
L = zeros(50, 50);  % Pre-allocate
for j = 1:50
    for i = 1:50
        total = 0;
        % for m = 1:50
            beta = ((1:50).'-1) * dy;   
            % for n = 1:50
                alpha = ((1:50)-1)*dx;
                f4    = (1-exp(-(xo + 2.*pi.*k(j,i)) .* z(1:50, 1:50)));
                f5    = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
                total = total + sum(sum(f4 .* f5));
            % end
        % end
        L(j, i) = f1 * f2 * f3 * total;
    end
end
toc

Now the index vector for beta must be transposed, so that auto-expanding is applied in f5: the row vector alpha and the column vector beta combine into a 50-by-50 matrix.
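A tiny example of this auto-expanding, with shortened vectors so the result is easy to read (the sizes 3 and 2 are chosen only for the illustration):

```matlab
alpha = (0:2);       % 1-by-3 row vector
beta  = (0:1).';     % 2-by-1 column vector

% Row plus column expands to a 2-by-3 matrix (R2016b or newer):
% S(m, n) = beta(m) + alpha(n)
S = alpha + beta;
% S = [0 1 2
%      1 2 3]

% Equivalent call for older releases
S_old = bsxfun(@plus, alpha, beta);
```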

With both inner loops vectorized, the code needs 0.22 seconds. With some simplifications, and with the definitions of alpha and beta moved before the loops:

L     = zeros(50, 50);  % Pre-allocate
alpha = (0:49) * dx;
beta  = (0:49).' * dy;
for j = 1:50
    for i = 1:50
        f4 = (1 - exp(-(xo + 2 * pi * k(j,i)) .* z));
        f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        % For MATLAB < R2016b without auto-expanding:
        %f5 = exp(-2 * pi * (bsxfun(@plus, kx(i) * alpha, ky(j) .* beta)));
        L(j, i) = sum(sum(f4 .* f5));
    end
end
L = f1 * f2 * f3 * L;

0.17 seconds.

Now it's time to use some maths: exp(a + b) = exp(a) .* exp(b). Because the exp() function is very expensive, it is much cheaper to evaluate it for the two vectors instead of the matrix:

L     = zeros(50, 50);  % Pre-allocate
alpha = (0:49) * dx;
beta  = (0:49).' * dy;
for j = 1:50
    for i = 1:50
        f4 = (1 - exp(-(xo + 2 * pi * k(j,i)) .* z));
        % f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        f5   = exp(-2 * pi * kx(i) * alpha) .* exp(-2 * pi * ky(j) .* beta);
        L(j, i) = sum(sum(f4 .* f5));
    end
end
L = f1 * f2 * f3 * L;
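The identity behind this step can be checked in isolation. In the sketch below, kxi and kyj are example values standing in for kx(i) and ky(j), and f5_full, f5_split and maxDiff are names introduced only for the check. The first variant uses implicit expansion (R2016b or newer).

```matlab
dx = 1; dy = 1;
alpha = (0:49) * dx;      % 1-by-50 row
beta  = (0:49).' * dy;    % 50-by-1 column
kxi = 3; kyj = 7;         % example values (assumption)

% exp() on the full 50-by-50 matrix: 2500 evaluations
f5_full = exp(-2 * pi * (kxi * alpha + kyj * beta));

% exp() on the two vectors: 100 evaluations, then one outer product
% (column times row gives the 50-by-50 matrix)
f5_split = exp(-2 * pi * kyj * beta) * exp(-2 * pi * kxi * alpha);

maxDiff = max(abs(f5_full(:) - f5_split(:)));
```

The two matrices should agree up to rounding error, while the split form calls exp() on 100 values instead of 2500.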

0.14 seconds. But the 2nd part of the f5 calculations does not depend on the inner loop, so move it outside:

L     = zeros(50, 50);  % Pre-allocate
alpha = (0:49) * dx;
beta  = (0:49).' * dy;
for j = 1:50
    f5_j = exp(-2 * pi * ky(j) .* beta);
    for i = 1:50
        f4 = (1 - exp(-(xo + 2 * pi * k(j,i)) .* z));
        % f5 = exp(-2 * pi * (kx(i) * alpha + ky(j) .* beta));
        f5   = exp(-2 * pi * kx(i) * alpha) .* f5_j;
        L(j, i) = sum(sum(f4 .* f5));
    end
end
L = f1 * f2 * f3 * L;

0.12 seconds. I had hoped it would be faster.
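A further variant, not part of the timings above and untested for speed here: since f5 is an outer product of a column and a row, the double sum can be written as two matrix products. The names Lmat, f5_i and relErr are introduced only for this sketch, which compares the variant against the sum(sum(...)) form.

```matlab
dx = 1; dy = 1; xo = 5;
f1 = 5; f2 = 10; f3 = 15;
kx = 1:50;  ky = 1:50;
k  = rand(50, 50);
z  = rand(50, 50);
alpha = (0:49) * dx;
beta  = (0:49).' * dy;

L    = zeros(50, 50);   % result via sum(sum(...)), as in the loop above
Lmat = zeros(50, 50);   % result via matrix products (assumed variant)
for j = 1:50
    f5_j = exp(-2 * pi * ky(j) * beta);            % 50-by-1, depends on m
    for i = 1:50
        f4   = 1 - exp(-(xo + 2 * pi * k(j,i)) * z);
        f5_i = exp(-2 * pi * kx(i) * alpha);       % 1-by-50, depends on n
        L(j, i)    = sum(sum(f4 .* (f5_j * f5_i)));
        Lmat(j, i) = f5_j.' * f4 * f5_i.';         % same double sum as a bilinear form
    end
end
L    = f1 * f2 * f3 * L;
Lmat = f1 * f2 * f3 * Lmat;

relErr = max(abs(L(:) - Lmat(:))) / max(abs(L(:)));
```

This avoids building the 50-by-50 matrix f5 in every iteration; whether that pays off would need to be measured.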

Now the next performance boost is to vectorize the outer loops also: See Matt J's answer.

Comments: 1

e_oksum, 29 May 2019

You're an expert... wow and wow.
