Extremely slow nested for loop
조회 수: 3 (최근 30일)
이전 댓글 표시
Hi,
I have number of functions that I use to find roots to Neumann problem. One of the functions contains nested for loops which runs forever.
Please see below:
The function consists of three nested for loops as:
function [Lsn] = coefFourier_ll_log(k,ll)
N=length(ll);
ll_n=coefFourier1(k*ll);
lambda=zeros(1,N-1); lambda(N/2+1:N-1)=-0.5./(1:N/2-1);
lambda(1:N/2-1)=fliplr(lambda(N/2+1:N-1));
M=N/2-1; Lsn=zeros(N-1,N-1);
for s=-M:M
for n=-M:M
rmin=max(-M,max(-M-n,s-M));
rmax=min(s+M,min(M-n,M));
for r=rmin:rmax
Lsn(s+M+1,n+M+1)=Lsn(s+M+1,n+M+1)+ll_n(s-r+M+1)*ll_n(n+r+M+1)*lambda(r+M+1);
end
end
end
Lsn=0.5*Lsn;
This function receives k(double), ll(1D array) and returns Lsn(Matrix). For the given time profile, the system size is 512, which means M is 255.
How I can vectorize it to increase its speed? I need to run this group of functions for couple of hundreds times and eventually with even bigger system size.
Thank you for your time.
댓글 수: 0
채택된 답변
Jan
2018년 3월 21일
Start with a slightly cleaned version:
function Lsn = coefFourier_ll_log(k,ll)
N = length(ll);
ll_n = coefFourier1(k*ll);
lambda = zeros(1, N-1);
lambda(N/2+1:N-1) = -0.5 ./ (1:N/2-1);
lambda(N/2-1:-1:1) = lambda(N/2+1:N-1); % Instead of FLIPLR
M = N/2-1;
M1 = M + 1;
Lsn = zeros(N-1, N-1);
for s = -M:M
for n = -M:M
rmin = max(-M, max(-M-n, s-M));
rmax = min(s+M, min(M-n, M));
Lsn(s+M1, n+M1) = Lsn(s+M1, n+M1) + ...
sum(ll_n(s+M1-rmin:s+M1-rmax) .* ...
ll_n(n+M1+rmin:n+M1+rmax) .* ...
lambda(M1+rmin:M1+rmax));
end
end
Lsn = 0.5 * Lsn;
Here the inner loop from rmin:rmax way vectorized. Please post the timing before and after the changes measure by tic/toc. The profiler disables the JIT acceleration, such that it is less useful. Providing some test input of a relevant size (e.g. created by rand) would allow us to run the tests by our own.
댓글 수: 5
Jan
2018년 3월 22일
While rmin and rmax have a specific structure, which could be exploited, the actual indices are e.g. s-rmin+M1 and change with each iteration. Therefore I do not see a way to use the pattern in the outer loops.
It would be useful if you explain, what the operation does. Maybe it is much faster to calculate it by conv or conv2.
추가 답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Logical에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!