Hi,
Is there a way to speed this up even more? I did what I could already: everything is precomputed, preallocated etc. and compiled to .mex.
First, just to give you an idea about sizes of continers:
bessels = ones(1201, 1201, 101); % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
maxN = 100;
levels = maxN + 1;
xElements = 1201;
Aj1 = complex(ones(101, 101);
Aj2 = Aj1;
Code:
parfor i = 1 : xElements
for j = 1 : xElements
umn = complex(zeros(levels, levels)); % cleaning
for n = 0:maxN
mm = 1;
for m = -n:2:n
nn = n + 1; % for indexing
if m < 0
umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
end
if m > 0
umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
end
if m == 0
umn(nn, mm) = bessels(i, j, nn);
end
mm = mm + 1; % for indexing
end % m
end % n
beta1 = sum(sum(Aj1.*umn));
betaSumSq1(i, j) = abs(beta1).^2;
beta2 = sum(sum(Aj2.*umn));
betaSumSq2(i, j) = abs(beta2).^2;
end % j
end % i
Best regards, Alex

댓글 수: 1

Alex Kurek
Alex Kurek 2016년 8월 24일
편집: Alex Kurek 2016년 8월 24일
Nobody knows? So far everything I try is not speeding this up.
This if prof Profiler:

댓글을 달려면 로그인하십시오.

 채택된 답변

per isakson
per isakson 2016년 8월 24일
편집: per isakson 2016년 8월 24일

1 개 추천

Your code choked my computer. R2016a, no parallel toolbox, no mex. However, regarding the innermost loop
for m = -n:2:n
end
  • Move nn = n + 1; outside the loop. Helps a little bit (tic,toc).
  • Replace the three if-statements by a if-elseif-elseif-else-end. Again small effect. However, 30% faster when profiling.
  • Split the loop into two loops one with m<0 and one with m>0 and remove the if-statements. Note: odd and even n. Again small effect. However, when profiling the elapse time was cut in half.
Conclusion: The "JIT/Accelerator" don't need my help in these cases. However, the "JIT/Accelerator" is less effective together with the profiler.

댓글 수: 5

Alex Kurek
Alex Kurek 2016년 8월 24일
편집: Alex Kurek 2016년 8월 24일
Ok, now it looks like this:
parfor j = 1 : xElements
for i = 1 : xElements
umn = complex(zeros(levels, levels)); % cleaning
besselIJ = bessels(i, j, :);
for n = 0:maxN
nn = n + 1; % for indexing
currentBessel = besselIJ(nn);
mm = 1;
for m = -n:2:n
if m < 0
umn(nn, mm) = currentBessel * negMcontainer(i, j, abs(m));
elseif m == 0
umn(nn, mm) = currentBessel;
else
umn(nn, mm) = currentBessel * posMcontainer(i, j, m) ;
end
mm = mm + 1;
end % m
end % n
beta1 = sum(sum(Aj1.*umn));
betaSumSq1(i, j) = abs(beta1).^2;
beta2 = sum(sum(Aj2.*umn));
betaSumSq2(i, j) = abs(beta2).^2;
end % i
end % j
I have 2016a, mex and parallel and only moving nn = n+1 outside the loop really helped. What helped more, is changing the code co columnwise (first two lines).
One small question: if I split the loop into two to remove if-statements, haw can I efficiently solve the odd/even problem?
Best regards and thanx, Alex
"efficiently solve the odd/even problem" &nbsp
There might be a trade-off between number of lines of code and speed. Accept more lines of code.
Assumption: The order in which the elements of umn are calculated is not important.
I would start with this approach
for n = 0 : 2 : maxN
for m = - n : 2 : -2
end
umn( m equal to 0 )
for m = 2 : 2 : n
end
end
for n = 1 : 2 : maxN
for m = - n : 2 : -1
end
for m = 1 : 2 : n
end
end
Thanx again. I just coded it:
bessels = ones(1201, 1201, 101); % 1.09 GB
negMcontainer = ones(1201, 1201, 100);
posMcontainer = negMcontainer;
maxN = 100;
levels = maxN + 1;
xElements = 1201;
Aj1 = complex(ones(101, 101));
Aj2 = Aj1;
betaSumSq1 = zeros(xElements, xElements);
betaSumSq2 = betaSumSq1;
for j = 1 : xElements
umn = complex(zeros(levels, levels)); % cleaning
if rem(j, 25) == 0
disp(j);
end
for i = 1 : xElements
for n = 0 : 2 : maxN
nn = n + 1;
mm = 1;
for m = - n : 2 : -2
umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
mm = mm + 1;
end
umn(nn, mm) = bessels(i, j, nn);
for m = 2 : 2 : n
umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m) ;
mm = mm + 1;
end
end
for n = 1 : 2 : maxN
nn = n + 1;
mm = 1;
for m = - n : 2 : -1
umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
mm = mm + 1;
end
m = 1:2:n;
numOfEl = ceil(n/2);
umn(nn, mm:mm+numOfEl-1) = bessels(i, j, nn) * posMcontainer(i, j, m);
% for m = 1 : 2 : n
% umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
% mm = mm + 1;
% end
end
beta1 = sum(sum(Aj1.*umn));
betaSumSq1(i, j) = abs(beta1).^2;
beta2 = sum(sum(Aj2.*umn));
betaSumSq2(i, j) = abs(beta2).^2;
end % i
end % j
The speed in .mex and parfor is exectly the same, as before. But it opens a possibility of vectorization. I vectorized last loop (as pasted aboce) and it slowed down a bit. Have I done something wrong?
Best regards, Alex
per isakson
per isakson 2016년 8월 27일
편집: per isakson 2016년 8월 28일
I don't know what to say. I would have guessed that your vectorization would be faster than the loop. However, loops are much faster in current releases of Matlab than they were when we learned that loops to should always be avoided.
Lesson learned
  • It's difficult to be smarter than the "JIT/Acceletator"
  • The function, profile, cannot always be trusted
Thanx. It is possible to speed-up vectorized version by ~17% by doing this:
tic
for j = 1 : xElements
for i = 1 : xElements
for n = 1 : 2 : maxN
nn = n + 1;
numOfEl = ceil(n/2);
umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, 1:2:n);
end
end
end
toc % 1.059022 seconds
So m is not allocated in every iteration. But still, its 2x slower, than loops.

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

도움말 센터File Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기

제품

질문:

2016년 8월 22일

편집:

2016년 8월 28일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by