Why is sum considerably slower that adding each individual element together, when using large loops?
이 질문을 팔로우합니다.
- 팔로우하는 게시물 피드에서 업데이트를 확인할 수 있습니다.
- 정보 수신 기본 설정에 따라 이메일을 받을 수 있습니다.
오류 발생
페이지가 변경되었기 때문에 동작을 완료할 수 없습니다. 업데이트된 상태를 보려면 페이지를 다시 불러오십시오.
이전 댓글 표시
I have some code that includes loops that run at least 1e7 times (often up to 1e9 times) and I am trying to improve its performance.
My slowest step, somewhat surprisingly, is indexing into a variable and summing those numbers. I need to do this index and sum every time within the loop as the variable changes within the loop too.
The code below is a simplification of the problem.
Method 1
a = 1:16;
idx = [1 6 11 16];
N = 1e7;
A0 = zeros(1,N);
tic
for k = 1:N
A = sum(a(idx));
a = a + 0.001;
A0(k) = A;
end
toc
Elapsed time is 5.948268 seconds.
Method 2
If I instead replace the first line of the loop with a nested loop that does an iterative addition, then I get a 10x improvement in performance. However, I am not too keen on using another loop, as in my actual code, I have another couple of loops too, so my code will begin to get even messier.
a = 1:16;
B0 = zeros(1,N);
tic
for k = 1:N
B = 0;
for j = 1:numel(idx)
B = B + a(idx(j));
end
a = a + 0.001;
B0(k) = B;
end
toc
Elapsed time is 0.659663 seconds.
Method 3
I get a smaller further improvement if I spell out the individual components of a instead and remove the loop.
a = 1:16;
C0 = zeros(1,N);
tic
for k = 1:N
C = a(1)+a(6)+a(11)+a(16);
a = a + 0.001;
C0(k) = C;
end
toc
Elapsed time is 0.456529 seconds.
However, I need to keep the code as general as impossible - the index idx is dependent on the initial input, so this 3rd option is not possible.
My question is, is there a faster way than method 1, but that is also cleaner / more concise that method 2?
Curiously, the differences in time between Methods 1 and Methods 2 and 3 get progressively worse as N is increased.
채택된 답변
Yes, unfortunately, vector indexing is quite an expensive operation. If idx is constant, as in your example, than you should do as below. If idx is not constant, then the way to optimize the code will depend on how idx varies.
a = 1:16;
idx = [1 6 11 16];
N = 1e7;
A0 = zeros(1,N);
b=a(idx);
tic
for k = 1:N
A = sum(b);
b = b + 0.001;
A0(k) = A;
end
toc
Elapsed time is 0.532537 seconds.
a(idx)=b;
댓글 수: 12
Jack Palmer
2023년 8월 31일
편집: Jack Palmer
2023년 8월 31일
Thanks for the quick response!
idx is a constant, however I think the above code runs into an issue, because a needs to change within the loop and using b = a(idx) outside the loop doesn't allow for a to change.
Sorry. I fixed it.
Thanks!
I think this works in the examples I've given, but I've realised my actual code uses a transformation on a that isn't preserved by the same transformation on b.
For example
a = a([end 1:end-1])
I don't think the above would work in this case.
For example a = a([end 1:end-1])
In that case, the operation is equivalent to a circular convolution. You would never need a loop for that. Matlab has built-in routines for convolution.
To illustrate my previous point:
a = rand(1,2e4);
idx = [1 6 11 16];
N = numel(a);
A0 = zeros(1,N);
%Loop method
tic
for k = 1:N
A = sum(a(idx));
a = a([end, 1:end-1]);
A0(k) = A;
end
toc
Elapsed time is 2.480079 seconds.
%Loop-free method
tic;
b=zeros(size(a));
b(idx)=1;
A1=ifft(conj(fft(a)).*fft(b) ,'symmetric');
toc
Elapsed time is 0.024284 seconds.
difference = norm(A1-A0,inf) %verify they're the same
difference = 2.2204e-15
Thank you Matt!
I think that was a poor example on my part, sorry. I was trying to keep it simplified... Below is the loop I have which I am trying to speed up. I can't see a way of doing this without a loop at the moment.
I've used rand to generate initial variables of the same size as the ones in my full script, so rand will probably cause the signal to decay to 0 or inf before n = N, but in reality in my script this is not the case.
I'm using A to 'normalise' B which then alters A through some propagator C. D then converts the vector A into a single number that is the signal. Everything outside of the loop can be treated as a constant.
A = rand(16,1);
idx = [1 6 11 16];
B0 = rand(16,1)*0.35;
C = (rand(16,16)-0.5)*1e-1;
D = rand(1,16);
N = 1e7;
signal = zeros(N,1);
tic
for n = 1:N
Aidx = A(idx)
Asum = sum(Aidx);
% Asum = A(1) + A(6) + A(11) + A(16); % Elapsed time = 3 seconds
B = B0*Asum;
A = B + C*(A-B);
signal(n) = D*A;
end
toc
The above code timed out above 55 seconds, whereas replacing it with the commeted out line makes it take 3 seconds. If you can spot a way of removing the loop that would solve all my issues.
Thanks!
May be a reasonable work around (last method)
A = rand(16,1);
idx = [1 6 11 16];
B0 = rand(16,1)*0.35;
C = (rand(16,16)-0.5)*1e-1;
D = rand(1,16);
N = 1e7;
signal = zeros(N,1);
A0 = A;
A=A0;
tic
for n = 1:N
Aidx = A(idx);
Asum = sum(Aidx);
B = B0*Asum;
A = B + C*(A-B);
signal(n) = D*A;
end
toc
Elapsed time is 10.773281 seconds.
A=A0;
tic
for n = 1:N
Asum = A(1) + A(6) + A(11) + A(16);
B = B0*Asum;
A = B + C*(A-B);
signal(n) = D*A;
end
toc
Elapsed time is 3.163106 seconds.
A=A0;
tic
Sidx = zeros(1,size(A,1));
Sidx(idx) = 1;
for n = 1:N
B = B0*(Sidx*A);
A = B + C*(A-B);
signal(n) = D*A;
end
toc
Elapsed time is 3.197886 seconds.
Method #2 below doesn't eliminate looping, but has about a 2 sec. run time.
A = rand(16,1);
idx = [1 6 11 16];
B0 = rand(16,1)*0.35;
C = (rand(16,16)-0.5)*1e-1;
D = rand(1,16);
N = 1e7;
[signal0, signal1] = deal(zeros(N,1));
A0 = A;
%Method 1
A=A0;
tic
for n = 1:N
Aidx = A(idx);
Asum = sum(Aidx);
B = B0*Asum;
A = B + C*(A-B);
signal0(n) = D*A;
end
toc
Elapsed time is 15.534546 seconds.
%Method 2
tic;
m=numel(A);
s=accumarray(idx(:),1,[m,1]);
Q=B0*s.';
Q=Q+C*(eye(m)-Q);
for n = 1:N
D=D*Q;
signal1(n) = D*A0;
end
toc
Elapsed time is 1.983712 seconds.
difference=norm(signal0-signal1,inf) %They're the same
difference = 2.2204e-16
Thank you Matt.
That 2nd method has worked wonders.
Good observation that there is a matrix power in this loop.
Unless Q has exclusively unity eigen values, the thing either dies out to 0 or explodes to infinity very quickly. No need to iterate 1e7 iteration to observe that.
A = rand(16,1);
idx = [1 6 11 16];
B0 = rand(16,1)*0.35;
C = (rand(16,16)-0.5)*1e-1;
D = rand(1,16);
N = 1e7;
signal = zeros(N,1);
%Method 2 bis
tic;
m=numel(A);
s=accumarray(idx(:),1,[m,1]);
Q=B0*s.';
Q=Q+C*(eye(m)-Q);
A0 = A;
for n = 1:N
D=D*Q;
if all(D==0)
break
end
if all(isinf(D))
signal1(n:end) = sign(sigal1(n-1))*Inf;
break
end
signal1(n) = D*A0;
end
toc
Elapsed time is 0.011317 seconds.
signal1(n:end) = sign(signal1(n-1))*Inf;
In my previous code, some extreme floating rouding around realin and reamax can defeat the test.
This modified code looks more robust. It must be something related to denormalization numbers that I'm sure the correct to deal with https://www.mathworks.com/help/fixedpoint/ug/floating-point-numbers.html#f32213
I also do the matrix power differently for the sake of variation
A = rand(16,1);
idx = [1 6 11 16];
B0 = rand(16,1)*0.35;
C = (rand(16,16)-0.5)*1e-1;
D = rand(1,16);
N = 1e7;
A0 = A;
%Method 3
tic;
signal1 = zeros(N,1);
m=numel(A);
s=accumarray(idx(:),1,[m,1]);
Q=B0*s.';
Q=Q+C*(eye(m)-Q);
d = eig(Q,'vector');
d = abs(d);
check0 = all(d < 1);
checkinf = any(d > 1);
QnA = A0;
for n = 1:N
QnA=Q*QnA;
DQnA = D*QnA;
signal1(n) = DQnA;
if check0 && abs(DQnA) < realmin
break
end
if checkinf && abs(DQnA) > realmax
signal1(n+1:end) = sign(DQnA)*Inf;
break
end
end
toc
Elapsed time is 0.014308 seconds.
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기
참고 항목
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!웹사이트 선택
번역된 콘텐츠를 보고 지역별 이벤트와 혜택을 살펴보려면 웹사이트를 선택하십시오. 현재 계신 지역에 따라 다음 웹사이트를 권장합니다:
또한 다음 목록에서 웹사이트를 선택하실 수도 있습니다.
사이트 성능 최적화 방법
최고의 사이트 성능을 위해 중국 사이트(중국어 또는 영어)를 선택하십시오. 현재 계신 지역에서는 다른 국가의 MathWorks 사이트 방문이 최적화되지 않았습니다.
미주
- América Latina (Español)
- Canada (English)
- United States (English)
유럽
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
