필터 지우기
필터 지우기

explanation of mysterious timing results?

조회 수: 1 (최근 30일)
Jim
Jim 2013년 8월 16일
In some legacy code, there's a pair of for-loops, one nested in the other. I created two alternate versions, one that tries to optimize by reducing the number of inner loop operations, and one that's fully vectorized. I wrote a timing routine that iterates over all three versions 10,000 times:
nIter = 10000;
tm = zeros(1,3);
for itr = 1:nIter
% setup variables for all three versions
NM=18; NM1=NM+1; NM3=NM+3;NM4=NM+4;
T1=1; T2=2; T3=3;
i=NM3 : -1 : 1;
iv(i) = i .* (i-1) / 2;
[a b] = deal(zeros(1, NM3*NM4/2));
% V1: original code
tic;
for M = 0 : NM
M1 = M+1;
RM = M;
for N = M1 : NM
RN = N;
LL = iv(N+1) + M + 1;
a(LL) = -sqrt(((T2*RN+T1) * (T2*RN-T1)) / ...
((RN-RM) * (RN+RM)));
b(LL) = sqrt(((T2*RN+T1) * (RN+RM-T1) * (RN-RM-T1)) / ...
((T2*RN-T3) * (RN+RM) * (RN-RM) ) );
end
end
tm(1) = tm(1) + toc;
% V2: reduced inner loop ops
tic;
for M = 0 : NM
M1 = M+1;
for N = M1 : NM
rs = N+M;
rd = N-M;
rp = rs*rd;
rp1 = (rs-1) * (rd-1);
t2n = 2*N;
t2n1 = t2n+1;
LL = iv(N+1) + M1;
a(LL) = -sqrt(t2n1 * (t2n-1) / rp);
b(LL) = sqrt(t2n1 * rp1 / ((t2n-3) * rp));
end
end
tm(2) = tm(2) + toc;
% V3: vectorized code
tic;
sRow = 0 : NM;
s = sRow(ones(NM1,1),:);
t = s'; % same as [s,t] = meshgrid(0:NM)
ltMask = s < t; % mask for array lower triangle below main diag
rs = s + t;
rs = rs(ltMask);
rd = t - s;
rd = rd(ltMask);
rp = rs .* rd;
rp1 = (rs-1) .* (rd-1);
tCol = t(ltMask);
t2n = 2*tCol;
t2n1 = t2n+1;
i = iv(tCol+1) + s(ltMask)' + 1;
a(i) = -sqrt(t2n1 .* (t2n-1) ./ rp);
b(i) = sqrt(t2n1 .* rp1 ./ ((t2n-3) .* rp));
tm(3) = tm(3) + toc;
end % timing loop
descr = {'V1:orig';'V2:reduced ops';'V3:vectorized'};
for i = 1:3
disp([descr{i} ': ' num2str(tm(i)) ' sec for ' num2str(nIter) ...
' iterations']);
end
disp('Normalized times:');
[s loc] = sort(tm);
disp([repmat(char(9),3,1) char(descr(loc)) repmat(' ',3,1) ...
num2str((s./s(1))', '%0.2f')]);
The vectorized code is currently fastest, by about 2.5 times vs the original. What's very surprising is that the reduced operations version (V2) is reported to be way slower than the other two (by a factor of ~140 vs the vectorized). When I run the Profiler (after disabling all but one CPU), the main culprits are shown to be the two sqrt expressions, with the rp and rp1 computations also eating a good share of time.
The changes I made to get V2 from V1 were: # elimination of two unnecessary variables (RM and RN) # replacement of T1, T2, and T3 with their constant values # addition of some new temporary scalars (rs, rd, rp, rp1, t2n, & t2n1) to allow reduction of the floating-point operation count in the inner loop. (The number of multiplications dropped from 10 to 5, and additions/subractions from 12 to 7).
I'm aware there's some penalty for creating the new scalars, but a factor of 140 seems extreme. Can anyone suggest why these changes have such a high time cost?
  댓글 수: 2
kjetil87
kjetil87 2013년 8월 16일
could you look over your code again so that it is runable with a copy paste?
e.g iv and IV , neither of them are declared as variables before you use them and i suspect maybe you meant for them to be the same?
Also there is some error with the
disp([descr{i} ': ' num2str(tm(i) ' sec for ' num2str(nIter) ...
' iterations']);
Jim
Jim 2013년 8월 19일
Sorry about the typing errors, but they should be corrected now if you want to give it another go.

댓글을 달려면 로그인하십시오.

답변 (0개)

카테고리

Help CenterFile Exchange에서 Debugging and Analysis에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by