Is vectorized code always faster than loops? Any exceptions?

cr on 27 Jul 2011
[EDIT: 20110727 09:35 CDT - reformat - WDR]
I have a critical chunk of code with six nested for-loops. I replaced the innermost three with vectorized operations, but the vectorized version (with everything else configured identically, on the same computer) takes more than twice as long to run. I ran each version a few times; the results are below. Any light on this behaviour is appreciated. Thanks.
% fem_nought is the file with loops; fem_optimized is the one with the vectorized equivalent of the innermost 3 loops.
>> fem_optimized
Elapsed time is 10.073242 seconds.
>> fem_optimized
Elapsed time is 9.588474 seconds.
>> fem_optimized
Elapsed time is 9.872822 seconds.
>> fem_nought
Elapsed time is 4.047568 seconds.
>> fem_nought
Elapsed time is 3.678311 seconds.
>> fem_nought
Elapsed time is 3.672811 seconds.
Trimmed versions of both codes are below (the declarations of many variables have been removed):
LOOPS version:
for k=1:nel
for ri=1:8
for si=1:8
for mn=1:4
for nm=1:4
for km=1:4
r=.5*(a*p(mn)+r1+r2);
s=.5*(b*p(nm)+s3+s2);
t=.5*(c*p(km)+t1+t5);
a1=-.02*s+0.5*r*(1-r^2)+.05*t;
a2=-.05*t-.5*s;
%...............SHAPE FUNCTION..........................
N(1)=((r-r2)/(r1-r2))*((s-s4)/(s1-s4))*((t-t5)/(t1-t5));
N(2)=((r-r1)/(r2-r1))*((s-s3)/(s2-s3))*((t-t6)/(t2-t6));
N(3)=((r-r4)/(r3-r4))*((s-s2)/(s3-s2))*((t-t7)/(t3-t7));
N(4)=((r-r3)/(r4-r3))*((s-s1)/(s4-s1))*((t-t8)/(t4-t8));
N(5)=((r-r6)/(r5-r6))*((s-s8)/(s5-s8))*((t-t1)/(t5-t1));
N(6)=((r-r5)/(r6-r5))*((s-s7)/(s6-s7))*((t-t2)/(t6-t2));
N(7)=((r-r8)/(r7-r8))*((s-s6)/(s7-s6))*((t-t3)/(t7-t3));
N(8)=((r-r7)/(r8-r7))*((s-s5)/(s8-s5))*((t-t4)/(t8-t4));
Nr(1)=(1/(r1-r2))*((s-s4)/(s1-s4))*((t-t5)/(t1-t5));
Nr(2)=(1/(r2-r1))*((s-s3)/(s2-s3))*((t-t6)/(t2-t6));
Nr(3)=(1/(r3-r4))*((s-s2)/(s3-s2))*((t-t7)/(t3-t7));
Nr(4)=(1/(r4-r3))*((s-s1)/(s4-s1))*((t-t8)/(t4-t8));
Nr(5)=(1/(r5-r6))*((s-s8)/(s5-s8))*((t-t1)/(t5-t1));
Nr(6)=(1/(r6-r5))*((s-s7)/(s6-s7))*((t-t2)/(t6-t2));
Nr(7)=(1/(r7-r8))*((s-s6)/(s7-s6))*((t-t3)/(t7-t3));
Nr(8)=(1/(r8-r7))*((s-s5)/(s8-s5))*((t-t4)/(t8-t4));
Ns(1)=((r-r2)/(r1-r2))*(1/(s1-s4))*((t-t5)/(t1-t5));
Ns(2)=((r-r1)/(r2-r1))*(1/(s2-s3))*((t-t6)/(t2-t6));
Ns(3)=((r-r4)/(r3-r4))*(1/(s3-s2))*((t-t7)/(t3-t7));
Ns(4)=((r-r3)/(r4-r3))*(1/(s4-s1))*((t-t8)/(t4-t8));
Ns(5)=((r-r6)/(r5-r6))*(1/(s5-s8))*((t-t1)/(t5-t1));
Ns(6)=((r-r5)/(r6-r5))*(1/(s6-s7))*((t-t2)/(t6-t2));
Ns(7)=((r-r8)/(r7-r8))*(1/(s7-s6))*((t-t3)/(t7-t3));
Ns(8)=((r-r7)/(r8-r7))*(1/(s8-s5))*((t-t4)/(t8-t4));
Nt(1)=((r-r2)/(r1-r2))*((s-s4)/(s1-s4))*(1/(t1-t5));
Nt(2)=((r-r1)/(r2-r1))*((s-s3)/(s2-s3))*(1/(t2-t6));
Nt(3)=((r-r4)/(r3-r4))*((s-s2)/(s3-s2))*(1/(t3-t7));
Nt(4)=((r-r3)/(r4-r3))*((s-s1)/(s4-s1))*(1/(t4-t8));
Nt(5)=((r-r6)/(r5-r6))*((s-s8)/(s5-s8))*(1/(t5-t1));
Nt(6)=((r-r5)/(r6-r5))*((s-s7)/(s6-s7))*(1/(t6-t2));
Nt(7)=((r-r8)/(r7-r8))*((s-s6)/(s7-s6))*(1/(t7-t3));
Nt(8)=((r-r7)/(r8-r7))*((s-s5)/(s8-s5))*(1/(t8-t4));
p1(ri,si,k)=a1*N(ri)*Ns(si)*w(mn)*w(nm)*w(km)*.125*a*b*c;
p2(ri,si,k)=a2*N(ri)*Nt(si)*w(mn)*w(nm)*w(km)*.125*a*b*c;
%Elemental Stiffness Matrix......................
ke(ri,si,k) = ke(ri,si,k) + p1(ri,si,k) + p2(ri,si,k);
end
end
end
end
end
end
VECTORIZED version:
for k=1:nel
r=.5*(a*p(mn)+r1+r2);
s=.5*(b*p(nm)+s3+s2);
t=.5*(c*p(km)+t1+t5);
Nr = zeros(4,4,4,8);
N = zeros(4,4,4,8);
Ns = zeros(4,4,4,8);
Nt = zeros(4,4,4,8);
for ri=1:8
for si=1:8
%...............SHAPE FUNCTION..........................
Nr(:,:,:,1)=(1/(r1-r2))*((s-s4)/(s1-s4)).*((t-t5)/(t1-t5));
Nr(:,:,:,2)=(1/(r2-r1))*((s-s3)/(s2-s3)).*((t-t6)/(t2-t6));
Nr(:,:,:,3)=(1/(r3-r4))*((s-s2)/(s3-s2)).*((t-t7)/(t3-t7));
Nr(:,:,:,4)=(1/(r4-r3))*((s-s1)/(s4-s1)).*((t-t8)/(t4-t8));
Nr(:,:,:,5)=(1/(r5-r6))*((s-s8)/(s5-s8)).*((t-t1)/(t5-t1));
Nr(:,:,:,6)=(1/(r6-r5))*((s-s7)/(s6-s7)).*((t-t2)/(t6-t2));
Nr(:,:,:,7)=(1/(r7-r8))*((s-s6)/(s7-s6)).*((t-t3)/(t7-t3));
Nr(:,:,:,8)=(1/(r8-r7))*((s-s5)/(s8-s5)).*((t-t4)/(t8-t4));
N(:,:,:,1) = (r-r2).*Nr(:,:,:,1);
N(:,:,:,2) = (r-r1).*Nr(:,:,:,2);
N(:,:,:,3) = (r-r4).*Nr(:,:,:,3);
N(:,:,:,4) = (r-r3).*Nr(:,:,:,4);
N(:,:,:,5) = (r-r6).*Nr(:,:,:,5);
N(:,:,:,6) = (r-r5).*Nr(:,:,:,6);
N(:,:,:,7) = (r-r8).*Nr(:,:,:,7);
N(:,:,:,8) = (r-r7).*Nr(:,:,:,8);
Ns(:,:,:,1) = N(:,:,:,1)./(s-s4);
Ns(:,:,:,2) = N(:,:,:,2)./(s-s3);
Ns(:,:,:,3) = N(:,:,:,3)./(s-s2);
Ns(:,:,:,4) = N(:,:,:,4)./(s-s1);
Ns(:,:,:,5) = N(:,:,:,5)./(s-s8);
Ns(:,:,:,6) = N(:,:,:,6)./(s-s7);
Ns(:,:,:,7) = N(:,:,:,7)./(s-s6);
Ns(:,:,:,8) = N(:,:,:,8)./(s-s5);
Nt(:,:,:,1) = N(:,:,:,1)./(t-t5);
Nt(:,:,:,2) = N(:,:,:,2)./(t-t6);
Nt(:,:,:,3) = N(:,:,:,3)./(t-t7);
Nt(:,:,:,4) = N(:,:,:,4)./(t-t8);
Nt(:,:,:,5) = N(:,:,:,5)./(t-t1);
Nt(:,:,:,6) = N(:,:,:,6)./(t-t2);
Nt(:,:,:,7) = N(:,:,:,7)./(t-t3);
Nt(:,:,:,8) = N(:,:,:,8)./(t-t4);
kem = .125*a*b*c * N(:,:,:,ri).*w(mn).*w(nm).*w(km) ...
.* ( (-.02*s+0.5*r.*(1-r.^2)+.05*t).*Ns(:,:,:,si) ...
+ (-.05*t-.5*s).*Nt(:,:,:,si));
ke(ri,si,k) = sum(kem(:));
%
end
end
end

Accepted Answer

Jan on 27 Jul 2011
No, vectorized code is not always faster. If the vectorization requires creating large temporary arrays, loops are often faster. Allocating memory is very expensive, because it can cause a garbage collection or even disk swapping.
BTW: Nr, N, Ns and Nt are completely overwritten in each iteration, so it is sufficient, and more efficient, to allocate them once before the loops.
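The point about allocating once can be illustrated with a minimal sketch. It does not reproduce the code from the question: the arrays `r`, `wgt` and `c` are hypothetical stand-ins, and the inner expression is deliberately simplified. It only shows that work which does not depend on `ri` or `si` gives the same result when it is done once before those loops.

```matlab
% Hypothetical stand-in data (not the variables from the question).
rng(0);
r   = rand(4,4,4);      % values on a 4x4x4 integration grid
wgt = rand(4,4,4);      % hypothetical combined weights
c   = rand(1,8);        % hypothetical per-function offsets

% Version A: N is reallocated and refilled in every (ri,si) iteration,
% although nothing in it depends on ri or si.
keA = zeros(8,8);
for ri = 1:8
    for si = 1:8
        N = zeros(4,4,4,8);
        for q = 1:8
            N(:,:,:,q) = r - c(q);
        end
        kem = wgt .* N(:,:,:,ri) .* N(:,:,:,si);
        keA(ri,si) = sum(kem(:));
    end
end

% Version B: allocate and fill N once, before the (ri,si) loops.
N = zeros(4,4,4,8);
for q = 1:8
    N(:,:,:,q) = r - c(q);
end
keB = zeros(8,8);
for ri = 1:8
    for si = 1:8
        kem = wgt .* N(:,:,:,ri) .* N(:,:,:,si);
        keB(ri,si) = sum(kem(:));
    end
end

% Both versions should agree; B simply does the invariant work 1 time
% instead of 64 times.
maxDiff = max(abs(keA(:) - keB(:)));
```

Wrapping each version in `tic`/`toc` shows the difference in run time on a given machine; how large it is will depend on the MATLAB release and hardware.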
  Comments: 1
cr on 27 Jul 2011
Thanks for your BTW comment. I overlooked that N* was unnecessarily inside the inner loops.


More Answers (2)

Daniel Shub on 27 Jul 2011
I am not sure if vectorization is always faster, but loops are not as expensive as they used to be, thanks to the JIT accelerator. I would guess there might be examples where loops are faster, but I cannot think of one off the top of my head.
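One way to check this on a particular machine is to time a loop and its vectorized equivalent side by side. The snippet below is only a sketch with a made-up computation (the expression and the size `n` are assumptions, not taken from the question); which version wins can differ between MATLAB releases and computers, so no timings are quoted here.

```matlab
n = 1e5;                      % hypothetical problem size
x = linspace(0, 1, n);

% Loop version: scalar arithmetic only, no temporary arrays.
tic;
sLoop = 0;
for i = 1:n
    sLoop = sLoop + x(i)^2 * (1 - x(i));
end
tLoop = toc;

% Vectorized version: x.^2 and (1 - x) are built as temporary arrays
% before the sum is taken.
tic;
sVec = sum(x.^2 .* (1 - x));
tVec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);

% The two results should agree up to floating-point rounding.
relErr = abs(sLoop - sVec) / abs(sVec);
```

A single `tic`/`toc` pair is noisy, so it is worth running the comparison several times, as was done in the question.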
  Comments: 2
cr on 27 Jul 2011
Can you please shed some light on the JIT and since when it has existed?
Daniel Shub on 27 Jul 2011
I am not the best person to answer that. I would suggest asking it as a new question to get a good answer.



cr on 27 Jul 2011
See my comment on the accepted answer by Jan Simon. I ran the code I pasted above on four machines: three PCs (R2010a and R2007b) and a Mac (R2010a). Two PCs (one on R2010a, one on R2007b) and the Mac took longer with the vectorized code (9 s vs 5 s). One PC (R2007b), strangely, consistently took 5 s for the vectorized code and 29 s for the loops. I'm at my wits' end trying to interpret this now.
With the correction mentioned in that comment, the code takes just 1 s.
