Is vectorized code always faster than loops? Any exceptions?
조회 수: 6 (최근 30일)
이전 댓글 표시
[EDIT: 20110727 09:35 CDT - reformat - WDR]
I have a critical chunk of a code that has six nested for-loops. I reduced the innermost three with vectorization and I see that the vectorized version (with exact same config of everything else and same computer) takes twice the run time. I ran each of them a few times and here are the results. Any light on understanding this behaviour is appreciated. Thanks.
% fem_nought is file with loops. Fem_optimised is one with the vectorized equivalent of the innermost 3 loops.
>>fem_optimized
Elapsed time is 10.073242 seconds.
>> fem_optimized
Elapsed time is 9.588474 seconds.
>> fem_optimized
Elapsed time is 9.872822 seconds.
>> fem_nought
Elapsed time is 4.047568 seconds.
>> fem_nought
Elapsed time is 3.678311 seconds.
>> fem_nought
Elapsed time is 3.672811 seconds.
Trimmed versions of both the codes are below: (decl of a lot of variables are removed)
LOOPS version:
for k=1:nel
for ri=1:8
for si=1:8
for mn=1:4
for nm=1:4
for km=1:4
r=.5*(a*p(mn)+r1+r2);
s=.5*(b*p(nm)+s3+s2);
t=.5*(c*p(km)+t1+t5);
a1=-.02*s+0.5*r*(1-r^2)+.05*t;
a2=-.05*t-.5*s;
%...............SHAPE FUNCTUION..........................
N(1)=((r-r2)/(r1-r2))*((s-s4)/(s1-s4))*((t-t5)/(t1-t5));
N(2)=((r-r1)/(r2-r1))*((s-s3)/(s2-s3))*((t-t6)/(t2-t6));
N(3)=((r-r4)/(r3-r4))*((s-s2)/(s3-s2))*((t-t7)/(t3-t7));
N(4)=((r-r3)/(r4-r3))*((s-s1)/(s4-s1))*((t-t8)/(t4-t8));
N(5)=((r-r6)/(r5-r6))*((s-s8)/(s5-s8))*((t-t1)/(t5-t1));
N(6)=((r-r5)/(r6-r5))*((s-s7)/(s6-s7))*((t-t2)/(t6-t2));
N(7)=((r-r8)/(r7-r8))*((s-s6)/(s7-s6))*((t-t3)/(t7-t3));
N(8)=((r-r7)/(r8-r7))*((s-s5)/(s8-s5))*((t-t4)/(t8-t4));
Nr(1)=(1/(r1-r2))*((s-s4)/(s1-s4))*((t-t5)/(t1-t5));
Nr(2)=(1/(r2-r1))*((s-s3)/(s2-s3))*((t-t6)/(t2-t6));
Nr(3)=(1/(r3-r4))*((s-s2)/(s3-s2))*((t-t7)/(t3-t7));
Nr(4)=(1/(r4-r3))*((s-s1)/(s4-s1))*((t-t8)/(t4-t8));
Nr(5)=(1/(r5-r6))*((s-s8)/(s5-s8))*((t-t1)/(t5-t1));
Nr(6)=(1/(r6-r5))*((s-s7)/(s6-s7))*((t-t2)/(t6-t2));
Nr(7)=(1/(r7-r8))*((s-s6)/(s7-s6))*((t-t3)/(t7-t3));
Nr(8)=(1/(r8-r7))*((s-s5)/(s8-s5))*((t-t4)/(t8-t4));
Ns(1)=((r-r2)/(r1-r2))*(1/(s1-s4))*((t-t5)/(t1-t5));
Ns(2)=((r-r1)/(r2-r1))*(1/(s2-s3))*((t-t6)/(t2-t6));
Ns(3)=((r-r4)/(r3-r4))*(1/(s3-s2))*((t-t7)/(t3-t7));
Ns(4)=((r-r3)/(r4-r3))*(1/(s4-s1))*((t-t8)/(t4-t8));
Ns(5)=((r-r6)/(r5-r6))*(1/(s5-s8))*((t-t1)/(t5-t1));
Ns(6)=((r-r5)/(r6-r5))*(1/(s6-s7))*((t-t2)/(t6-t2));
Ns(7)=((r-r8)/(r7-r8))*(1/(s7-s6))*((t-t3)/(t7-t3));
Ns(8)=((r-r7)/(r8-r7))*(1/(s8-s5))*((t-t4)/(t8-t4));
Nt(1)=((r-r2)/(r1-r2))*((s-s4)/(s1-s4))*(1/(t1-t5));
Nt(2)=((r-r1)/(r2-r1))*((s-s3)/(s2-s3))*(1/(t2-t6));
Nt(3)=((r-r4)/(r3-r4))*((s-s2)/(s3-s2))*(1/(t3-t7));
Nt(4)=((r-r3)/(r4-r3))*((s-s1)/(s4-s1))*(1/(t4-t8));
Nt(5)=((r-r6)/(r5-r6))*((s-s8)/(s5-s8))*(1/(t5-t1));
Nt(6)=((r-r5)/(r6-r5))*((s-s7)/(s6-s7))*(1/(t6-t2));
Nt(7)=((r-r8)/(r7-r8))*((s-s6)/(s7-s6))*(1/(t7-t3));
Nt(8)=((r-r7)/(r8-r7))*((s-s5)/(s8-s5))*(1/(t8-t4));
p1(ri,si,k)=a1*N(ri)*Ns(si)*w(mn)*w(nm)*w(km)*.125*a*b*c;
p2(ri,si,k)=a2*N(ri)*Nt(si)*w(mn)*w(nm)*w(km)*.125*a*b*c;
%Elemental Stiffness Matrix......................
ke(ri,si,k) = ke(ri,si,k) + p1(ri,si,k) + p2(ri,si,k);
end
end
end
end
end
end
VECTORIZED VERSION
for k=1:nel
r=.5*(a*p(mn)+r1+r2);
s=.5*(b*p(nm)+s3+s2);
t=.5*(c*p(km)+t1+t5);
Nr = zeros(4,4,4,8);
N = zeros(4,4,4,8);
Ns = zeros(4,4,4,8);
Nt = zeros(4,4,4,8);
for ri=1:8
for si=1:8
%...............SHAPE FUNCTUION..........................
Nr(:,:,:,1)=(1/(r1-r2))*((s-s4)/(s1-s4)).*((t-t5)/(t1-t5));
Nr(:,:,:,2)=(1/(r2-r1))*((s-s3)/(s2-s3)).*((t-t6)/(t2-t6));
Nr(:,:,:,3)=(1/(r3-r4))*((s-s2)/(s3-s2)).*((t-t7)/(t3-t7));
Nr(:,:,:,4)=(1/(r4-r3))*((s-s1)/(s4-s1)).*((t-t8)/(t4-t8));
Nr(:,:,:,5)=(1/(r5-r6))*((s-s8)/(s5-s8)).*((t-t1)/(t5-t1));
Nr(:,:,:,6)=(1/(r6-r5))*((s-s7)/(s6-s7)).*((t-t2)/(t6-t2));
Nr(:,:,:,7)=(1/(r7-r8))*((s-s6)/(s7-s6)).*((t-t3)/(t7-t3));
Nr(:,:,:,8)=(1/(r8-r7))*((s-s5)/(s8-s5)).*((t-t4)/(t8-t4));
N(:,:,:,1) = (r-r2).*Nr(:,:,:,1);
N(:,:,:,2) = (r-r1).*Nr(:,:,:,2);
N(:,:,:,3) = (r-r4).*Nr(:,:,:,3);
N(:,:,:,4) = (r-r3).*Nr(:,:,:,4);
N(:,:,:,5) = (r-r6).*Nr(:,:,:,5);
N(:,:,:,6) = (r-r5).*Nr(:,:,:,6);
N(:,:,:,7) = (r-r8).*Nr(:,:,:,7);
N(:,:,:,8) = (r-r7).*Nr(:,:,:,8);
Ns(:,:,:,1) = N(:,:,:,1)./(s-s4);
Ns(:,:,:,2) = N(:,:,:,2)./(s-s3);
Ns(:,:,:,3) = N(:,:,:,3)./(s-s2);
Ns(:,:,:,4) = N(:,:,:,4)./(s-s1);
Ns(:,:,:,5) = N(:,:,:,5)./(s-s8);
Ns(:,:,:,6) = N(:,:,:,6)./(s-s7);
Ns(:,:,:,7) = N(:,:,:,7)./(s-s6);
Ns(:,:,:,8) = N(:,:,:,8)./(s-s5);
Nt(:,:,:,1) = N(:,:,:,1)./(t-t5);
Nt(:,:,:,2) = N(:,:,:,2)./(t-t6);
Nt(:,:,:,3) = N(:,:,:,3)./(t-t7);
Nt(:,:,:,4) = N(:,:,:,4)./(t-t8);
Nt(:,:,:,5) = N(:,:,:,5)./(t-t1);
Nt(:,:,:,6) = N(:,:,:,6)./(t-t2);
Nt(:,:,:,7) = N(:,:,:,7)./(t-t3);
Nt(:,:,:,8) = N(:,:,:,8)./(t-t4);
kem = .125*a*b*c * N(:,:,:,ri).*w(mn).*w(nm).*w(km) ...
.* ( (-.02*s+0.5*r.*(1-r.^2)+.05*t).*Ns(:,:,:,si) ...
+ (-.05*t-.5*s).*Nt(:,:,:,si));
ke(ri,si,k) = sum(kem(:));
%
end
end
end
댓글 수: 0
채택된 답변
Jan
2011년 7월 27일
No, vectorized code is not always faster. If the vectorization needs the creation of large temporary arrays, loops are often faster. The allocation of memory is very expensive, because it can cause a garbage collection or even disk swapping.
An example: http://www.mathworks.com/matlabcentral/answers/8461-double-summation-with-vectorized-loops
BTW: Because Nr, N, Ns and Nt are completely overwritten in each iteration. Therefore it is enough and more efficient to allocate them once before the loops.
추가 답변 (2개)
Daniel Shub
2011년 7월 27일
I am not sure if vectorization is always faster, but loops are not as expensive as they used to be, thanks to the JIT accelerator. I would guess there might be examples were loops are faster, but I cannot think of one off the top of my head.
댓글 수: 2
Daniel Shub
2011년 7월 27일
I am not the best person to answer that. I would suggest asking it as a new question to get a good answer.
참고 항목
카테고리
Help Center 및 File Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!