Optimize the ordering of nested loop for speed

Suppose I have the following code. Will it be faster version 1 or version 2? What changes is the ordering of the two nested loops
VERSION 1
% bigArray has dim: [npolv,nz,nsv]
% npolv=68961 > nsv=200 > nz=81
% zgrid is [nz,1], kgrid is [nsv,1]
for j=1:nz
for qq=1:nsv
% the output of fun is a vector dim npolv
bigArray(:,j,qq) = fun(zgrid(j),kgrid(qq));
end
end
or VERSION 2
% bigArray has dim: [npolv,nz,nsv]
% npolv=68961 > nsv=200 > nz=81
% zgrid is [nz,1], kgrid is [nsv,1]
for qq=1:nsv
for j=1:nz
% the output of fun is a vector with dim npolv
bigArray(:,j,qq) = fun(zgrid(j),kgrid(qq));
end
end

댓글 수: 4

Both are equally slow.
@Bruno Do you have any suggestion about how to improve the speed? Thanks
I suppose if you ask such question, then the bottleneck is nothing to do owith looping but calling the function inside the loop.
If you want to speedup, you need to vectorize the function FUN that accept ND-array. Changing loop order won't do much.
Or equally FAST. Like you said Bruno, the speed has nothing to do with looping - it's the insides that count. Look at how much time the looping alone spends:
nsv=200;
nz=81;
tic
for j=1:nz
for qq=1:nsv
;
end
end
toc
tic
for qq=1:nsv
for j=1:nz
;
end
end
toc
and you get times for the for loops alone that are so fast they're not even noticeable:
Elapsed time is 0.000221 seconds.
Elapsed time is 0.000075 seconds.
They're in the microseconds range. There is no way a person would notice those absolute elapsed times, much less a difference between those two times. They're just too fast!

댓글을 달려면 로그인하십시오.

 채택된 답변

Image Analyst
Image Analyst 2019년 1월 19일

0 개 추천

Why not use tic before the loop, and toc after the loop?
Because MATLAB is column major, you'll find it's best to put the right most indexes innermost, and the left indexes outermost:
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, col, slice) = ......
end
end
end

댓글 수: 5

I'm curious - is there a major performance change if you write
% Matrix named array is size(#Rows,#Columns,#Slices)
% Slice and col are swapped which might decrease performance
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = ......
end
end
end
instead of
% Ordered by right most indexes innermost
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, col, slice) = ......
end
end
end
I can test locally with tic/toc but am curious if changing the second/third indices does make a huge performance difference with large matrices (due to column-major indexing).
No, it does not matter what you call the variables. The first one is always the row, second one is always the column, and the third one is always the slice. So even if you have slice as the second index, it's still basically just a column number that you're calling by a different (deceptive) name, and what you call col is really still the third index and represents the slice regardless of you calling it column.
Now if you change the order of the for statement lines, and still have array(row, col, slice), then THAT will make a difference. This:
% Matrix named array is size(#Rows,#Columns,#Slices)
% Slice and col are swapped which might decrease performance
for k1 = 1 : numRows
for k2 = 1 : numColumns
for k3 = 1 : slices
array(k1, k2, k3) = %......
end
end
end
will be slower than this:
for k3 = 1 : slices
for k2 = 1 : numColumns
for k1 = 1 : numRows
array(k1, k2, k3) = %......
end
end
end
Bruno Luong
Bruno Luong 2021년 7월 27일
편집: Bruno Luong 2021년 7월 27일
Matthew Kehoe "I can test locally with tic/toc but am curious if changing the second/third indices does make a huge performance difference with large matrices (due to column-major indexing)."
Memory access is fully linear with the second code. Wth the first code it's only linear with the most inner loop. Penaly probably depends on the size of your data and what you do inside the loop. My guess is there is some penalty but it's small.
Matthew Kehoe
Matthew Kehoe 2021년 7월 27일
편집: Matthew Kehoe 2021년 7월 27일
@Bruno Luong It appears to make a difference when the loop variables are large.
% Matrix named array is size(#Rows,#Columns,#Slices)
% Slice and col are swapped which decreases performance
slices = 50;
numColumns = 20;
numRows = 30;
array = zeros(numRows,slices,numColumns);
array2 = zeros(numRows,numColumns,slices);
tic
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = array(row, slice, col) + 5;
end
end
end
end
toc
tic
% Ordered by right most indexes innermost
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array2(row, col, slice) = array2(row, col, slice) + 5;
end
end
end
end
toc
% Elapsed time of method 1 is 0.053500 seconds.
% Elapsed time of method 2 is 0.041477 seconds.
and
% Matrix named array is size(#Rows,#Columns,#Slices)
% Slice and col are swapped which decreases performance
slices = 50;
numColumns = 400;
numRows = 300;
array = zeros(numRows,slices,numColumns);
array2 = zeros(numRows,numColumns,slices);
tic
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = array(row, slice, col) + 5;
end
end
end
end
toc
tic
% Ordered by right most indexes innermost
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array2(row, col, slice) = array2(row, col, slice) + 5;
end
end
end
end
toc
% Elapsed time of method 1 is 12.153343 seconds.
% Elapsed time of method 2 is 8.756736 seconds.
Your code run on TMW server
% Matrix named array is size(#Rows,#Columns,#Slices)
% Slice and col are swapped which decreases performance
slices = 50;
numColumns = 400;
numRows = 300;
array = zeros(numRows,slices,numColumns);
array2 = zeros(numRows,numColumns,slices);
tic
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array(row, slice, col) = array(row, slice, col) + 5;
end
end
end
end
toc
Elapsed time is 13.637798 seconds.
tic
% Ordered by right most indexes innermost
for ntests = 1:1000
for slice = 1 : slices
for col = 1 : numColumns
for row = 1 : numRows
array2(row, col, slice) = array2(row, col, slice) + 5;
end
end
end
end
toc
Elapsed time is 12.603883 seconds.

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Mark McBroom
Mark McBroom 2019년 1월 19일

0 개 추천

  1. use profile tool to determine hot spot in code.
  2. pre-allocate bigArray. On my computer this reduced execution time by 70%

댓글 수: 1

Dear Mark, thanks for your answer. I didn't report it in then sample code in my question but, yes, I had pre-allocated bigArray

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Matrices and Arrays에 대해 자세히 알아보기

질문:

2019년 1월 19일

댓글:

2021년 7월 28일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by