Trouble vectorising a loop

I'm trying to tidy up and improve some of my code and have come across a loop that I am struggling to vectorise. I am calculating the mean value of specific parts of a matrix and replacing certain values within the matrix with the newly calculated values.
For example, if I have a matrix 'A' I would like to replace element A(3,1) with the mean of A(2,1) and A(3,1). Likewise, I would then like to replace A(3,2) with the mean of A(2,2) and A(3,2) and so on (although I don't want to do this for every single column, just specific ones). This process is also repeated at various other locations in the matrix (i.e. also replace A(6,1) with the mean of A(5,1) and A(6,1) and so on).
The looped version of my code is of the following form:
A = rand(7,6);
firstRows=[2;5];
lastRows=[3;6];
columnIndicies=[1 2 3];
A2=A;
for j=1:1:length(lastRows)
A2(lastRows(j),columnIndicies)= ...
mean(A((firstRows(j)):(lastRows(j)),columnIndicies),1);
end
A3 = A2(lastRows,:);
Any comments / assistance would be much appreciated.

 채택된 답변

Matt J
Matt J 2013년 7월 9일
편집: Matt J 2013년 7월 9일

0 개 추천

J=arrayfun(@(a,b) (a:b), firstRows,lastRows,'uni',0);
I=cellfun(@(a,b)ones(1,length(a))*b ,J.',num2cell(1:length(J)),'uni',0);
S=cellfun(@(a,b)ones(1,length(a))/length(a) ,J,'uni',0);
B=sparse([I{:}],[J{:}],[S{:}],length(lastRows),size(A,1));
A2=A;
A2(lastRows,columnIndices)=B*A(:,columnIndices);

댓글 수: 8

Rhys
Rhys 2013년 7월 10일
Thanks Matt, that's definitely an elegant way of doing it.
I had to remove the .' (from J.' on the second line) to get it to work but this answer is the more useful of the two because the number of rows between corresponding 'firstRows' and 'lastRows' elements is not always constant in my application.
However, if I tic toc around the functions I find that the unlooped version is actually a bit slower than the looped example. Is this normal?, I've always approached MatLab with the philosophy that loops are slow...
Jan
Jan 2013년 7월 10일
The idea that loops are slow in general belongs to Matlab before version 6.5. In the last decade Matlabs JIT acceleration show good improvements in handling loops efficiently. But the old rumor of slow loops will never die.
While the processors are very fast today, the RAM is not. The darwback of creating large temporary arrays in a vectorized method can exceed the benefit of the vectorized processing. Then loops are faster.
Muthu Annamalai
Muthu Annamalai 2013년 7월 10일
I hear good things about bsxfun for binary ops on matrices. Could your situation allow you to use that instead of arrayfun/cellfun ?
Matt J
Matt J 2013년 7월 10일
편집: Matt J 2013년 7월 10일
However, if I tic toc around the functions I find that the unlooped version is actually a bit slower than the looped example. Is this normal?, I've always approached MatLab with the philosophy that loops are slow...
Rhys, As Jan says, whether you would outperform a for-loop depends on many factors. We would have to know your actual sizes of A, firstRows, lastRows, etc.... We would have to know how big
mean(lastRows-firstRows)
typically is. We would also have to know where you are putting your tic/toc.
One thing I would stress, for example, is that if firstRows and lastRows are fixed and you plan to be re-applying them to multiple A, then your tic and toc should go around just the last line
tic;
A2(lastRows,columnIndices)=B*A(:,columnIndices);
toc
because the preceding lines are just fixed pre-computations and constitute non-recurring computational costs.. That last line I would expect to perform much faster than re-applying your original for-loop for every new A.
It also helps if you chop out the unused columns before you begin processsing. Below is a timing comparison that I did. Even with the pre-processing needed to set up the sparse matrix B, I see a significant speed-up.
N=5000;
A = rand(N);
firstRows=1:20:N-20;
lastRows=firstRows+10;
columnIndices = 1:10:N ;
A2_1=A; A2_2=A;
A=A(:,columnIndices);
A0=A;
tic
for j=1:1:length(lastRows)
A0(lastRows(j),:)= ...
mean(A((firstRows(j)):(lastRows(j)),:),1);
end
A2_1(:,columnIndices)=A0;
toc;
%Elapsed time is 0.076621 seconds.
tic;
J=arrayfun(@(a,b) (a:b), firstRows,lastRows,'uni',0);
I=cellfun(@(a,b)ones(1,length(a))*b ,J,num2cell(1:length(J)),'uni',0);
S=cellfun(@(a,b)ones(1,length(a))/length(a) ,J,'uni',0);
B=sparse([I{:}],[J{:}],[S{:}],length(lastRows),size(A,1));
A0(lastRows,:)=B*A;
A2_2(:,columnIndices)=A0;
toc;
%Elapsed time is 0.012123 seconds.
Matt, you are correct in saying that the line
A2(lastRows,columnIndices)=B*A(:,columnIndices);
is fast, the array/cellfun calls are the slowest parts of the code. As for some real numbers;
size(A) ~ 1000 100
length(firstRows)=length(lastRows) ~ 100
mean(lastRows-firstRows) ~ 4
Either method doesn't take long to perform, but the method will be called 1000's of times. The tic/toc is currently around the whole piece of code. I get your point about the pre-computations but unfortunately the firstRows and lastRows values are not fixed. They may be the same in some cases but definitely not in all (this is something I'll have to look into further).
Matt J
Matt J 2013년 7월 10일
Rhys, see my most recent version (and timing comparison) above. Chopping out the unused columns of A helped a lot. My version exceeded the for-loops signficantly, even including the cellfun calls.
Rhys
Rhys 2013년 7월 11일
Thanks Matt, chopping the unused columns out of the calculation really does the trick!

댓글을 달려면 로그인하십시오.

추가 답변 (2개)

Matt J
Matt J 2013년 7월 9일
편집: Matt J 2013년 7월 9일

0 개 추천

idx=false(size(A));
idx1=idx; idx1(firstRows,columnIndices)=1;
idx2=idx; idx2(lastRows,columnIndices)=1;
A2=A;
A2(idx2)=(A(idx1)+A(idx2))/2;
Jan
Jan 2013년 7월 11일
편집: Jan 2013년 7월 11일

0 개 추천

Another idea:
S = cumsum(A, 1);
B = S(lastRows, :) - S(firstRows - 1, :);
A0(lastRows, :) = bsxfun(@rdivide, B, lastRows - firstRows + 1);
Three problems: 1. I cannot test this currently, and I frequently confuse -1 and +1. 2. The exception firstRows==1 must be handled. 3. CUMSUM accumulates rounding errors and you have to check, if the accuracy of the results satisfies your needs.
Benefit: It avoid calculating the sum of neighboring elements multiple times in case of overlapping intervals.

카테고리

도움말 센터File Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기

태그

질문:

2013년 7월 9일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by