Improving MATLAB code ideas? - cells and matrices

Hello!
I have 2 sets of 2 matrices: a1,a2 and b1,b2. I calculate then c1 to c4 which are
c1=a1*b1
c2=a1*b2
c3=a2*b1
c4=a2*b2
I can also do this with cells and with for loops:
for j=1:2
for i=1:2
C{i,j}=A{i}*B{j};
end
end
where I have prealocated the cells and defined the cell A to contain all the elements in the set. Now the thing is, that the second way, because it uses for loops is twice as slow as the first method, but the first method (where I type every combination is not flexible because if the elements in a and b increase, I have to type it manually.
Do you know a third way, where I do not face this tradeoff between efficiency versus flexibility? Any ideas?
Much appreciated!

댓글 수: 14

The second method is faster for me...
Boris
Boris 2012년 4월 3일
Hmm, that is strange. Here is my complete code that I am doing the experiments with:
clc
clear all
close all
B_LL0 = ones(4,4);
B_LL1 = ones(4,4);
F_0=randn(4,4);
F_1=randn(4,4);
B_TL00=zeros(4,4);
B_TL01=zeros(4,4);
B_TL10=zeros(4,4);
B_TL11=zeros(4,4);
n=1000;
tic
for i=1:n
B_TL00 = F_0*B_LL0;
B_TL01 = F_1*B_LL0;
B_TL10 = F_0*B_LL1;
B_TL11 = F_1*B_LL1;
end
toc
BT=cell(2,2);
BT{1,1}=zeros(4,4);
BT{2,1}=zeros(4,4);
BT{2,2}=zeros(4,4);
BT{1,2}=zeros(4,4);
F=cell(2,1);
BL=cell(2,1);
F{1}=F_0;
F{2}=F_1;
BL{1}=B_LL0;
BL{2}=B_LL1;
tic
for z=1:n
for j=1:2
for i=1:2
BT{i,j}=F{i}*BL{j};
end
end
end
toc
tic
for z=1:n
BT{1,1}=F{1}*BL{1};
BT{1,2}=F{1}*BL{2};
BT{2,1}=F{2}*BL{1};
BT{2,2}=F{2}*BL{2};
end
toc
Results (in order of appearence):
Elapsed time is 0.006170 seconds.
Elapsed time is 0.012226 seconds.
Elapsed time is 0.011463 seconds.
Boris
Boris 2012년 4월 3일
P.s. Sorry about the clear all, I know some people hate it, delete it first, so that it doesn't ruin your breakpoints if you have such :)
Wither your code:
Elapsed time is 0.004131 seconds.
Elapsed time is 0.007464 seconds.
Elapsed time is 0.006580 seconds.
So not twice as slow but still slower... Its still not really slow though... Are you doing a lot of these?
Boris
Boris 2012년 4월 3일
My office computer is slow, so don't judge the absolute level please :).
Yes I do a lot of these and this is the simplest of them, I have several more that build on the output from these matrices (so ^2) and then I do this 150 times and that is my function, which I should eventually call the function 200 000 times for a Metropolis Hastings Algorithm. Currently with my setup it takes 6 hours, you can guess how much it would take with the 'dynamic' function (the one with cells).
Boris
Boris 2012년 4월 3일
Another problem is, that I would probably have to add tracking one more period (that is carrying ^2 of these) and one more state which is also *2.
So far, writing it out by hand looks as the best way. At least it is a one time investment only (while simulating it say 10 times with a code twice as slow... well, you can guess the tradeoff)
Timing things that take between 6 and 12 ms (or even worse 4 and 8 ms) is not that reliable. You should adjust your matrix and loop sizes so things that a reasonable amount of time (i.e., at least a second).
Boris
Boris 2012년 4월 4일
Could you please elaborate on that? I don't think I understood. What is wrong with something that takes 6 to 12 ms? I am not a Matlab expert so if there is a caveat somewhere when things go too fast, I don't know that.
Yes, timing short things is problematic with a nonrealtime OS. It is possible, even probable, that your OS can do something that adds a few ms to a computation. The resolution of the clock used by tic/toc is finite and on the order of milliseconds.
Boris
Boris 2012년 4월 4일
Okay. I get the point now, but still do not get the perfect relevance to my problem, because I test this in a loop. When I use 100 000 iterations (the results above are with 1000) the numbers more or less multiply by 100 but the general ratio stays the same. In the end, I have already two codes, one which is the nonflexible and a flexible one - for the standard simulation that takes 15 minutes with one of the codes takes a bit less than 30 with the longer one and as far as I tested, the major difference comes from the loops - so I am trying to add the "functionality of the cells" to "the speed of the fixed system" and even though tic/toc are not perfect and even may miscalculate if I compare just one iteration, their general results are robust for the whole function. I just have broken down the problem to its core, so that it is easy for anyone who is willing to help.
Does that answer the measurment of time issue or did I miss the point?
Thanks!
My comment really had nothing to do with the question, just your way of timing things. The 15 min vs 30 min real world difference is what matters. I haven't looked at what is causing the slow down. To really evaluate the efficiency of things it helps to know the sizes of the matrices, their classes (e.g., double), the number of loops, etc. What is fast for small matrices might not be fast for large matrices.
Boris
Boris 2012년 4월 5일
Well, one can always say how everything matters and how the general case is but that doesn't help at all. I've provided a full working, self contained sample code and explained the problem in detail, above you can see everything relevant. If someone can help me make *this* sample code work faster and flexible, that is enough for me, as it is really close to the main part in my real code, e.g. I typically do not work with matrices larger than 4x4, at the very maximum 8x8.
Boris, my code below _will_ do this for you. You just have to tell us how you're generating a1,a2...
If these are outputs from a function call or a data read, then this is simple. Stack the results into matrices similar to what I did below as you generate them (rather than generating a1, a2, ... an).
Boris
Boris 2012년 4월 10일
Thanks for the effort Sean, I am checking it right now.

댓글을 달려면 로그인하십시오.

답변 (2개)

Jonathan Sullivan
Jonathan Sullivan 2012년 4월 3일

0 개 추천

You can use a very handy function created by James Tursa called mtimesx. It is for this exact type of problem. It does vectorized matrix multiplication.
You can setup your problem by first concatenating matrixes a1 and a2 into a variable A. We will concatenate them in the 3rd dimension.
A = cat(3,a1,a2);
We then concatenate matrixes b1 and b2 into variable B. We will do this in the 4th dimension. Since we want to do all combinations of A and B, it is important that A and B are concatenated in different dimensions.
B = cat(4,b1,b2);
Now for the calculation:
C = mtimesx(A,B);
C will be (in your case) a 2x2x2x2 array. It has arrays of matrixes. If you want to access a1*b1, you would call:
a1b1 = C(:,:,1,1);
Likewise, if you wanted to access a2*b1 you would call:
a2b1 = C(:,:,2,1);
James has written a rather well thought out help file at the link I attached above. You'll need to download it. Assuming you run windows, it should compile itself for you the first time you run it. It is very straight forward.

댓글 수: 1

Boris
Boris 2012년 4월 3일
Thank you very much for the effort! I installed a compiler and tried it but unfortunately this is by far the slowest method for me. compared to the other 3 above in the code this one gave double the time from the standard "cell" code:
Elapsed time is 0.006381 seconds. - written out
Elapsed time is 0.012707 seconds. - cells
Elapsed time is 0.011784 seconds. - cells, matrices predefined
Elapsed time is 0.027363 seconds. - mtimesx

댓글을 달려면 로그인하십시오.

Sean de Wolski
Sean de Wolski 2012년 4월 3일

0 개 추천

How are you generating a1,a2,b1,b2,etc.? Store the 'a's along the third dimension of a matrix and concatenate the 'b's columnwise. Example to follow. Then store the results in a numeric matrix rather than a cell. You know all of the 'submatrices' are square. So the results should be storable with each position known. Then run your for-loop accordingly along pages of the 'a' matrix using multiplying by the whole b at once.
clearvars;
sz = 3;
a = cat(3,magic(sz),ones(sz),pi*ones(sz)); %pagewise
b = a;
b = reshape(b,sz,numel(b)/sz); %columnwise
for ii = size(a,3):-1:1
iiv = ((ii-1)*sz+1):ii*sz;
M(iiv,:) = a(:,:,ii)*b;
end
Now the M matrix will contain all the results. This should be pretty optimized for speed right now. Let me know if anything isn't clear here.
More: You could actually skip the for-loop altogether by stacking the transposed submatrices, concatenating them column-wise and then transposing the result:
a = reshape(permute(a,[2 1 3]),sz,[])';
M2 = a*b;
Check
isequal(M2,M)
ans =
1

댓글 수: 3

Boris
Boris 2012년 4월 3일
Thanks! I have to admit that I am not familiar with some of the functions that you used cat(),numel()..., so I will go through the code line by line tomorrow and see how exactly it functions.
To your first question: the matrices are results from some initial condition and then are generated at the end of each iteration - this is an augmented Kalman Filter. For example at the very beginning a1 and a2 are the beta vectors from a transition equation and b1 and b2 are the respective transition matrices.
Boris
Boris 2012년 4월 10일
I tested the suggested routine and the results were unfortunately not much better than the other suggested answers. I went for the second suggestion - the matrix multiplication, avoiding the loop and I have to say, it looks pretty elegant. However I got 0.1363 secs just for the multiplication and afterwards I have to go 'into' the big matrix M2 and extract the resulting answers (c1 to c4), which takes a lot of time.
Elapsed time is 0.006541 seconds. - written out
Elapsed time is 0.012406 seconds. - cells
Elapsed time is 0.011982 seconds. - cells, matrices predefined
Elapsed time is 0.027814 seconds. - mtimesx
Elapsed time is 0.013632 seconds. - the matrix multiplication
With the extraction, (where the first thing that came to my mind is definetly not the fastest way to extract matrices from a big one) I jumped to 'Elapsed time is 0.066421 seconds.'
Boris
Boris 2012년 4월 10일
clc
clear all
close all
B_LL0 = ones(4,4);
B_LL1 = ones(4,4);
F_0=randn(4,4);
F_1=randn(4,4);
B_TL00=zeros(4,4);
B_TL01=zeros(4,4);
B_TL10=zeros(4,4);
B_TL11=zeros(4,4);
n=1000;
tic
for i=1:n
F = cat(1,F_0,F_1);
B_LL = cat(2,B_LL0,B_LL1);
B_TL = F*B_LL;
% ns = size(F_0);
% B_TL00 = B_TL(1:ns,1:ns);
% B_TL01 = B_TL(ns+1:end,1:ns);
% B_TL10 = B_TL(1:ns,ns+1:end);
% B_TL11 = B_TL(ns+1:end,ns+1:end);
end
toc
tic
for i=1:n
B_TL00 = F_0*B_LL0;
B_TL01 = F_1*B_LL0;
B_TL10 = F_0*B_LL1;
B_TL11 = F_1*B_LL1;
end
toc
BT=cell(2,2);
BT{1,1}=zeros(4,4);
BT{2,1}=zeros(4,4);
BT{2,2}=zeros(4,4);
BT{1,2}=zeros(4,4);
F=cell(2,1);
BL=cell(2,1);
F{1}=F_0;
F{2}=F_1;
BL{1}=B_LL0;
BL{2}=B_LL1;
tic
for z=1:n
for j=1:2
for i=1:2
BT{i,j}=F{i}*BL{j};
end
end
end
toc
tic
for z=1:n
BT{1,1}=F{1}*BL{1};
BT{1,2}=F{1}*BL{2};
BT{2,1}=F{2}*BL{1};
BT{2,2}=F{2}*BL{2};
end
toc
tic
for z=1:n
A = cat(3,F_0,F_1);
B = cat(4,B_LL0,B_LL1);
C = mtimesx(A,B);
end
toc

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Logical에 대해 자세히 알아보기

질문:

2012년 4월 3일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by