Is it possible to use arrayfun across rows?
Hi,
I currently have a for loop that works its way through a table with almost 20 million records. As expected, it is pretty slow, so I want to look into alternatives. Is there a way to use arrayfun, or another MATLAB function, across rows with high performance? The example below captures the issue of working across rows:
A = table([1;1;1;2;2;2],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
for i = 2:height(A)
    % multiply by the previous Var2 only while Var1 is unchanged
    if A.Var1(i) == A.Var1(i-1)
        A.Var3(i) = A.Var2(i) .* A.Var2(i-1);
    else
        A.Var3(i) = A.Var2(i);
    end
end
Any suggestions will be appreciated.
Kind regards,
William
11 Comments
Rik
6 Oct 2020
arrayfun (and cellfun and structfun) simply hides the loop. It will not speed up your code; it will usually slow it down because of the extra function-call overhead. If you want to speed this up, you need to parallelize with parfor or find vectorized operations. In your example you can use logical indexing to perform the multiplication all at once.
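To illustrate the point, here is a sketch of what an arrayfun version of the question's loop might look like on plain vectors. The names v1, v2, sameAsPrev and mult are illustrative, not from the thread. The anonymous function is still called once per row, so this is a loop in disguise. It treats row 1 with the else branch, which gives the same value as the question's code for this data because Var1(1) and Var2(1) are both 1.

```matlab
% Plain column vectors standing in for A.Var1 and A.Var2 (data from the question)
v1 = [1;1;1;2;2;2];
v2 = [1;2;3;4;5;6];
n  = numel(v1);

% Reference result: the question's loop, run on plain vectors
ref = zeros(n,1);
ref(1) = v2(1);                 % row 1 has no predecessor
for k = 2:n
    if v1(k) == v1(k-1)
        ref(k) = v2(k) * v2(k-1);
    else
        ref(k) = v2(k);
    end
end

% true when row k continues a run of equal v1 values;
% && short-circuits, so v1(0) is never evaluated for k = 1
sameAsPrev = @(k) k > 1 && v1(k) == v1(k-1);

% multiplier: previous v2 when the run continues, otherwise 1
% (arithmetic form because anonymous functions cannot contain if/else;
% assumes finite data, since 0*Inf would give NaN)
mult = @(k) sameAsPrev(k) * v2(max(k-1,1)) + ~sameAsPrev(k);

% arrayfun still calls the anonymous function once per row index
v3 = arrayfun(@(k) v2(k) * mult(k), (1:n)');

disp(isequal(v3, ref))          % prints 1
```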
William Ambrose
6 Oct 2020
Walter Roberson
6 Oct 2020
Michael Croucher
6 Oct 2020
Is it possible to share your real example somehow please?
William Ambrose
6 Oct 2020
Rik
6 Oct 2020
For this example it isn't too difficult:
A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
B=A;%make a copy to compare
for n = 2:height(A)
if A.Var1(n) == A.Var1(n-1)
A.Var3(n) = A.Var2(n) .* A.Var2(n-1);
else
A.Var3(n) = A.Var2(n);
end
end
L = [false;B.Var1(2:end)==B.Var1(1:(end-1))];
ind = find(L);
B.Var3(ind) = B.Var2(ind) .* B.Var2(ind-1);
B.Var3(~L) = B.Var2(~L);
clc,isequal(A,B)
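The same logical-indexing idea can also be written without find, by lining up a shifted copy of Var2 with the current row. A minimal sketch on plain vectors; prevV2 is an illustrative name, and the NaN in row 1 is a placeholder that is never used because L(1) is false.

```matlab
% Plain column vectors standing in for B.Var1 and B.Var2
v1 = [1;1;1;2;2;2];
v2 = [1;2;3;4;5;6];

% L(k) is true when row k has the same v1 as row k-1 (row 1 never does)
L = [false; v1(2:end) == v1(1:end-1)];

% previous row's v2, aligned with the current row
prevV2 = [NaN; v2(1:end-1)];

% start from the "else" branch everywhere, then overwrite rows that continue a run
v3 = v2;
v3(L) = v2(L) .* prevV2(L);     % expected values: 1 2 6 4 20 30
```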
William Ambrose
6 Oct 2020
Edited: William Ambrose
6 Oct 2020
Please use the editing tools to format your code as code.
I don't see a way here to calculate the branches separately. You might get a performance increase by calculating the runs of true and false in A.Var1(2:end) == A.Var1(1:end-1), but the extra overhead might not be worth it.
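A minimal sketch of how those runs could be computed on a plain vector (the data and the names same, startIdx, runLen and runVal are illustrative). Each run could then be handled with one vectorized statement instead of one iteration per row.

```matlab
% Illustrative Var1 data, as a plain column vector
v1 = [1;1;1;2;2;2;3];

% the condition tested inside the loop (row 1 has no predecessor)
same = [false; v1(2:end) == v1(1:end-1)];

% a new run of true/false starts at row 1 and wherever the condition changes
startIdx = find([true; diff(double(same)) ~= 0]);
runLen   = diff([startIdx; numel(same) + 1]);   % rows in each run
runVal   = same(startIdx);                      % true = "same as previous" run
```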
William Ambrose
6 Oct 2020
Rik
6 Oct 2020
The longer the runs are, the more efficient calculating the runs will be. So if you have long stretches of true and/or long stretches of false, it might be worth looking into. I think the first branch can also be vectorized (e.g. with cumprod), although I haven't tried it yet.
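A sketch of combining runs with cumprod, assuming the recurrence the thread later settles on: within a run of equal consecutive Var1 values, each row's result is the previous result times the current Var2, i.e. a cumulative product per run (this reproduces the output table further down). The loop runs once per run rather than once per row, so it pays off when runs are long.

```matlab
% Data from the later example in this thread, as plain column vectors
v1 = [1;1;1;1;1;2;2;2;3];
v2 = [1;2;3;4;5;6;7;8;500];

% first row of each run of equal consecutive v1 values, plus one past the end
startIdx = [find([true; v1(2:end) ~= v1(1:end-1)]); numel(v1) + 1];

out = zeros(size(v2));
for r = 1:numel(startIdx) - 1
    rows = startIdx(r):startIdx(r+1) - 1;   % all rows of run r
    out(rows) = cumprod(v2(rows));           % one vectorized call per run
end
```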
William Ambrose
6 Oct 2020
Answers (1)
Mohammad Sami
6 Oct 2020
Something like this will work.
i = [false; A.Var1(1:end-1) == A.Var1(2:end)];
j = find(i);
A.Var3(i) = A.Var2(j) .* A.Var2(j-1);
A.Var3(~i) = A.Var2(~i);
5 Comments
William Ambrose
6 Oct 2020
Rik
6 Oct 2020
Mohammad Sami
6 Oct 2020
Edited: Mohammad Sami
6 Oct 2020
In that case you can use this
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
i = [true; A.Var1(1:end-1) ~= A.Var1(2:end)];
id = cumsum(i);
A.Var3 = grouptransform(A.Var2,id,@cumprod);
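The above does not assume that each value of Var1 appears in a single block: id numbers every run of consecutive equal values, so a value that reappears later (e.g. [1 1 2 2 1 1]) starts a new group.
If each Var1 value always occupies a single contiguous block (for example, when the table is sorted by Var1), you can shorten it as follows.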
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A = grouptransform(A,'Var1',@cumprod,"ReplaceValues",false);
% or explicitly specify which variable to transform if you have other variables
% A = grouptransform(A,'Var1',@cumprod,"Var2","ReplaceValues",false);
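To see where the two versions can differ, here is a small sketch comparing the run-based id from the first version with grouping by value, which is what grouping by 'Var1' does. The data is hypothetical: the value 1 appears in two separate blocks. Rows 5 and 6 form their own run in runId but share group 1 with rows 1 and 2 in valId, so a cumulative product grouped by Var1 would carry on from row 2 instead of restarting at row 5, unlike the loop.

```matlab
% Hypothetical Var1 column where the value 1 appears in two separate blocks
v1 = [1;1;2;2;1;1];

% run-based id (first version): increments wherever the value changes
runId = cumsum([true; v1(2:end) ~= v1(1:end-1)]);

% value-based id (grouping by 'Var1'): one group per distinct value
[~, ~, valId] = unique(v1);
valId = valId(:);               % make sure it is a column vector
```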
William Ambrose
8 Oct 2020
Mohammad Sami
8 Oct 2020
Hi William,
For the updated problem as stated, grouptransform with cumprod will work just as well.
My testing shows the result is identical to the expected result.
A =
9×3 table
Var1 Var2 fun_Var2
____ ____ ________
1 1 1
1 2 2
1 3 6
1 4 24
1 5 120
2 6 6
2 7 42
2 8 336
3 500 500
Of course, if the formula changes, a for loop may generalize more easily.
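If you do keep a loop, a common speed-up is to pull the columns out of the table once, loop over plain numeric vectors, and assign the result back once, because element-by-element indexing into table variables typically carries much more overhead than indexing plain arrays. A minimal sketch under that assumption, using the recurrence from the output table above; the multiplication line is the part you would swap for another formula.

```matlab
% With the table from the thread you would start with: v1 = A.Var1; v2 = A.Var2;
v1 = [1;1;1;1;1;2;2;2;3];
v2 = [1;2;3;4;5;6;7;8;500];

out = zeros(size(v2));          % preallocate
out(1) = v2(1);
for k = 2:numel(v1)
    if v1(k) == v1(k-1)
        out(k) = out(k-1) * v2(k);   % continue the run (replace with another recurrence)
    else
        out(k) = v2(k);              % a new run starts
    end
end
% ...and write back once at the end: A.Var3 = out;
```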