
Improving the speed of array filling

Views: 15 (last 30 days)
Enrique, 4 September 2022
Edited: Bruno Luong, 4 September 2022
I am performing some intensive array work for a simulation. Essentially, I have a 40×27000 array (X) which is first filled with initial values and then updated after some operations; all of this is done inside a for loop. After profiling, it can be seen that the lines of code used for updating the values are considerably slow.
(As I am not providing all the code, please consider that the variables not described are auxiliary variables of compatible size that come from intermediate computation steps.)
Inside the for loop I have considered the following options, but I have not noticed any improvement from any of them:
Option A
A = (MM-X(8,:))./VV;
B = (LL-X(3,:))./FF;
C = G.*(-I)./(1.0+(1.5./X(9,:)).^8.0); % This is slow, but expected
% ... continues for D, E, ... %
X = X + W .* [A; B; C; D; E]; % [A; B; C; D; E] actually has 40 rows; only a few are shown for simplicity
Option B
A = (MM-X(8,:))./VV;
B = (LL-X(3,:))./FF;
C = G.*(-I)./(1.0+(1.5./X(9,:)).^8.0);
% ... continues for D, E, ... %
X(1,:) = X(1,:) + W .* A;
X(2,:) = X(2,:) + W .* B;
X(3,:) = X(3,:) + W .* C;
X(4,:) = X(4,:) + W .* D;
% ... Continues for all variables %
Option C
X(1,:) = X(1,:) + W .* (MM-X(8,:))./VV;
X(2,:) = X(2,:) + W .* (LL-X(3,:))./FF;
X(3,:) = X(3,:) + W .* G .*(-I)./(1.0+(1.5./X(9,:)).^8.0);
% ... Continues for all variables%
Furthermore, I have tried initializing X as a gpuArray() without any improvement. Thanks for your help.
7 comments
Walter Roberson, 4 September 2022
Also, you would improve performance by working with the transpose of your array, so that you operate on columns instead of rows. Memory for any individual column is contiguous, so column access is faster.
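For concreteness, a minimal sketch of that column-oriented layout, using the variable names from the question (the sizes and scalar values below are placeholders, not values from the thread):
n  = 27000;
Xt = rand(n, 40);                            % transpose of the 40-by-27000 state: one variable per column
MM = 1; VV = 2; LL = 3; FF = 4; W = 0.1;     % placeholder scalars standing in for the real inputs
A  = (MM - Xt(:,8))./VV;                     % updates computed as column vectors
B  = (LL - Xt(:,3))./FF;
Xt(:,1) = Xt(:,1) + W.*A;                    % each assignment writes one contiguous block of memory
Xt(:,2) = Xt(:,2) + W.*B;
% ... continue for the remaining columns; transpose back only if the 40-by-27000 X is needed:
X = Xt.';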
dpb, 4 September 2022
Edited: dpb, 4 September 2022
It's tmp that's the bottleneck, not X; tmp is the one that needs preallocation and direct indexing when each row/column(*) is defined.
If you were to also follow Walter's point regarding internal array storage order and reorient as column vectors (thus writing contiguous memory as much as possible), that would seem most probable, yes. I was thinking about Option A with the large array preallocated first. Not sure whether there would or would not be a difference between the two; one would have to test to see; the larger memory footprint might outweigh the gains from the internal array operations.
@Matt J's idea of the cell array is an interesting way to do something similar with the "divide and conquer" idea, although I'd think the cell addressing would add a little overhead over direct array indexing if the vectors were, again, column oriented. Again, probably only profiling/timing will really prove it one way or the other; you can't predict the outcome of the optimizer's output.
Whether this would end up being a candidate for mex'ing I dunno, either...possibly.
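A minimal sketch of the preallocate-and-fill idea described above (dX plays the role of the temporary dpb mentions; nSteps and the scalar constants are placeholders, not values from the thread):
n = 27000; nSteps = 10;
X = rand(40, n);
MM = 1; VV = 2; LL = 3; FF = 4; G = 1; I = 1; W = 0.1;   % placeholder inputs
dX = zeros(size(X));                          % allocated once, outside the time loop
for k = 1:nSteps
    dX(1,:) = (MM - X(8,:))./VV;              % write straight into the preallocated rows
    dX(2,:) = (LL - X(3,:))./FF;
    dX(3,:) = G.*(-I)./(1.0 + (1.5./X(9,:)).^8.0);
    % ... remaining rows ...
    X = X + W.*dX;                            % one vectorized update, no [A; B; C; ...] concatenation
end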


Accepted Answer

Matt J, 4 September 2022
Edited: Matt J, 4 September 2022
"Furthermore, I have tried initializing X as a gpuArray() without any improvement."
There should be some improvement. If there isn't, the bottleneck is somewhere else.
% Problematic line %
X = X + W .* [A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C; A; B; C];
Do you really have such substantial repetition in the update matrix (A, B, C, A, B, C, ...)? If so, that can be exploited:
[m,n] = size(X);
X = reshape(reshape(X,3,m/3,n) + reshape(W.*[A;B;C],3,1,n), m, n) % add the 3-by-n update to every group of 3 rows
If not, and if you really do have 40 different expressions for all 40 rows (I would find it strange/surprising to have so many), the fastest thing would then be to keep the rows of X as a cell array:
X{1} = X{1} + W .* (MM-X{8})./VV;
X{2} = X{2} + W .* (LL-X{3})./FF;
X{3} = X{3} + W .* G .*(-I)./(1.0+(1.5./X{9}).^8.0);
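For completeness, a short sketch of moving between the full matrix and that row-cell representation (Xc is an illustrative name; the per-row updates are the ones shown above):
X  = rand(40, 27000);
Xc = num2cell(X, 2);          % 40-by-1 cell array; Xc{i} holds the 1-by-27000 i-th row
% ... run the per-row updates on the cells inside the loop ...
X  = cell2mat(Xc);            % reassemble the full matrix only when it is actually needed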
9 comments
Matt J, 4 September 2022
OK. You need to put the code inside a function to see the full optimization:
runTest()
Elapsed time is 0.706534 seconds.
Elapsed time is 0.070628 seconds.
Elapsed time is 0.089183 seconds.
Still, though, it's important to realize there will be scenarios where in-place updates are not optimized and cell array access turns out to be better, like in this second test:
runTest2()
Elapsed time is 2.356626 seconds.
Elapsed time is 0.898723 seconds.
Elapsed time is 0.276424 seconds.
function runTest()
X=rand(40,27e5);
dX=rand(40,27e5);
tic;                      % 1) update rows of a 40-by-27e5 matrix (non-contiguous memory)
for i=1:40
    X(i,:)=X(i,:)+dX(i,:);
end
toc;
Xt=X';dXt=dX';
tic;                      % 2) same update on the transpose, one contiguous column at a time
for i=1:40
    Xt(:,i)=Xt(:,i)+dXt(:,i);
end
toc;
X=num2cell(X,2);          % 3) split into a cell array of rows and update each cell in place
dX=num2cell(dX,2);
tic;
for i=1:40
    X{i}=X{i}+dX{i};
end
toc;
end
function runTest2()
X=rand(40,27e5);
dX=rand(40,27e5);
I=speye(27e5);            % sparse identity: the multiply forces the operand to be materialized
tic;                      % 1) row slice times sparse matrix
for i=1:40
    X(i,:)=X(i,:)*I;
end
toc;
Xt=X';dXt=dX';
tic;                      % 2) sparse matrix times column slice of the transpose
for i=1:40
    Xt(:,i)=I*Xt(:,i);
end
toc;
X=num2cell(X,2);          % 3) each row already lives in its own cell, so no slice extraction
dX=num2cell(dX,2);
tic;
for i=1:40
    X{i}=X{i}*I;
end
toc;
end
Bruno Luong, 4 September 2022
Edited: Bruno Luong, 4 September 2022
"it's important to realize there will be scenarios where in-place updates are not optimized "
I don't think your second test show the penalty is due to inplace. IMO the speec reduction is because that the Xt(:,i) on the RHS has to be extracted from the original vector to form a new vector for matrix x vector multiplication. Matlab JIT does not go yet into inplace-pointer in operation/function. Where as the cell array loop, no such extraction is needed.
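An illustrative reading of that point (the variable names follow runTest2; the explicit temporary is an assumption about what happens internally, not code from the thread):
Xt = rand(27e5, 40);
I  = speye(27e5);
i  = 1;
% The "in-place" form still has to copy the column before the multiply:
tmpCol  = Xt(:,i);            % extraction: a new 27e5-by-1 vector is materialized
Xt(:,i) = I*tmpCol;           % same effect as Xt(:,i) = I*Xt(:,i);
% A cell of columns avoids that copy, because each cell already holds a standalone vector:
Xc = num2cell(Xt, 1);         % 1-by-40 cell; Xc{i} is the 27e5-by-1 i-th column
Xc{i} = I*Xc{i};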


More Answers (0)
