How can I optimize this recursive for loop?
Hi,
I have this recursive for loop and I would like to know if there is some way in MATLAB to reduce its computation time:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
    tt = (1+t(i)-t(1:i))*ones(1,Nr);
    e1 = ones(i,1);
    ex = e1*x(i,:);
    ey = e1*y(i,:);
    exp1 = exp(-.25*((ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2)./tt);
    x(i+1,:) = x(i,:) + adt*sum((ex-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
    y(i+1,:) = y(i,:) + adt*sum((ey-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Obviously I cannot use a parfor (because of the recursion in this loop), and I tried making all the vectors gpuArrays, but the computation time was ~20% longer (even without the gather step).
I was wondering if there is any other way to vectorize this, perhaps using the pdist function from the Statistics toolbox (or another one called dist, but I forgot which toolbox it is from and I don't know how it differs from pdist), or something on the GPU...?
Thank you for your help and ideas.
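For reference, here is how the squared-distance term could be written with pdist2 (the two-set counterpart of pdist, from the Statistics and Machine Learning Toolbox) for a single realization, just to clarify what I mean; this is a sketch of the equivalence for one column inside the loop body, not a tested speed-up:

```matlab
% Inside the loop, for one realization r: pairwise squared distances
% between the current point and all past points, equivalent to column r
% of (ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2.
% Assumes the Statistics and Machine Learning Toolbox is installed.
r  = 1;
d2 = pdist2([x(i,r) y(i,r)], [x(1:i,r) y(1:i,r)], 'squaredeuclidean').';
exp1_r = exp(-.25*d2./tt(:,r));   % should match exp1(:,r)
```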
9 Comments
Sindar
1 May 2020
It might save some time to precompute ex-x(1:i,:) and ey-y(1:i,:), since they appear in both of the bottleneck lines:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
    tt = (1+t(i)-t(1:i))*ones(1,Nr);
    ex = x(i,:)-x(1:i,:);
    ey = y(i,:)-y(1:i,:);
    exp1 = exp((-.25*(ex).^2+(ey).^2)./tt);
    x(i+1,:) = x(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
    y(i+1,:) = y(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
LeChat
3 May 2020
Rik
3 May 2020
If in the future you have an iterative problem of the shape x(n) = x(n-1) + f(n), you can compute the f(n) array and then use cumsum. In this case that is not possible, because you have f(n,x).
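To illustrate the cumsum pattern for the x-independent case, here is a minimal sketch:

```matlab
% When the increments f(n) do NOT depend on x, the recurrence
% x(n) = x(n-1) + f(n) is just a running sum and needs no loop:
N = 1e3;
f = randn(N,1);          % precomputed increments
x = cumsum(f);           % vectorized result
% equivalent loop, for comparison:
x2 = zeros(N,1); x2(1) = f(1);
for n = 2:N
    x2(n) = x2(n-1) + f(n);
end
% max(abs(x-x2)) is at floating-point round-off level
```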
Sindar
3 May 2020
Sorry about the typos. There's also an incorrect "ex" in the y update line.
Taking another look, tt can also be improved. First, you only use its inverse, so you could calculate that directly instead. Second, if I'm reading it correctly, tt has Nr identical columns as a setup for elementwise multiplication with ex/ey/etc. This isn't actually necessary, since MATLAB implicitly expands length-1 dimensions in elementwise operations. Try this:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
    tti = 1./(1+t(i)-t(1:i));
    ex = x(i,:)-x(1:i,:);
    ey = y(i,:)-y(1:i,:);
    exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
    x(i+1,:) = x(i,:) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,Nr);
    y(i+1,:) = y(i,:) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,Nr);
end
toc
A few general notes when working on optimization like this:
- keep a copy of the original, highly readable code. This example hasn't gotten too opaque yet, but optimized versions often do
- check that things are actually giving the same results. Since you've got random numbers, you'll need to set the seed consistently (see here)
- occasionally check the problem at the actual size you want. Sometimes you'll find that the optimization you've worked so hard on is only a significant portion of the runtime in the small case, and you've only gained 30% of 5% in the actual case
- consider memory constraints. My experience is that Matlab is really good at intelligently optimizing halfway decent code, so long as your RAM isn't too small. This can lead to behavior you might not expect. For example, your code appears completely separable in the columns. If you've got plenty of RAM, doing all the calculations simultaneously is almost certainly faster. If you have too little, it might be (dramatically) faster to cut the problem into pieces that easily fit into memory.
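The seeding point above can be checked with rng, for example:

```matlab
% Fixing the seed makes two runs draw identical random numbers,
% so the original and optimized versions can be compared exactly:
rng(0); a1 = randn(1,5);
rng(0); a2 = randn(1,5);
isequal(a1, a2)   % true: same seed, same stream
```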
LeChat
7 May 2020
Sindar
7 May 2020
It can sometimes get a little messy, but it may be fastest to generate the datasets in chunks (say, 10 at a time). I have not used parfor much, but I suspect that vectorized calculations may be faster than separate ones in parallel. Something like:
chunk_size = 10; % try to make this a divisor of Nr
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for ind=1:(Nr/chunk_size)
    idx = (ind-1)*chunk_size + (1:chunk_size);
    for i = 1:Nt-1
        tti = 1./(1+t(i)-t(1:i));
        ex = x(i,idx)-x(1:i,idx);
        ey = y(i,idx)-y(1:i,idx);
        exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
        x(i+1,idx) = x(i,idx) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,chunk_size);
        y(i+1,idx) = y(i,idx) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,chunk_size);
    end
end
toc
You can likely eyeball a good chunk_size by observing memory usage (Ctrl+Shift+Esc on Windows) and trying to get it close to ~90% of what you think your RAM is (e.g., 32 GB). I don't know how it calculates the percentage at the top, but the denominator isn't consistent.
Caveat: I've successfully used this framework in a much more convoluted problem where each loop iteration carries a large amount of intermediate data (imagine exp1 grew as large as or larger than x). For your problem it's easy to implement, so it's worth checking, but it may be entirely useless.
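As a rough guide for picking chunk_size: the dominant temporaries here (ex, ey, exp1) each grow to about Nt-by-chunk_size doubles, so their peak footprint can be estimated up front. A back-of-the-envelope sketch (the factor of 3 is just those three arrays; real usage will be somewhat higher):

```matlab
% Estimate peak size of the largest temporaries at the final iteration:
Nt = 1e4;            % time steps, as in the code above
chunk_size = 10;
nTemp = 3;           % ex, ey, exp1
bytes = nTemp * Nt * chunk_size * 8;   % 8 bytes per double
fprintf('~%.2f MB of temporaries at the final iteration\n', bytes/2^20);
```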
LeChat
2 April 2021
The results of the original version and Sindar's faster version are different:
% Original:
exp1 = exp(-.25*((ex-x(1:i,:)).^2 + (ey-y(1:i,:)).^2) ./ tt);
% Faster:
exp1 = exp((-.25*(ex).^2 + (ey).^2).*tti)
In the 2nd version, the -0.25 is applied to the first term only. This reduces the runtime significantly, because the exponent grows quickly and EXP overflows to Inf soon, and producing Inf is much cheaper than a finite calculation.
You need:
exp1 = exp(-0.25 * (ex.^2 + ey.^2) .* tti);
The last row of the summand is 0 (ex and ey vanish there), so it does not matter in the sum. Omitting it does not reduce the runtime significantly, although the number of EXP calls is reduced. See my answer.
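A quick way to confirm that the corrected line matches the original formula, using small random placeholder arrays for ex, ey, and tt:

```matlab
% Placeholder data standing in for the loop variables:
ex = randn(5,3); ey = randn(5,3);
tt = 1 + rand(5,3); tti = 1./tt;
e_orig = exp(-.25*(ex.^2 + ey.^2)./tt);     % original formula
e_fix  = exp(-0.25*(ex.^2 + ey.^2).*tti);   % corrected faster form
% max(abs(e_orig(:)-e_fix(:))) is at round-off level
```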