How can I optimize this recursive for loop?

Hi,
I have this recursive fo loop and I would like to know if there is some way in Matlab by which I could optimize the computation time:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
e1 = ones(i,1);
ex = e1*x(i,:);
ey = e1*y(i,:);
exp1 = exp(-.25*((ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum((ex-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum((ey-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Obviously I cannot use a parfor (because of the recursivity of this loop, and I tried to make all the vectors gpuArrays but the computation time seems ~20% longer (even without the gather step).
I was wondering if there were any other way to either make this vectorial or using the pdist function from the Statistics toolbox (or another one called dist but I forgot which toolbox is was from and I don't know the difference with pdist), or something on the GPU...?
Thank you for your help and ideas.

댓글 수: 9

LeChat
LeChat 2020년 5월 1일
편집: LeChat 2020년 5월 1일
I think I can simplify the code by replacing:
tt = (1+t(i)-t(1:i))*ones(1,Nr);
e1 = ones(i,1);
ex = e1*x(i,:);
ey = e1*y(i,:);
exp1 = exp(-.25*((ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum((ex-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum((ey-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
by this:
tt = (1+t(i)-t(1:i));
exp1 = exp(-.25*((x(i,:)-x(1:i,:)).^2+(y(i,:)-y(1:i,:)).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum((x(i,:)-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum((y(i,:)-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
am I right?
In any case, the computation of the exp1 is what takes the most time (followed by the sum), so I don't know if this change much regarding computation time.
might save some to predefine (ex-x(1:i,:), (ey-y(1:i,:) since they appear in both of the bottlenecks
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
ex = (x(i,:)-x(1:i,:);
ey = (y(i,:)-y(1:i,:);
exp1 = exp(-.25*(ex).^2+(ey).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Thank helped quite a bit @Sindar (30% computation time improvement), thank you.
Some parenthesis were missing, this is the version without typos:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1+t(i)-t(1:i))*ones(1,Nr);
ex = x(i,:)-x(1:i,:);
ey = y(i,:)-y(1:i,:);
exp1 = exp((-.25*(ex).^2+(ey).^2)./tt);
x(i+1,:) = x(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Rik
Rik 2020년 5월 3일
If in the future you have an iterative problem of the shape x(n) =x(n-1)+f(n), you can compute the f(n) array and then use cumsum. In this case that is not possible because you have f(n,x).
Sorry about the typos. There's also incorrectly "ex" in the y update line.
With another look, it also looks like tt can be improved. First off, you only use the inverse, so you could calculate that directly instead. Secondly, if I'm reading it correctly, tt has Nr identical columns as a set up for elementwise multiplication with ex/ey/etc.. This isn't actually necessary as Matlab understands that it should expand elementwise multiplication with length-1 dimensions. Try this:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tti = 1./(1+t(i)-t(1:i));
ex = x(i,:)-x(1:i,:);
ey = y(i,:)-y(1:i,:);
exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
x(i+1,:) = x(i,:) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,Nr);
y(i+1,:) = y(i,:) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,Nr);
end
toc
A few general notes when working on optimization like this:
  • keep a copy of the original, highly-readable code. This example hasn't gotten too opaque yet, but they often do
  • check that things are actually giving the same results. Since you've got random numbers, you'll need to set the seed consistently (see here)
  • occasionally check on a problem at the actual size you want. Sometimes you'll find that the optimization you've worked so hard on is only a significant portion of the small case, and you've only gained 30% of 5% of the actual case
  • consider memory constraints. My experience is that Matlab is really good at intelligently optimizing halfway decent code, so long as your RAM isn't too small. This can lead to behavior you might not expect. For example, your code appears completely separable in the columns. If you've got plenty of RAM, doing all the calculations simultaneously is almost certainly faster. If you have too little, it might be (dramatically) faster to cut the problem into pieces that easily fit into memory.
Thank you for your advice. I actually chose a Mersenne Twister in Matlab (which I wish I had in 64bits though) using the following commands:
%% Pseudo-Random Number Generator
rng(rng_seed,'twister');
s=rng;
rng(s);
for the sake of reproductibility.
I also found that when I wanted to generate many (Nr>50) datasets at onces, it seems faster to generate one at the time using a parfor loop that compared to generating them all at once like in the previous code. I guess because of my RAM speed...?
I agree with you that sometimes optimization is not worth the work but here I need many (Nr>50 or even 100) and long (Tmax large, and dt<0.01) datasets, so here I guess it is worth digging into it. Also I get to improve my coding skills ;)
Thank you for your insights and help!
It can sometimes get a little messy, but it may be fastest to generate the datasets in chunks (say, 10 at a time). I have not used parfor much, but I suspect that vectorized calculations may be faster than separate ones in parallel. Something like:
chunk_size = 10; % try to make this a divisor of Nr
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for ind=1:(Nr/chunk_size)
idx = (ind-1)*chunk_size + (1:chunk_size);
for i = 1:Nt-1
tti = 1./(1+t(i)-t(1:i));
ex = x(i,idx)-x(1:i,idx);
ey = y(i,idx)-y(1:i,idx);
exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
x(i+1,idx) = x(i,idx) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,Nr);
y(i+1,idx) = y(i,idx) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,Nr);
end
end
toc
you can likely eyeball a good chunk_size by observing memory usage (CTRL-SHIFT-ESC on Windows) and trying to get it close to ~90% of what you think your RAM is (e.g., 32 GB. I don't know how it calculates the percent at the top, but the denominator isn't consistent)
caveat: I've successfully used this framework in a much more convoluted problem where each loop contains a large quantity of intermediate data (imagine exp1 ran up to or large than x). For your problem, it's easy to implement, so worth checking, but it may be entirely useless
LeChat
LeChat 2021년 4월 2일
This has improved the performances of the computation. Thank you!
Jan
Jan 2021년 4월 2일
편집: Jan 2021년 4월 2일
The results between the originak version and the faster version of Sindar are different:
% Original:
exp1 = exp(-.25*((ex-x(1:i,:)).^2 + (ey-y(1:i,:)).^2) ./ tt);
% Faster:
exp1 = exp((-.25*(ex).^2 + (ey).^2).*tti)
In the 2nd version, the -0.25 is multiplied to the first term only. This reduces the runtime significantly, because the values are growing faster and EXP(Inf) is reached soon, which is much cheaper than finite calculation.
You need:
exp1 = exp((-0.25 * (ex.^2 + ey.^2) .* tti)
The last row of exp1 is 0, so it does not matter in the sum. Omitting it does not reduce the runtime significantly, although the number of EXP calls is reduced. See my answer.

댓글을 달려면 로그인하십시오.

 채택된 답변

Jan
Jan 2021년 4월 2일
편집: Jan 2021년 4월 2일

0 개 추천

tic
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = 1 ./ (1 + t(i) - t(1:i-1));
ex = x(i, :) - x(1:i-1, :);
ey = y(i, :) - y(1:i-1, :);
exp1 = exp((-0.25 * tt) .* (ex.^2 + ey.^2)) .* tt.^2;
x(i+1, :) = x(i, :) + adt * sum(ex .* exp1, 1) + edt * randn(1, Nr);
y(i+1, :) = y(i, :) + adt * sum(ey .* exp1, 1) + edt * randn(1, Nr);
end
toc
This runs in 10 sec instead of the original 14 sec.

댓글 수: 1

LeChat
LeChat 2021년 4월 5일
편집: LeChat 2021년 4월 5일
Actually I ended up doing the following , following previous comments (and Sindar's advice):
tic
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
%
Fx = zeros(Nt,Nr);
Fy = zeros(Nt,Nr);
%
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
tt = (1 + t(i) - t(1:i-1));
ex = x(i, :) - x(1:i-1, :);
ey = y(i, :) - y(1:i-1, :);
exp1 = exp((-0.25./tt) .* (ex.^2 + ey.^2)) ./ tt.^2;
Fx(i+1,:)=adt*sum(ex.*exp1,1);
Fy(i+1,:)=adt*sum(ey.*exp1,1);
x(i+1, :) = x(i, :) + Fx(i+1,:) + edt * randn(1, Nr);
y(i+1, :) = y(i, :) + Fy(i+1,:) + edt * randn(1, Nr);
end
toc
It gives very similar performances as what you posted Jan, I don't know if doing the sum separately is very interesting (performance-wise)...?

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

도움말 센터File Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기

제품

릴리스

R2019b

질문:

2020년 5월 1일

편집:

2021년 4월 5일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by