How can I optimize this recursive for loop?
Hi,
I have this recursive for loop and I would like to know if there is some way in MATLAB to reduce its computation time:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
    tt = (1+t(i)-t(1:i))*ones(1,Nr);
    e1 = ones(i,1);
    ex = e1*x(i,:);
    ey = e1*y(i,:);
    exp1 = exp(-.25*((ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2)./tt);
    x(i+1,:) = x(i,:) + adt*sum((ex-x(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
    y(i+1,:) = y(i,:) + adt*sum((ey-y(1:i,:))./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
Obviously I cannot use a parfor (because of the recursion in this loop), and I tried making all the vectors gpuArrays, but the computation time was ~20% longer (even without the gather step).
I was wondering if there is any other way to vectorize this, perhaps using the pdist function from the Statistics toolbox (or another one called dist, but I forgot which toolbox it is from and I don't know how it differs from pdist), or something on the GPU...?
Thank you for your help and ideas.
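For reference, here is how the squared-distance term could be written with pdist2 (the two-set counterpart of pdist, from the Statistics and Machine Learning Toolbox) for a single realization, just to clarify what I mean; this is a sketch of the equivalence for one column inside the loop body, not a tested speed-up:

```matlab
% Inside the loop, for one realization r: pairwise squared distances
% between the current point and all past points, equivalent to column r
% of (ex-x(1:i,:)).^2+(ey-y(1:i,:)).^2.
% Assumes the Statistics and Machine Learning Toolbox is installed.
r  = 1;
d2 = pdist2([x(i,r) y(i,r)], [x(1:i,r) y(1:i,r)], 'squaredeuclidean').';
exp1_r = exp(-.25*d2./tt(:,r));   % should match exp1(:,r)
```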
9 Comments
Sindar
1 May 2020
It might save some time to precompute ex-x(1:i,:) and ey-y(1:i,:), since they appear in both of the bottleneck lines:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
    tt = (1+t(i)-t(1:i))*ones(1,Nr);
    ex = x(i,:)-x(1:i,:);
    ey = y(i,:)-y(1:i,:);
    exp1 = exp((-.25*(ex).^2+(ey).^2)./tt);
    x(i+1,:) = x(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
    y(i+1,:) = y(i,:) + adt*sum(ex./tt.^2.*exp1) + edt*randn(1,Nr);
end
toc
LeChat
3 May 2020
Rik
3 May 2020
If in the future you have an iterative problem of the shape x(n) = x(n-1) + f(n), you can compute the f(n) array and then use cumsum. In this case that is not possible, because you have f(n,x).
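To illustrate the cumsum pattern for the x-independent case, here is a minimal sketch:

```matlab
% When the increments f(n) do NOT depend on x, the recurrence
% x(n) = x(n-1) + f(n) is just a running sum and needs no loop:
N = 1e3;
f = randn(N,1);          % precomputed increments
x = cumsum(f);           % vectorized result
% equivalent loop, for comparison:
x2 = zeros(N,1); x2(1) = f(1);
for n = 2:N
    x2(n) = x2(n-1) + f(n);
end
% max(abs(x-x2)) is at floating-point round-off level
```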
Sindar
3 May 2020
Sorry about the typos. There's also an incorrect "ex" in the y update line.
Taking another look, tt can also be improved. First, you only use its inverse, so you could calculate that directly instead. Second, if I'm reading it correctly, tt has Nr identical columns as a setup for elementwise multiplication with ex/ey/etc. This isn't actually necessary, since MATLAB implicitly expands length-1 dimensions in elementwise operations. Try this:
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for i = 1:Nt-1
    tti = 1./(1+t(i)-t(1:i));
    ex = x(i,:)-x(1:i,:);
    ey = y(i,:)-y(1:i,:);
    exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
    x(i+1,:) = x(i,:) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,Nr);
    y(i+1,:) = y(i,:) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,Nr);
end
toc
A few general notes when working on optimization like this:
- keep a copy of the original, highly readable code. This example hasn't gotten too opaque yet, but optimized versions often do
- check that things are actually giving the same results. Since you've got random numbers, you'll need to set the seed consistently (see here)
- occasionally check the problem at the actual size you want. Sometimes you'll find that the optimization you've worked so hard on is only a significant portion of the runtime in the small case, and you've only gained 30% of 5% in the actual case
- consider memory constraints. My experience is that Matlab is really good at intelligently optimizing halfway decent code, so long as your RAM isn't too small. This can lead to behavior you might not expect. For example, your code appears completely separable in the columns. If you've got plenty of RAM, doing all the calculations simultaneously is almost certainly faster. If you have too little, it might be (dramatically) faster to cut the problem into pieces that easily fit into memory.
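The seeding point above can be checked with rng, for example:

```matlab
% Fixing the seed makes two runs draw identical random numbers,
% so the original and optimized versions can be compared exactly:
rng(0); a1 = randn(1,5);
rng(0); a2 = randn(1,5);
isequal(a1, a2)   % true: same seed, same stream
```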
LeChat
7 May 2020
Sindar
7 May 2020
It can sometimes get a little messy, but it may be fastest to generate the datasets in chunks (say, 10 at a time). I have not used parfor much, but I suspect that vectorized calculations may be faster than separate ones in parallel. Something like:
chunk_size = 10; % try to make this a divisor of Nr
tic
a = 2;
Tmax = 1e2;% which I want to increase by 1 order of magnitude
dt = 1e-2;% which I want to decrease by 1 order of magnitude
Nt = Tmax/dt;
t = (0:Nt-1)'*dt;
Nr = 1e1;
x = zeros(Nt,Nr);
y = zeros(Nt,Nr);
eps = 1e-3;
edt = 1e-2;
adt = 1e-4;
for ind=1:(Nr/chunk_size)
    idx = (ind-1)*chunk_size + (1:chunk_size);
    for i = 1:Nt-1
        tti = 1./(1+t(i)-t(1:i));
        ex = x(i,idx)-x(1:i,idx);
        ey = y(i,idx)-y(1:i,idx);
        exp1 = exp((-.25*(ex).^2+(ey).^2).*tti);
        x(i+1,idx) = x(i,idx) + adt*sum(ex.*tti.^2.*exp1) + edt*randn(1,chunk_size);
        y(i+1,idx) = y(i,idx) + adt*sum(ey.*tti.^2.*exp1) + edt*randn(1,chunk_size);
    end
end
toc
You can likely eyeball a good chunk_size by observing memory usage (Ctrl+Shift+Esc on Windows) and trying to get it close to ~90% of what you think your RAM is (e.g., 32 GB). I don't know how it calculates the percentage at the top, but the denominator isn't consistent.
Caveat: I've successfully used this framework in a much more convoluted problem where each loop iteration carries a large amount of intermediate data (imagine exp1 grew as large as or larger than x). For your problem it's easy to implement, so it's worth checking, but it may be entirely useless.
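As a rough guide for picking chunk_size: the dominant temporaries here (ex, ey, exp1) each grow to about Nt-by-chunk_size doubles, so their peak footprint can be estimated up front. A back-of-the-envelope sketch (the factor of 3 is just those three arrays; real usage will be somewhat higher):

```matlab
% Estimate peak size of the largest temporaries at the final iteration:
Nt = 1e4;            % time steps, as in the code above
chunk_size = 10;
nTemp = 3;           % ex, ey, exp1
bytes = nTemp * Nt * chunk_size * 8;   % 8 bytes per double
fprintf('~%.2f MB of temporaries at the final iteration\n', bytes/2^20);
```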
LeChat
2 April 2021
The results of the original version and Sindar's faster version are different:
% Original:
exp1 = exp(-.25*((ex-x(1:i,:)).^2 + (ey-y(1:i,:)).^2) ./ tt);
% Faster:
exp1 = exp((-.25*(ex).^2 + (ey).^2).*tti)
In the 2nd version, the -0.25 is applied to the first term only. This reduces the runtime significantly, because the exponent grows quickly and EXP overflows to Inf soon, and producing Inf is much cheaper than a finite calculation.
You need:
exp1 = exp(-0.25 * (ex.^2 + ey.^2) .* tti);
The last row of the summand is 0 (ex and ey vanish there), so it does not matter in the sum. Omitting it does not reduce the runtime significantly, although the number of EXP calls is reduced. See my answer.
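A quick way to confirm that the corrected line matches the original formula, using small random placeholder arrays for ex, ey, and tt:

```matlab
% Placeholder data standing in for the loop variables:
ex = randn(5,3); ey = randn(5,3);
tt = 1 + rand(5,3); tti = 1./tt;
e_orig = exp(-.25*(ex.^2 + ey.^2)./tt);     % original formula
e_fix  = exp(-0.25*(ex.^2 + ey.^2).*tti);   % corrected faster form
% max(abs(e_orig(:)-e_fix(:))) is at round-off level
```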