How to pre-allocate changeable size arrays in a for-loop?

Hi all!
I have a fairly large script in which I deal with different particle size classes (or intervals), depending on the given input. Each of the size classes (i=1:NrInt) results in a cell array of a certain size (the number of rows depends on the number of particles, the number of columns on the number of chemical elements present). These arrays then go through a number of operations to produce the corresponding pie charts (1 pie per particle size interval).
I received part of this script from a previous colleague and then tried to adapt it to my specific problem. After a lot of trial and error, the script seems to be working just fine, but I still need to improve its speed. As several variables change size inside the loop, I'm getting preallocation warnings all over the place.
The code for that part is attached. Can someone help me out?
Many thanks!
Comments: 3
Adam on 20 Mar 2017
Edited: Adam on 20 Mar 2017
Ah, well, dialog boxes will skew profiler results probably in terms of percentage time taken in each part of the program. If the program is waiting for user input it will be sat there with the timer ticking away on that function, giving an erroneous evaluation of the time spent on the function itself.
Your code is too complicated for me to just glance at in the time I have and make any valid suggestions, though.
You really ought to make use of blank lines in code for readability! A wall of text covering 30 lines or more is really hard to read.
I notice you are using cell arrays and these are never good for performance. Maybe they are totally necessary here, but if you can in any way use numeric arrays instead of cell arrays that would likely improve performance.
As for preallocation, sometimes you simply cannot do it if you have no idea how big your array will be beforehand. If you can estimate an upper bound on the size then you can presize it to this and then just trim it down to the smallest size it can be after the loop. Sometimes I do this if I can make a sensible estimate. Otherwise you can tell it to ignore that warning message. You are right to look into the message and try to solve it first though - only disable warnings when you have evaluated them and are happy to ignore them for valid reasons.
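The upper-bound approach described above could look something like the following. This is only a minimal sketch: `maxCount`, `values`, `kept` and the `> 0.5` condition are made-up placeholders, not names from the attached script.

```matlab
% Sketch: preallocate to an estimated upper bound, then trim after the loop.
maxCount = 1000;                 % hypothetical upper bound on the number of results
values   = rand(1, maxCount);    % hypothetical input data
kept     = zeros(1, maxCount);   % preallocated to the upper bound
n        = 0;                    % number of slots actually filled

for k = 1:maxCount
    if values(k) > 0.5           % how many elements pass is not known in advance
        n = n + 1;
        kept(n) = values(k);     % write into the preallocated array, no resizing
    end
end

kept = kept(1:n);                % trim the unused tail once, after the loop
```

The array is only resized once, at the very end, instead of on every iteration.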


Accepted Answer

Guillaume on 23 Mar 2017
Edited: Guillaume on 24 Mar 2017
I'm with Adam and Dhruvesh: your code is very difficult to parse. Better indentation (select all the code and press CTRL+I), more whitespace and comments would greatly help.
At a quick glance, I fail to see which variable cannot be pre-allocated. They all seem to be indexed by i which you know will have NrInt steps.
Like Adam, I wonder if all these cell arrays are necessary. There are also several number-to-string conversions. That's never going to be fast.
I also noticed several instances of
somevar = find(someexpression)
othervar(somevar) = ...
which can be replaced by
othervar(someexpression) = ...
There's no point in using find to convert the logical array returned by someexpression into explicit indices when you can use that logical array directly for indexing. The find call just slows things down.
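As a small illustration of the two forms, here is a sketch with made-up data (`x`, `a`, `b` and `idx` are placeholder names, not variables from the attached script):

```matlab
x = [3 -1 4 -1 5 -9 2 -6];   % hypothetical data

% With find: an explicit index vector is built first, then used
a = x;
idx = find(a < 0);
a(idx) = 0;

% With logical indexing: the mask is used directly
b = x;
b(b < 0) = 0;

sameResult = isequal(a, b);  % the two forms produce identical arrays
```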
Note that if you cannot preallocate (which is perfectly fine), you can get rid of the warning either by right-clicking on the squiggly line and selecting Suppress ... on ..., or by adding %#ok<AGROW> at the end of the line.
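For example, in a loop whose number of iterations is not known beforehand, the suppression comment sits on the line that grows the array. The loop below is an invented example (`result` and `v` are placeholders), only meant to show where the comment goes:

```matlab
% Sketch: the iteration count is not known up front, so growth is accepted
% and the Code Analyzer message is suppressed on that one line.
result = [];
v = 6;                           % hypothetical starting value
while v ~= 1
    if mod(v, 2) == 0
        v = v / 2;
    else
        v = 3*v + 1;
    end
    result(end+1) = v; %#ok<AGROW>
end
```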
Comments: 3
Guillaume on 24 Mar 2017
The code is indeed a lot easier to read.
It does not matter that the size of the arrays stored in each cell of the cell array differs. The number of cells in the cell array is fixed at NrInt. Hence all your cell arrays and vectors could be pre-allocated. The only variable whose size is unknown at the beginning of the loop appears to be TotLabel. That one, however, could be created after the loop.
So any cell array and vector can be predeclared with, e.g.
ClassHistFinal = cell(1, NrInt);
NrParticles = zeros(1, NrInt); %for vectors
As said, TotLabel can be calculated after the loop has ended with:
TotLabel = [Label{:}];
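Put together, the pattern could look roughly like this. It is a sketch only: the loop body is invented filler (random data and dummy labels), and it assumes each Label{i} is a row cell array of strings so that the final concatenation works.

```matlab
NrInt = 3;                           % hypothetical number of size classes

ClassHistFinal = cell(1, NrInt);     % one cell per class; contents may differ in size
NrParticles    = zeros(1, NrInt);    % one number per class
Label          = cell(1, NrInt);     % one row cell array of labels per class

for i = 1:NrInt
    ClassHistFinal{i} = rand(i, 2);                  % filler data: i rows, 2 columns
    NrParticles(i)    = size(ClassHistFinal{i}, 1);  % number of rows in this class
    Label{i}          = repmat({sprintf('class%d', i)}, 1, i);  % i dummy labels
end

TotLabel = [Label{:}];               % built once, after the loop
```

Only the containers are preallocated; what goes into each cell can still have a different size on every iteration.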
There are a lot of what look like intermediary results stored in cell arrays. Is that really necessary? Do you really need to keep all the intermediary results ClassNor, ElementsAllZero, etc.? If not, get rid of the indexing.
I personally prefer building strings with sprintf or fprintf and a format string, rather than string concatenation and num2str. So I'd rather have:
fprintf('Number of particles between %g and %g µm : %d(%g%%)\n', ThresholdVal(i:i+1), NrParticles(i), NrParticles(i)/TotalNrParticles*100)
than your
disp(horzcat(...
That won't have an impact on speed. I just find the former more readable, and it makes it easier to customise the number display.
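A side-by-side sketch of the two styles, using made-up values (`lo`, `hi`, `n` and `total` are placeholders, and the concatenation is a guess at the general shape of such code, not a copy of the attached script):

```matlab
lo = 1; hi = 2.5; n = 42; total = 168;   % hypothetical values

% String concatenation with num2str
s1 = ['Number of particles between ', num2str(lo), ' and ', num2str(hi), ...
      ' µm: ', num2str(n), ' (', num2str(n/total*100), '%)'];

% The same text from a single format string
s2 = sprintf('Number of particles between %g and %g µm: %d (%g%%)', ...
             lo, hi, n, n/total*100);

sameText = strcmp(s1, s2);               % both build the same string here
```

With the format string, changing how a number is displayed (for instance %.1f instead of %g) is a one-character edit.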


More Answers (1)

Dhruvesh Patel on 23 Mar 2017
Your code is indeed difficult to read. However, here are some general pointers which might help you understand what is going on under the hood when MATLAB resizes an array. How MATLAB resizes an array when the for-loop asks for more elements is nicely explained in the following answer. It covers both normal arrays and cell arrays.
So it is always a good idea to estimate the size and pre-allocate using that estimate, as MATLAB will then not have to resize the array at least until it reaches the estimated size. This improves execution time and reduces memory fragmentation. Ideally, if you have an upper bound for your loop iterations (it looks like it's 'NrInt' in your case), you can pre-allocate using that.
