Which MATLAB operations/functions need speeding up?
26 views (last 30 days)
When using MATLAB, I sometimes come across a performance bottleneck caused by functionality that I believe could be much faster. It is worth bringing these observations to the attention of TMW.
Do you know of any such functionality?
Upvote the suggestions of others you'd like to see TMW tackle.
Answers (23)
Matt J
21 Nov 2013
Edited: Matt J
21 Nov 2013
SUB2IND and IND2SUB are very frequently used functions. Yet they are still implemented only in M-code, rather than as compiled built-ins.
Even better would be to allow numeric arrays to accept lists of subscripts, e.g.,
A({I,J})
could be equivalent to A(sub2ind(size(A),I,J)) instead of having to convert manually to linear indices.
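In the meantime, a small wrapper function (the name subref is made up here, a sketch only) at least hides the manual conversion:
% subref.m -- hypothetical helper that mimics the proposed A({I,J}) syntax
% by doing the sub2ind conversion internally.
function out = subref(A, varargin)
    out = A(sub2ind(size(A), varargin{:}));   % varargin holds I, J, ...
end
% Usage: vals = subref(A, I, J);  % same result as A(sub2ind(size(A),I,J))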
2 Comments
John D'Errico
1 Jun 2015
I'd very strongly argue that this is how indexing SHOULD be done for scattered indices.
Cedric Wannaz
20 Nov 2013
Edited: Cedric Wannaz
21 Nov 2013
EDITs
- 11/20, 5:00PM EST - Added comment about OOP.
- 11/20, 5:10PM EST - Added comment about TRY statements.
- 11/21, 8:50AM EST - Added comment about documentation.
Over the years I have found a series of non-built-in functions with bottlenecks, some of which were or will be converted to built-ins. Looking at the source code, the reason was most often the presence of "slow" tests like ISMATRIX. I usually rewrote these functions specifically for my applications, without the tests that were unnecessary in my context. Even for built-ins, though, I would appreciate being able to set JIT directives at the beginning of the code, allowing e.g. these tests to be disabled (or even heavier ones to be enabled during the development phase). It could also be interesting to set directives on a per-function-call basis, e.g.
outVars = someBuiltin[JIT directives](inVars) ;
[a,b,c] = foo[argchk=0](x,y) ; % Or CSL of params/values,
% or more elaborate.
Along with this could come an "advanced" version of the documentation that shows pseudo-code for each function, with the directive names that enable/disable each block. An indication of the method/algorithm used would be a great improvement as well.
I propose putting the directives before the parentheses, so there is room for later implementing indexing of function outputs, e.g.
lastCol = foo[argchk=0](A,b)(:,end) ;
or even accessing methods/properties of object outputs (here, the directives are param/value pairs):
surfaceArea = grid.getCell[argchk,0](28).getSurfaceArea() ;
Finally, it's a different type of optimization, but TRY statements could take arguments that set a maximum memory usage or a maximum execution time:
try('maxExecTime', 1e4, 'maxMemoryUsage', 5e9)
...
catch ME
...
end
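Pending such syntax, the maxExecTime part can be roughly approximated today with an explicit timer check inside the guarded code; a sketch, with an arbitrary 10-second budget:
maxExecTime = 10;            % seconds; arbitrary budget for this sketch
t0 = tic;
try
    for k = 1:1e6
        % ... expensive work for iteration k ...
        if toc(t0) > maxExecTime
            error('myCode:timeout', 'Exceeded the %g s budget.', maxExecTime);
        end
    end
catch ME
    disp(ME.message)         % handle the overrun like any other error
end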
2 Comments
Walter Roberson
21 Nov 2013
OOP used to be very slow; I hear that it is faster now, but it needs to become competitive in speed, at most a small percentage slower. Otherwise, people are not going to bother learning it and using it.
0 Comments
Royi Avital
26 Aug 2014
The function `im2col`.
See here:
Those two functions are crucial for "patch"-based and locally adaptive image kernels, and they take too long. I think they should be rewritten using MKL / IPP and be properly multithreaded.
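For comparison, here is a hedged pure-indexing sketch of what im2col(img,[p q],'sliding') computes (the image and patch size are made up); this kind of vectorized gather is sometimes a usable stop-gap for small patches:
img = magic(8);                        % small example image
p = 3; q = 3;                          % sliding patch size
[m, n] = size(img);
% Linear index of the top-left pixel of every sliding patch (column-major)
[r0, c0] = ndgrid(1:m-p+1, 1:n-q+1);
start = r0(:) + (c0(:) - 1)*m;
% Linear offsets of the p*q pixels inside one patch
[dr, dc] = ndgrid(0:p-1, 0:q-1);
offset = dr(:) + dc(:)*m;
% One column per patch, matching im2col(img, [p q], 'sliding')
cols = img(bsxfun(@plus, offset, start.'));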
0 Comments
Matt J
20 Nov 2013
Edited: Matt J
20 Nov 2013
ACCUMARRAY
I still have my doubts about whether accumarray is maximally optimized, based on this example. Even if it is close to optimal on the CPU, there is so much more I could do with it if it had GPU support.
Finally, the 4-argument syntax
accumarray(subs,val,[],fun);
can definitely be better optimized for certain choices of fun. Below are some examples of how I work around the slow behavior of fun=@mean and fun=@std:
n = 1e7; % size of daily data
idx = randi(65160,n,1); % 65160 is for 181*360 latitude and longitude
data = randn(n,1);
%%%%%%%%%%%MEAN
tic
accummean = accumarray(idx,data,[],@mean);
toc
%Elapsed time is 2.649816 seconds.
tic
N = accumarray(idx,1);                 % per-bin counts
accummean = accumarray(idx,data)./N;   % per-bin sum divided by count
toc
%Elapsed time is 0.248310 seconds.
%%%%%%%%STD
tic
accumstd = accumarray(idx,data,[],@(x)std(x,1));
toc
%Elapsed time is 4.706466 seconds.
tic;
N = accumarray(idx,1,[]);               % per-bin counts
idxn = N~=0;
EX = accumarray(idx,data,[]);           % per-bin sum of x
EX2 = accumarray(idx,data.^2,[]);       % per-bin sum of x.^2
% population std: sqrt(N.*EX2 - EX.^2)./N, i.e. sqrt(E[x^2] - E[x]^2)
accumstd = sqrt(N.*EX2-EX.^2);
accumstd(idxn) = accumstd(idxn)./N(idxn);
toc
%Elapsed time is 0.406712 seconds.
2 Comments
Matt J
4 Dec 2013
Edited: Matt J
4 Dec 2013
Operations that modify object properties
obj.prop(i)=1
always create temporary deep copies of the entire property, even if prop is a huge array and i is a scalar index. This, of course, slows things down.
I believe this might be done to accommodate the flexibility of property set and get methods (see, for example, this thread), but there should be a property attribute that tells MATLAB that prop can have normal in-place access behavior like regular MATLAB arrays.
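A common workaround while no such attribute exists is to pull the property into a local variable, do all the element-wise writes there, and assign it back once. A sketch with a made-up class:
% BigHolder.m -- hypothetical minimal value class, for illustration only
classdef BigHolder
    properties
        prop = zeros(1, 1e6);      % large numeric property
    end
end
% --- script: element-wise writes on the property vs. on a local copy ---
obj = BigHolder;
idx = randi(1e6, 1, 1e4);
tic
for k = 1:numel(idx)
    obj.prop(idx(k)) = 1;          % each write may copy the whole property
end
toc
obj = BigHolder;
tic
tmp = obj.prop;                    % read the property once
for k = 1:numel(idx)
    tmp(idx(k)) = 1;               % cheap writes on the local array
end
obj.prop = tmp;                    % write back once
toc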
5 Comments
Cedric Wannaz
10 Dec 2013
Edited: Cedric Wannaz
10 Dec 2013
Hi Matt,
I agree with your claim, but 100 is not that many either. My point is that one shouldn't think "unless I perform millions of subsref/subsasgn operations, there is no need for a local copy", because the loss of efficiency can become significant as soon as more than a few accesses to properties are performed.
I have this issue with classes that are meant to become new types; the end user might not be able to "vectorize" his/her code and might have to write heavy loops. When I don't use local copies of properties that are accessed more than a few times, I can easily observe a factor of two in execution time.
Matt J
20 Nov 2013
Edited: Matt J
20 Nov 2013
griddedInterpolant
seems under-optimized to me. I often need to do very large interpolation operations on 3D volumes, similar to the following. Customized professional software for my applications always seems to manage these kinds of operations much faster than the 5.2 seconds shown.
N=400;
A=rand(N,N,N);
[x,y,z]=ndgrid(1:N);
f=@(u) u(:)+rand(size(u(:)));
x=f(x); y=f(y); z=f(z);
G=griddedInterpolant(A);
tic;
w=G(x,y,z);
toc;
%Elapsed time is 5.203962 seconds.
Furthermore, when I run the above, the Task Manager shows that my CPU is greatly under-used (only 21% at peak and only 3 of the 12 cores are significantly active).

3 Comments
Kelly Kearney
21 Nov 2013
No, I'm referring to data that definitely lies on a grid, but not necessarily a perfectly regular one. In my case, this is usually geographic data on a small scale (on the order of 1000km), defined on a regular latitude x longitude x depth grid. At this scale, any translation of the lat/lon coordinates (such as to meters, to match the typical depth unit) will result in a little bit of distortion from a perfectly orthogonal grid.
It seems to me that griddedInterpolant would be the better choice here, with neighbors and connectivity being determined by the organization of the input data. The triangulation built by scatteredInterpolant is not well-suited to datasets like this that are nearly-gridded... it can often return very weird values (for example, values well out of the input data range when using a nearest-neighbor interpolant, as the above example does).
Really, the best solution would be to stick with the lat/lon/depth grid but write a variation on griddedInterpolant that uses geographic distance rather than Euclidean. But I haven't gotten around to writing such a function yet...
Walter Roberson
21 Nov 2013
This is a bit indirect, but:
There needs to be a way to flush an output buffer, including for serial ports and instrument control. Here, I am using "flush" in the sense of sending the data now instead of waiting for a buffer to fill up or for a device timeout. This is not the same sense of "flush" as in Instrument Control Toolbox's "flushoutput" routine, which aborts pending output and removes it from the queue.
The only way to guarantee the buffer is flushed is to use fclose if you open the file with 'W' or 'A'. Using 'a' or 'w' flushes the buffer after each write.
which refers to file I/O. It is not completely clear to me whether this applies to opening serial ports (or emulators thereof) by name, as that falls under the fopen (serial) topic.
One particular place where this sort of flushing is needed is when speaking to a Serial-over-USB device. When data is sent to such a device, it is not transmitted until the buffer is full or a timeout is reached; that timeout is 20 ms (at most 50 cycles/second). USB drivers support a "push" request to send the data immediately, but MATLAB does not offer any way to request it. The Unix way is for flush() or fflush() to take care of details like that, but those routines are not available in MATLAB. POSIX.1 specifies that fseek() requires a write flush, and recommends an fseek() to the same location (0 bytes relative to the current position) in order to synchronize (such as to allow switching between read and write mode). My testing appeared to indicate that MATLAB's fseek() does not provide this behavior.
50 Hz maximum for serial interactions over USB is a substantial bottleneck these days.
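For ordinary file I/O, the behaviour quoted above can at least be exercised through the fopen permission flags; a sketch, with an arbitrary file name:
fid = fopen('log.bin', 'W');       % 'W': write without automatic flushing
fwrite(fid, uint8(1:100));         % data may sit in the output buffer
fclose(fid);                       % the only guaranteed flush in this mode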
0 Comments
Royi Avital
21 Nov 2013
filter
As published on the File Exchange, simple loop unrolling is much faster than the current implementation. The GPU implementation also isn't as good as expected.
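As an illustration of the loop-unrolling idea for the FIR case filter(b,1,x), here is a hedged sketch that loops over the (few) coefficients instead of the (many) samples; the 16-tap moving-average kernel is just an example:
b = ones(1,16)/16;                 % example 16-tap FIR kernel
x = randn(1e6,1);
y = zeros(size(x));
for k = 1:numel(b)                 % loop over taps, not samples
    y(k:end) = y(k:end) + b(k)*x(1:end-k+1);
end
% y matches filter(b,1,x) to within round-off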
randn
I'd be happy to see a faster random number generator.
Oliver Woodford
5 Jun 2015
7 Comments
Walter Roberson
10 Sep 2015
What would be the timing if you replaced the function debug with an array of length (1e6 + 1)? That is, it would be converted into an array reference whose value is thrown away; hypothetically the JIT might do better on that.
Oliver Woodford
18 Sep 2016
1 Comment
Steven Lord
19 Sep 2016
The Release Notes list webread (introduced in R2014b) and jsondecode and jsonencode (both introduced in R2016b). Do they satisfy your needs for reading JSON files?
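For reference, a minimal sketch of the two routes (the file name and URL below are placeholders):
% Decode JSON from a local file (R2016b and later)
cfg = jsondecode(fileread('settings.json'));
% webread (R2014b and later) fetches a URL and decodes JSON responses automatically
data = webread('https://example.com/api/data');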
Chad Greene
18 Sep 2016
The scatter function is incredibly slow for large datasets. I've had to switch to using Aslak Grinsted's fastscatter almost exclusively.
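One common workaround when every point shares one color and size is plain plot with a marker (a sketch; typically much faster than scatter for large data sets):
x = randn(1e6,1);
y = randn(1e6,1);
plot(x, y, '.', 'MarkerSize', 4)   % instead of scatter(x, y, 4, 'filled')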
2 Comments
Hao Zhang
18 Apr 2018
Edited: Stephen23
18 Apr 2018
Matrix indexing needs speeding up.
For a very simple example, if I run this:
clear; clc; close all;
N1 = 100; N2 = 100;
A = ones(N1,N2,'single');
C = ones(N1,N2,'single');
tic;
for i = 1:1e4
    %C(2:end,:) = C(2:end,:) - A(1:end-1,:);
    C = C - A;
end
toc;
I got: Elapsed time is 0.056711 seconds.
Instead, if I run the following:
clear; clc; close all;
N1 = 100; N2 = 100;
A = ones(N1,N2,'single');
C = ones(N1,N2,'single');
tic;
for i = 1:1e4
    C(2:end,:) = C(2:end,:) - A(1:end-1,:);
    %C = C - A;
end
toc;
I got: Elapsed time is 0.316735 seconds.
That is to say, most of the time MATLAB is just doing the matrix indexing. Is there any way to improve this, or to avoid the indexing but get the same result? That could make my code almost 10 times faster!
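In this particular loop A never changes, so one partial improvement (a sketch under that assumption) is to extract the A slice once before the loop, so only C is indexed on every iteration:
% Hoist the constant slice of A out of the loop; only C is indexed inside.
clear; N1 = 100; N2 = 100;
A = ones(N1, N2, 'single');
C = ones(N1, N2, 'single');
Atop = A(1:end-1, :);              % extracted once instead of 1e4 times
tic;
for i = 1:1e4
    C(2:end, :) = C(2:end, :) - Atop;
end
toc;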
11 Comments
James Tursa
19 Apr 2018
Edited: James Tursa
19 Apr 2018
The R2018a mex situation is much worse than I thought ...
Bruno Luong
18 Apr 2018
persistent statement
I am not sure of the reason (internal data hashing?), but it is slow.
0 Comments