Which MATLAB operations/functions need speeding up?
26 views (last 30 days)
When using MATLAB, I sometimes come across a performance bottleneck caused by functionality that I believe could be much faster. It is worth bringing these observations to the attention of TMW.
Do you know of any such functionality?
Upvote the suggestions of others you'd like to see TMW tackle.
Answers (23)
Matt J
21 Nov 2013
Edited: Matt J
21 Nov 2013
SUB2IND and IND2SUB are very frequently used functions. Yet they are still implemented only in M-code, rather than as compiled built-ins.
Even better would be to allow numeric arrays to accept lists of subscripts, e.g.,
A({I,J})
could be equivalent to A(sub2ind(size(A),I,J)) instead of having to convert manually to linear indices.
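In the meantime, a small wrapper function (the name subref is made up here, a sketch only) at least hides the manual conversion:
% subref.m -- hypothetical helper that mimics the proposed A({I,J}) syntax
% by doing the sub2ind conversion internally.
function out = subref(A, varargin)
    out = A(sub2ind(size(A), varargin{:}));   % varargin holds I, J, ...
end
% Usage: vals = subref(A, I, J);  % same result as A(sub2ind(size(A),I,J))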
2 Comments
John D'Errico
1 Jun 2015
I'd very strongly argue that this is how indexing SHOULD be done for scattered indices.
Cedric Wannaz
20 Nov 2013
Edited: Cedric Wannaz
21 Nov 2013
EDITs
- 11/20, 5:00PM EST - Added comment about OOP.
- 11/20, 5:10PM EST - Added comment about TRY statements.
- 11/21, 8:50AM EST - Added comment about documentation.
Over the years I have found a series of non-built-in functions with bottlenecks, some of which were or will be converted to built-ins. Looking at the source code, the reason was most often the presence of "slow" tests like ISMATRIX. I usually rewrote these functions specifically for my applications, without the tests that were unnecessary in my context. Even for built-ins, though, I would appreciate being able to set JIT directives at the beginning of the code, allowing e.g. these tests to be disabled (or even heavier ones to be enabled during the development phase). It could also be interesting to set directives on a per-function-call basis, e.g.
outVars = someBuiltin[JIT directives](inVars) ;
[a,b,c] = foo[argchk=0](x,y) ; % Or CSL of params/values,
% or more elaborate.
Along with this could come an "advanced" version of the documentation that shows pseudo-code for each function, with the directive names that enable/disable each block. An indication of the method/algorithm used would be a great improvement as well.
I propose putting the directives before the parentheses, so there is room for later implementing indexing of function outputs, e.g.
lastCol = foo[argchk=0](A,b)(:,end) ;
or even accessing methods/properties of object outputs (here, the directives are param/value pairs):
surfaceArea = grid.getCell[argchk,0](28).getSurfaceArea() ;
Finally, it's a different type of optimization, but TRY statements could take arguments that set a maximum memory usage or a maximum execution time:
try('maxExecTime', 1e4, 'maxMemoryUsage', 5e9)
...
catch ME
...
end
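Pending such syntax, the maxExecTime part can be roughly approximated today with an explicit timer check inside the guarded code; a sketch, with an arbitrary 10-second budget:
maxExecTime = 10;            % seconds; arbitrary budget for this sketch
t0 = tic;
try
    for k = 1:1e6
        % ... expensive work for iteration k ...
        if toc(t0) > maxExecTime
            error('myCode:timeout', 'Exceeded the %g s budget.', maxExecTime);
        end
    end
catch ME
    disp(ME.message)         % handle the overrun like any other error
end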
2 Comments
Walter Roberson
21 Nov 2013
OOP used to be very slow; I hear that it is faster now, but it needs to become competitive in speed, at most a small percentage slower. Otherwise, people are not going to bother learning it and using it.
0 Comments
Royi Avital
26 Aug 2014
The function `im2col`.
See here:
Those two functions are crucial for "patch"-based and locally adaptive image kernels, and they take too long. I think they should be rewritten using MKL / IPP and be properly multithreaded.
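For comparison, here is a hedged pure-indexing sketch of what im2col(img,[p q],'sliding') computes (the image and patch size are made up); this kind of vectorized gather is sometimes a usable stop-gap for small patches:
img = magic(8);                        % small example image
p = 3; q = 3;                          % sliding patch size
[m, n] = size(img);
% Linear index of the top-left pixel of every sliding patch (column-major)
[r0, c0] = ndgrid(1:m-p+1, 1:n-q+1);
start = r0(:) + (c0(:) - 1)*m;
% Linear offsets of the p*q pixels inside one patch
[dr, dc] = ndgrid(0:p-1, 0:q-1);
offset = dr(:) + dc(:)*m;
% One column per patch, matching im2col(img, [p q], 'sliding')
cols = img(bsxfun(@plus, offset, start.'));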
0 Comments
Matt J
20 Nov 2013
Edited: Matt J
20 Nov 2013
ACCUMARRAY
I still have my doubts about whether accumarray is maximally optimized, based on this example. Even if it is close to optimal on the CPU, there is so much more I could do with it if it had GPU support.
Finally, the 4-argument syntax
accumarray(subs,val,[],fun);
can definitely be better optimized for certain choices of fun. Below are some examples of how I work around the slow behavior of fun=@mean and fun=@std:
n = 1e7; % size of daily data
idx = randi(65160,n,1); % 65160 is for 181*360 latitude and longitude
data = randn(n,1);
%%%%%%%%%%%MEAN
tic
accummean = accumarray(idx,data,[],@mean);
toc
%Elapsed time is 2.649816 seconds.
tic
N = accumarray(idx,1);                 % per-bin counts
accummean = accumarray(idx,data)./N;   % per-bin sum divided by count
toc
%Elapsed time is 0.248310 seconds.
%%%%%%%%STD
tic
accumstd = accumarray(idx,data,[],@(x)std(x,1));
toc
%Elapsed time is 4.706466 seconds.
tic;
N = accumarray(idx,1,[]);               % per-bin counts
idxn = N~=0;
EX = accumarray(idx,data,[]);           % per-bin sum of x
EX2 = accumarray(idx,data.^2,[]);       % per-bin sum of x.^2
% population std: sqrt(N.*EX2 - EX.^2)./N, i.e. sqrt(E[x^2] - E[x]^2)
accumstd = sqrt(N.*EX2-EX.^2);
accumstd(idxn) = accumstd(idxn)./N(idxn);
toc
%Elapsed time is 0.406712 seconds.
2 Comments
Matt J
4 Dec 2013
Edited: Matt J
4 Dec 2013
Operations that modify object properties
obj.prop(i)=1
always create temporary deep copies of the entire property, even if prop is a huge array and i is a scalar index. This, of course, slows things down.
I believe this might be done to accommodate the flexibility of property set and get methods (see, for example, this thread), but there should be a property attribute that tells MATLAB that prop can have normal in-place access behavior like regular MATLAB arrays.
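A common workaround while no such attribute exists is to pull the property into a local variable, do all the element-wise writes there, and assign it back once. A sketch with a made-up class:
% BigHolder.m -- hypothetical minimal value class, for illustration only
classdef BigHolder
    properties
        prop = zeros(1, 1e6);      % large numeric property
    end
end
% --- script: element-wise writes on the property vs. on a local copy ---
obj = BigHolder;
idx = randi(1e6, 1, 1e4);
tic
for k = 1:numel(idx)
    obj.prop(idx(k)) = 1;          % each write may copy the whole property
end
toc
obj = BigHolder;
tic
tmp = obj.prop;                    % read the property once
for k = 1:numel(idx)
    tmp(idx(k)) = 1;               % cheap writes on the local array
end
obj.prop = tmp;                    % write back once
toc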
5 Comments
Cedric Wannaz
10 Dec 2013
Edited: Cedric Wannaz
10 Dec 2013
Hi Matt,
I agree with your claim, but 100 is not that many either. My point is that one shouldn't think "unless I perform millions of subsref/subsasgn operations, there is no need for a local copy", because the loss of efficiency can become significant as soon as more than a few accesses to properties are performed.
I have this issue with classes that are meant to become new types; the end user might not be able to "vectorize" his/her code and might have to write heavy loops. When I don't use local copies of properties that are accessed more than a few times, I can easily observe a factor of two in execution time.
Matt J
20 Nov 2013
Edited: Matt J
20 Nov 2013
griddedInterpolant
seems under-optimized to me. I often need to do very large interpolation operations on 3D volumes, similar to the following. Customized professional software for my applications always seems to manage these kinds of operations much faster than the 5.2 seconds shown.
N=400;
A=rand(N,N,N);
[x,y,z]=ndgrid(1:N);
f=@(u) u(:)+rand(size(u(:)));
x=f(x); y=f(y); z=f(z);
G=griddedInterpolant(A);
tic;
w=G(x,y,z);
toc;
%Elapsed time is 5.203962 seconds.
Furthermore, when I run the above, the Task Manager shows that my CPU is greatly under-used (only 21% at peak and only 3 of the 12 cores are significantly active).

3 Comments
Kelly Kearney
21 Nov 2013
No, I'm referring to data that definitely lies on a grid, but not necessarily a perfectly regular one. In my case, this is usually geographic data on a small scale (on the order of 1000km), defined on a regular latitude x longitude x depth grid. At this scale, any translation of the lat/lon coordinates (such as to meters, to match the typical depth unit) will result in a little bit of distortion from a perfectly orthogonal grid.
It seems to me that griddedInterpolant would be the better choice here, with neighbors and connectivity being determined by the organization of the input data. The triangulation built by scatteredInterpolant is not well-suited to datasets like this that are nearly-gridded... it can often return very weird values (for example, values well out of the input data range when using a nearest-neighbor interpolant, as the above example does).
Really, the best solution would be to stick with the lat/lon/depth grid but write a variation on griddedInterpolant that uses geographic distance rather than Euclidean. But I haven't gotten around to writing such a function yet...
Walter Roberson
21 Nov 2013
This is a bit indirect, but:
There needs to be a way to flush an output buffer, including for serial ports and instrument control. Here, I am using "flush" in the sense of sending the data now instead of waiting for a buffer to fill up or for a device timeout. This is not the same sense of "flush" as in Instrument Control Toolbox's "flushoutput" routine, which aborts pending output and removes it from the queue.
The only way to guarantee the buffer is flushed is to use fclose if you open the file with 'W' or 'A'. Using 'a' or 'w' flushes the buffer after each write.
which refers to file I/O. It is not completely clear to me whether this applies to opening serial ports (or emulators thereof) by name, as that falls under the fopen (serial) topic.
One particular place where this sort of flushing is needed is when speaking to a Serial-over-USB device. When data is sent to such a device, it is not transmitted until the buffer is full or a timeout is reached; that timeout is 20 ms (at most 50 cycles/second). USB drivers support a "push" request to send the data immediately, but MATLAB does not offer any way to request it. The Unix way is for flush() or fflush() to take care of details like that, but those routines are not available in MATLAB. POSIX.1 specifies that fseek() requires a write flush, and recommends an fseek() to the same location (0 bytes relative to the current position) in order to synchronize (such as to allow switching between read and write mode). My testing appeared to indicate that MATLAB's fseek() does not provide this behavior.
50 Hz maximum for serial interactions over USB is a substantial bottleneck these days.
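For ordinary file I/O, the behaviour quoted above can at least be exercised through the fopen permission flags; a sketch, with an arbitrary file name:
fid = fopen('log.bin', 'W');       % 'W': write without automatic flushing
fwrite(fid, uint8(1:100));         % data may sit in the output buffer
fclose(fid);                       % the only guaranteed flush in this mode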
0 Comments
Royi Avital
21 Nov 2013
filter
As published on the File Exchange, simple loop unrolling is much faster than the current implementation. The GPU implementation also isn't as good as expected.
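As an illustration of the loop-unrolling idea for the FIR case filter(b,1,x), here is a hedged sketch that loops over the (few) coefficients instead of the (many) samples; the 16-tap moving-average kernel is just an example:
b = ones(1,16)/16;                 % example 16-tap FIR kernel
x = randn(1e6,1);
y = zeros(size(x));
for k = 1:numel(b)                 % loop over taps, not samples
    y(k:end) = y(k:end) + b(k)*x(1:end-k+1);
end
% y matches filter(b,1,x) to within round-off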
randn
I'd be happy to see a faster random number generator.
Oliver Woodford
5 Jun 2015
7 Comments
Walter Roberson
10 Sep 2015
What would be the timing if you replaced the function debug with an array of length (1e6 + 1)? That is, it would be converted into an array reference whose value is thrown away; hypothetically the JIT might do better on that.
Oliver Woodford
18 Sep 2016
1 Comment
Steven Lord
19 Sep 2016
The Release Notes list webread (introduced in R2014b) and jsondecode and jsonencode (both introduced in R2016b). Do they satisfy your needs for reading JSON files?
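For reference, a minimal sketch of the two routes (the file name and URL below are placeholders):
% Decode JSON from a local file (R2016b and later)
cfg = jsondecode(fileread('settings.json'));
% webread (R2014b and later) fetches a URL and decodes JSON responses automatically
data = webread('https://example.com/api/data');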
Chad Greene
18 Sep 2016
The scatter function is incredibly slow for large datasets. I've had to switch to using Aslak Grinsted's fastscatter almost exclusively.
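One common workaround when every point shares one color and size is plain plot with a marker (a sketch; typically much faster than scatter for large data sets):
x = randn(1e6,1);
y = randn(1e6,1);
plot(x, y, '.', 'MarkerSize', 4)   % instead of scatter(x, y, 4, 'filled')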
2 Comments
Hao Zhang
18 Apr 2018
Edited: Stephen23
18 Apr 2018
Matrix indexing needs speeding up.
For a very simple example, if I run this:
clear; clc; close all;
N1 = 100; N2 = 100;
A = ones(N1,N2,'single');
C = ones(N1,N2,'single');
tic;
for i = 1:1e4
    %C(2:end,:) = C(2:end,:) - A(1:end-1,:);
    C = C - A;
end
toc;
I got: Elapsed time is 0.056711 seconds.
Instead, if I run the following:
clear; clc; close all;
N1 = 100; N2 = 100;
A = ones(N1,N2,'single');
C = ones(N1,N2,'single');
tic;
for i = 1:1e4
    C(2:end,:) = C(2:end,:) - A(1:end-1,:);
    %C = C - A;
end
toc;
I got: Elapsed time is 0.316735 seconds.
That is to say, most of the time MATLAB is just doing the matrix indexing. Is there any way to improve this, or to avoid the indexing but get the same result? That could make my code almost 10 times faster!
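In this particular loop A never changes, so one partial improvement (a sketch under that assumption) is to extract the A slice once before the loop, so only C is indexed on every iteration:
% Hoist the constant slice of A out of the loop; only C is indexed inside.
clear; N1 = 100; N2 = 100;
A = ones(N1, N2, 'single');
C = ones(N1, N2, 'single');
Atop = A(1:end-1, :);              % extracted once instead of 1e4 times
tic;
for i = 1:1e4
    C(2:end, :) = C(2:end, :) - Atop;
end
toc;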
11 Comments
James Tursa
19 Apr 2018
Edited: James Tursa
19 Apr 2018
The R2018a mex situation is much worse than I thought ...
Bruno Luong
18 Apr 2018
persistent statement
I am not sure of the reason (internal data hashing?), but it is slow.
0 Comments