Why is x(:) so much slower than reshape(x,N,1) with complex arrays?

Question

Matt J on 27 Jul 2021

https://kr.mathworks.com/matlabcentral/answers/887219-why-is-x-so-much-slower-than-reshape-x-n-1-with-complex-arrays

Edited: Matt J on 26 May 2022

The two for loops below differ only in the flattening operation used to obtain A_1D. Why is the run time so much worse with A_3D(:) than with a call to reshape()?

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
A0 = rand(N,1);
tic
for k = 1:20
    B = reshape( A0, [Nz,Ny,Nx] ) ;
    A_3D = fftn(B);
    A_1D = reshape( A_3D, N,1); %<--- Version 1
end
toc
Elapsed time is 3.770859 seconds.
tic
for k = 1:20    
    B = reshape( A0, [Nz,Ny,Nx] ) ;
    A_3D = fftn(B);
    A_1D = A_3D(:); %<--- Version 2
end
toc
Elapsed time is 5.056827 seconds.

Comments: 7 (5 older comments not shown)

Stephen23 on 28 Jul 2021

Edited: Stephen23 on 28 Jul 2021

@Bruno Luong: does RESHAPE also copy the data?

If not, then does this mean that one array in memory can be linked to two or more meta-headers (with different array sizes)?

Bruno Luong on 28 Jul 2021

I must admit that why and when MATLAB makes a data copy has become obscure to me over the past few years. I have not come to a full understanding of how it works.

Answer 1 (accepted answer)

Matt J on 28 Jul 2021

https://kr.mathworks.com/matlabcentral/answers/887219-why-is-x-so-much-slower-than-reshape-x-n-1-with-complex-arrays#answer_755564

The following simple test seems to support @Bruno Luong's conjecture that (:) results in data copying. The data of B1 resulting from reshape() has the same data pointer location as A, but B2 generated with (:) points to different data.

format debug
A=complex(rand(2),rand(2))
A = 
Structure address = 7f3f47f4e0e0
m = 2
n = 2
pr = 7f3fcb0112e0

   0.5114 + 0.6181i   0.5881 + 0.4450i
   0.5713 + 0.9018i   0.3682 + 0.8103i
B1=reshape(A,4,1),
B1 = 
Structure address = 7f3fcf1f4be0
m = 4
n = 1
pr = 7f3fcb0112e0

   0.5114 + 0.6181i
   0.5713 + 0.9018i
   0.5881 + 0.4450i
   0.3682 + 0.8103i
B2=A(:)
B2 = 
Structure address = 7f3f47e45a20
m = 4
n = 1
pr = 7f3faff0b980

   0.5114 + 0.6181i
   0.5713 + 0.9018i
   0.5881 + 0.4450i
   0.3682 + 0.8103i

Comments: 8 (6 older comments not shown)

Matt J on 28 Jul 2021

Edited: Matt J on 28 Jul 2021

MathWorks Tech Support got back to me. As @Bruno Luong predicted, they claim this has been a feature since R2015b. Apparently (paraphrasing), because subsref indexing operations generally result in data copying, it was decided that this would hold for A0(:) as a special case as well. I did not get a clear answer on why this is true only for complex A0 and not for real A0. Their reply:

I understand that you are observing a difference in performance between reshape and the colon operation.

Since MATLAB R2015b, the colon operator, A0(:), is an indexing operation. For the provided code, MATLAB goes through every row and column, which is not computationally fast.

On the other hand, the ‘reshape’ command only changes a property of the created array, which is a rather fast process.

For your interest, I have also timed the code across different releases of MATLAB. The results are documented below:

MATLAB 8.3.0.85671 (R2014a)

Colon operator: 7.1884e-07

Reshape: 1.0690e-06

MATLAB 8.5.0.204617 (R2015a)

Colon operator: 6.4574e-07

Reshape: 1.0706e-06

MATLAB 8.6.0.267246 (R2015b)

Colon operator: 0.0487

Reshape: 5.1078e-07

MATLAB 9.0.0.341360 (R2016a)

Colon operator: 0.0493

Reshape: 5.6105e-07

MATLAB 9.6.0.1072779 (R2019a)

Colon operator: 0.041046

Reshape: 1.0141e-06

MATLAB 9.9.0.1467703 (R2020b)

Colon operator: 0.040691

Reshape: 7.0104e-07

MATLAB 9.10.0.1684407 (R2021a) Update 3

Colon operator: 0.040806

Reshape: 5.7803e-07

You can see the changes happened since MATLAB R2015b. If you would like further details on what has been altered under the hood, please feel free to reach out. Otherwise, I will close the case for now. Please do not hesitate to let me know if you have further questions on the matter.
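The support message does not include the code behind those numbers. As a rough sketch of one way to probe the claim yourself: if (:) copies the data of a complex array while reshape only changes the array header, the colon timing should grow with the number of elements and the reshape timing should stay roughly flat. The array sizes below are assumptions picked for illustration, not values from the thread.

```matlab
% Sketch: compare how flattening time scales with array size.
% The sizes are illustrative assumptions, not taken from the support case.
sizes    = [64 128 256];
tColon   = zeros(size(sizes));
tReshape = zeros(size(sizes));
for k = 1:numel(sizes)
    n = sizes(k);
    A = complex(rand(n, n, 16), rand(n, n, 16));   % complex test array
    tColon(k)   = timeit(@() A(:));                % flatten by indexing with (:)
    tReshape(k) = timeit(@() reshape(A, [], 1));   % flatten with reshape
    fprintf('numel = %8d   colon: %.3g s   reshape: %.3g s\n', ...
            numel(A), tColon(k), tReshape(k));
end
```

timeit may warn that the reshape call is too fast to measure accurately, so treat the reshape figures as order-of-magnitude only.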

G A on 14 Aug 2021

Walter, I am discussing complex-valued arrays. It can be

max(A,[],'all')

but anyway, for a complex array max(A) picks the element of largest magnitude, so abs(max(A)) equals max(abs(A)).
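For complex input, max compares magnitudes and returns the complex element itself, so the two expressions agree once abs is applied to the first. A tiny sketch with made-up values (the matrix below is hypothetical, chosen so the magnitudes are 5, sqrt(2), 6 and 2):

```matlab
% Sketch with illustrative values: max on complex data selects by magnitude.
A  = [3+4i, 1-1i; -6i, 2];
m1 = max(A, [], 'all');        % the complex element with the largest magnitude
m2 = max(abs(A), [], 'all');   % the largest magnitude itself
disp(m1)
disp(m2)
```

The 'all' option needs R2018b or later; on older releases max(A(:)) is the equivalent call.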

Walter Roberson on 14 Aug 2021

The (:) options are the slowest. reshape(abs(A),N,1) might possibly be the fastest -- there is notable variation in different runs.

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
A0 = complex(randn(Nx, Ny, Nz), randn(Nx, Ny, Nz));
t(1) = timeit(@() use_abs_all(A0, N), 0)
t = 0.0937
t(2) = timeit(@() use_abs_colon(A0, N), 0)
t = 1×2
    0.0937    0.1727
t(3) = timeit(@() use_abs_reshape_null(A0, N), 0)
t = 1×3
    0.0937    0.1727    0.0994
t(4) = timeit(@() use_abs_reshape_N(A0, N), 0)
t = 1×4
    0.0937    0.1727    0.0994    0.0935
t(5) = timeit(@() use_all(A0, N), 0)
t = 1×5
    0.0937    0.1727    0.0994    0.0935    0.1012
t(6) = timeit(@() use_colon(A0, N), 0)
t = 1×6
    0.0937    0.1727    0.0994    0.0935    0.1012    0.1802
t(7) = timeit(@() use_reshape_null(A0, N), 0)
t = 1×7
    0.0937    0.1727    0.0994    0.0935    0.1012    0.1802    0.1013
t(8) = timeit(@() use_reshape_N(A0, N), 0)
t = 1×8
    0.0937    0.1727    0.0994    0.0935    0.1012    0.1802    0.1013    0.1018
cats = categorical({'abs(all)', 'abs(:)', 'reshape(abs,[])','reshape(abs,N)', 'all', '(:)', 'reshape([])', 'reshape(N)'});
bar(cats, t)
function B = use_abs_all(A, N)
    B = max(abs(A), [], 'all');
end
function B = use_abs_colon(A, N)
    B = max(abs(A(:)));
end
function B = use_abs_reshape_null(A, N)
    B = max(reshape(abs(A), [], 1));
end
function B = use_abs_reshape_N(A, N)
    B = max(reshape(abs(A), N, 1));
end
function B = use_all(A, N)
    B = max(A, [], 'all');
end
function B = use_colon(A, N)
    B = max(A(:));
end
function B = use_reshape_null(A, N)
    B = max(reshape(A, [], 1));
end
function B = use_reshape_N(A, N)
    B = max(reshape(A, N, 1));
end

Answer 2

Walter Roberson on 28 Jul 2021

https://kr.mathworks.com/matlabcentral/answers/887219-why-is-x-so-much-slower-than-reshape-x-n-1-with-complex-arrays#answer_755289

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
A0 = rand(Nx, Ny, Nz);
timeit(@() use_colon(A0, N), 0)
ans = 8.3490e-06
timeit(@() use_reshape_null(A0, N), 0)
ans = 6.5490e-06
timeit(@() use_reshape_N(A0, N), 0)
ans = 6.0925e-06
function use_colon(A, N)
   B = A(:);
end
function use_reshape_null(A, N)
    B = reshape(A, [], 1);
end
function use_reshape_N(A, N)
   B = reshape(A, N, 1);
end

In this particular test, the timing is close enough that we can speculate some reasons:

Using an explicit size to reshape to is faster than reshape([]) because reshape([]) has to spend time calculating the size based upon dividing numel() by the size of the known parameters.

Using (:) versus reshape() is not immediately as clear. The model for (:) is that it invokes subsref() with struct('type', '()', 'subs', {{':'}}) and then subsref() has to invoke reshape(). I point out "model" because potentially the Execution Engine could optimize all of this, and one would tend to think that optimization of (:) should be especially good.
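To make that model concrete, here is a minimal sketch showing that A(:), an explicit subsref() call, and reshape() all return the same column vector. It says nothing about what the Execution Engine does internally; it only shows that the results match. The test matrix is an arbitrary choice, and substruct() is used to build the indexing structure.

```matlab
% Sketch: A(:) written out as an explicit subsref() call.
A  = complex(magic(3), ones(3));   % arbitrary small complex matrix
S  = substruct('()', {':'});       % same as struct('type','()','subs',{{':'}})
v1 = A(:);                         % colon indexing
v2 = subsref(A, S);                % the subsref form described above
v3 = reshape(A, [], 1);            % reshape, size inferred from numel(A)
disp(isequal(v1, v2, v3))
```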

Comments: 10 (8 older comments not shown)

Adam Danz on 10 Aug 2021

Edited: Adam Danz on 10 Aug 2021

When I run your example (modified to store and plot values) using the run feature (first plot) and using Matlab online (second plot) I get conflicting results.

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
A0 = rand(Nx, Ny, Nz);
T = nan(1,3);
T(1) = timeit(@() use_colon(A0, N), 0);
T(2) = timeit(@() use_reshape_null(A0, N), 0);
T(3) = timeit(@() use_reshape_N(A0, N), 0);
bar(categorical({'colon','reshapeNull','reshape'}),T)
title('Run feature')
function use_colon(A, N)
    B = A(:);
end
function use_reshape_null(A, N)
    B = reshape(A, [], 1);
end
function use_reshape_N(A, N)
    B = reshape(A, N, 1);
end

Results of the exact same code using Matlab Online (same platform and Matlab release)

When I run it on my local copy of Matlab (same release, Windows 10 Pro), the first time the colon method was slower, but on subsequent runs it was faster than the reshape methods. There were also some warnings that the measured time may be inaccurate due to fast execution. Using the tic/toc method with repeated measures to measure variability, on my system the colon method with real numbers is fastest.

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
A0 = rand(Nx, Ny, Nz);
n = numel(A0); 
nIterations = 500;  % number of iterations to include within the timer
nReps = 100;        % number of times to repeat the process to measure variability 
durations = nan(nReps,3);
for i = 1:nReps
    
    T = tic; 
    for j = 1:nIterations
        y = A0(:); 
    end
    durations(i,1) = toc(T); 
    
    T = tic; 
    for j = 1:nIterations
        y = reshape(A0,[],1); 
    end
    durations(i,2) = toc(T); 
    
    T = tic; 
    for j = 1:nIterations
        y = reshape(A0,n,1); 
    end
    durations(i,3) = toc(T);
end
figure
boxplot(durations, 'labels',{'colon','reshapeNull','reshape'})
grid on
ylabel(sprintf('Duration of %d iterations (sec)',nIterations))
xlabel('Method')
title(sprintf('Summary of tic/toc timing repeated %d times (real numbers).',nReps))
subtitle('Win 10 Pro; R2021A update 4')

Walter Roberson on 10 Aug 2021

Edited: Walter Roberson on 11 Aug 2021

I took your earlier plot version and ran it on my desktop, and on the Run feature here, and in LiveScript on my desktop. I modified it to scale the plot relative to the slowest, to make it easier to compare relative rates. I also modified it to return values from the functions, to avoid the possibility that the Execution Engine might optimize away the work because of the variable not being returned.

On the desktop, for both .m and .mlx, colon was fastest in all tests.

The time requirements did not vary much for the .m version. reshape([]) was typically about 2.5 times slower than colon.

The time requirements for the colon test for the .mlx varied quite a lot, sometimes taking twice as long. The reshape() timings did not vary nearly as much. Because of that, the relative ratios between colon and reshape([]) varied quite a bit, from about 1.5 to 4.

Bringing the code over to the Run feature here, colon was almost always slowest. Furthermore, the minimum timings (for reshape(N)) were pretty much 10 times slower than what I was seeing on my desktop -- where that reshape would take about 6e-7 on desktop, it takes about 6e-6 here in the Run feature.

Walter Roberson on 11 Aug 2021

@Adam Danz, I could use another pair of eyes in looking at this.

I noticed when I was running your code on my desktop, that every time I had a large timing outlier on colon. My tests showed that it was always the very first run. When I poked around, I realized that there had to be some kind of internal optimization going on. To reduce the effects of "premature optimization", I moved the operative code into functions, and I added recreation of A0 for each repetition.

Please run the below code with seperate set true and false, and notice the substantial difference in rates for the runs.

To try to deal with the initial spike in timings for colon, I decided that I would call the work functions once, to "prime the pump". That was not enough, so now I loop calling them several times to warm up the system and get all the Execution Engine optimization of the functions out of the way. But... with seperate = false, I am still seeing the spike on durations(1,1) !!

The only thing I have been able to think of at the moment is that when I prime the pump, I am not saving the output of the calls to a variable, and that might be affecting the timing ??

By the way, have a look at the recorded d2 values -- the timing of the priming cycles. They are notably different than the other timings... and I see unexpected spikes early on, optimized times mixed with unoptimized times.

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
nIterations = 500;  % number of iterations to include within the timer
nReps = 100;        % number of times to repeat the process to measure variability 
durations = nan(nReps,3);
d2 = nan(nReps,3);
seperate = false;
if ~seperate; A0 = rand(Nx, Ny, Nz); end
for i = 1:nReps
    
    if seperate; A0 = rand(Nx, Ny, Nz); end
    tic; for j = 1 : 5; A0_colon(A0,N); end; d2(i,1) = toc; %prime the pump
    T = tic; 
    for j = 1:nIterations
        y = A0_colon(A0,N); 
    end
    durations(i,1) = toc(T); 
    
    if seperate; A0 = rand(Nx, Ny, Nz); end
    tic; for j = 1 : 5; A0_reshape_null(A0,N); end; d2(i,2) = toc; %prime the pump
    T = tic; 
    for j = 1:nIterations
        y = A0_reshape_null(A0,N); 
    end
    durations(i,2) = toc(T); 
    
    if seperate; A0 = rand(Nx, Ny, Nz); end
    tic; for j = 1 : 5; A0_reshape_N(A0, N); end; d2(i,3) = toc;   %prime the pump
    T = tic; 
    for j = 1:nIterations
        y = A0_reshape_N(A0,N); 
    end
    durations(i,3) = toc(T);
end
figure
boxplot(durations, 'labels',{'colon','reshapeNull','reshape'})
grid on
ylabel(sprintf('Duration of %d iterations (sec)',nIterations))
xlabel('Method')
title(sprintf('Summary of tic/toc timing repeated %d times (real numbers).',nReps))
function y = A0_colon(A0,~)
    y = A0(:);
end
function y = A0_reshape_null(A0,~)
    y = reshape(A0, [], 1);
end
function y = A0_reshape_N(A0,N)
    y = reshape(A0, N, 1);
end

Walter Roberson on 11 Aug 2021

I had the hypothesis that the 5 might have to do with my having 4 cores, or might have to do with the number of priming iterations I did, so I tested on my system that has more cores, and I did more priming iterations. The result was the same: durations(1,1) still had the major peak, and durations(5,1) was reliably a secondary peak.

Adam Danz on 12 Aug 2021

I noticed that when I re-run it within a script without clearing variables, the second peak at x=5 vanishes. Still curious but out of ideas.

Answer 3

Matt J on 26 May 2022

https://kr.mathworks.com/matlabcentral/answers/887219-why-is-x-so-much-slower-than-reshape-x-n-1-with-complex-arrays#answer_972250

Edited: Matt J on 26 May 2022

I was just told by Tech Support that the issue was fixed in R2022a, but it doesn't appear that way:

Nx = 256;
Ny = 256;
Nz = 128;
N = Nx*Ny*Nz;
A0 = rand(Nx, Ny, Nz);
A0=complex(A0,A0);
timeit(@() A0(:), 0)
ans = 0.0530
timeit(@() use_reshape_null(A0, N), 0)
ans = 6.5199e-06
timeit(@() use_reshape_N(A0, N), 0)
ans = 6.8033e-06
function use_reshape_null(A, N)
    B = reshape(A, [], 1);
end
function use_reshape_N(A, N)
   B = reshape(A, N, 1);
end
