Why does how I initialize my large matrices make such a big difference?
Can someone explain the results I'm seeing with the code below? The speed of my code depends significantly on how I initialize some large matrices. I have a pair of large 3D matrices (e.g. 3000 x 3000 x 10) inside of a function that gets called many times. In my actual application it's > 1000 times, but in this toy problem it's only 10 iterations.
Wrapper script:
clc
% Set number of loops to call myfunc.m
N_loop = 10;
% Set dimensions of data
n1 = 3000; n2 = 3000; n3 = 10; % Full
% Fast loop
tic
for ii = 1:N_loop
myfunc_fast(n1,n2,n3);
end
fprintf('Fast version t = %1.6f sec\n',toc)
% Slow loop
tic
for ii = 1:N_loop
myfunc_slow(n1,n2,n3);
end
fprintf('Slow version t = %1.6f sec\n',toc)
So there are two versions of this function, a "fast" version where I'm initializing the B matrix using the zeros(n1,n2,n3) call.
function myfunc_fast(n1,n2,n3)
number_elements = n1*n2*n3; % Number of elements
A = zeros(n1,n2,n3); % Initialize A
% B = A; % THIS SLOWS DOWN THE CODE
B = zeros(n1,n2,n3); % THIS IS OK!
ind = randi([1,number_elements]); % Generate a random index
A(ind) = B(ind) + 1; % Do a simple read/write
end
And a "slow" version where I initialize A, and then set B = A. I figured "hey this should be slightly faster since I'm eliminating a call to the zeros() function", but this ends up being waaaay slower.
function myfunc_slow(n1,n2,n3)
number_elements = n1*n2*n3; % Number of elements
A = zeros(n1,n2,n3); % Initialize A
B = A; % THIS SLOWS DOWN THE CODE
% B = zeros(n1,n2,n3); % THIS IS OK!
ind = randi([1,number_elements]); % Generate a random index
A(ind) = B(ind) + 1; % Do a simple read/write
end
The output is:
Fast version t = 0.001108 sec
Slow version t = 2.867316 sec
I'm guessing what's happening is that when I set B = A, internally MATLAB is "smart enough" to not actually create a new variable and just share the memory space, but then when I modify A later inside myfunc_slow.m, it has to go back and allocate separate memory for the data that was shared between A and B, which ends up taking longer.
Can anyone explain what's going on here and offer any best practices to pass along?
I'm using R2022b on a Windows laptop.
Thanks!
Answers (1)
Yash
March 17, 2024
Hi Michael,
You have correctly identified that when you assign an array to a second variable, MATLAB does not allocate new memory right away. Instead, both variables refer to the same block of data. However, as soon as you modify any element through either "A" or "B", MATLAB allocates new memory, copies the whole array into it, and then modifies that copy. This technique is known as "Copy-On-Write". You can read more about copying arrays and its memory footprint here: https://www.mathworks.com/help/matlab/matlab_prog/memory-allocation.html
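A minimal sketch of that behaviour, written as a script. The dimensions are an illustrative assumption (smaller than in the question, about 160 MB per array) and the variable names are made up for the example. I would expect the assignment and the second write to be very fast and the first write to be the slow step, but the exact timings depend on the machine and release.

```matlab
% Copy-on-write timing sketch (illustrative sizes, not from the question)
n = [2000, 2000, 5];          % 2000*2000*5 doubles = 160 MB per array

A = zeros(n);                 % allocate A

t = tic;
B = A;                        % no data copied yet: B shares A's data
t_assign = toc(t);

t = tic;
A(1) = 1;                     % first write to shared data: MATLAB copies the array first
t_first_write = toc(t);

t = tic;
A(2) = 2;                     % A no longer shares data with B, so no further copy is expected
t_second_write = toc(t);

fprintf('B = A        : %1.6f sec\n', t_assign);
fprintf('first write  : %1.6f sec\n', t_first_write);
fprintf('second write : %1.6f sec\n', t_second_write);
```

Note that B is unchanged after the writes to A (B(1) is still 0), which is exactly why the copy has to happen.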
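"myfunc_fast" runs faster than "myfunc_slow" because "A" and "B" never share data there, so the write "A(ind) = B(ind) + 1" modifies "A" in place and no "Copy-On-Write" is triggered. Allocating an array of zeros is also very cheap: the timing you measured suggests that the zero-filled memory is not fully touched at allocation time, although that is an inference from your numbers rather than something the documentation linked above spells out. In "myfunc_slow", the same write forces MATLAB to copy the entire array on every call, and that copy is where the time goes.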
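To put a rough number on the cost of that copy, here is a back-of-the-envelope calculation using the dimensions and the timing from the question. The 8 bytes per element assumes the default double class returned by zeros, and the variable names are only for illustration.

```matlab
% Rough size of the data copied by the slow version
n1 = 3000; n2 = 3000; n3 = 10;      % dimensions from the question
N_loop = 10;                        % number of calls in the toy problem
bytes_per_double = 8;               % assumption: default class 'double'

bytes_per_array = n1*n2*n3*bytes_per_double;        % 720,000,000 bytes (720 MB)
total_copied_GB = N_loop*bytes_per_array/1e9;       % 7.2 GB over 10 calls

measured_slow_time = 2.867316;                      % seconds, from the question
implied_rate = total_copied_GB/measured_slow_time;  % roughly 2.5 GB/s

fprintf('Per-array size : %d bytes\n', bytes_per_array);
fprintf('Total copied   : %.1f GB\n', total_copied_GB);
fprintf('Implied rate   : %.2f GB/s\n', implied_rate);
```

An implied rate of a few GB/s is consistent with one full-array copy per call, so at more than 1000 calls the copies alone would move over 720 GB.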
Refer here for more info and best practices on performance and memory: https://www.mathworks.com/help/matlab/performance-and-memory.html