Find out why mat files differ in size

Jan Kappen on 18 Mar 2024
Answered: Jan Kappen on 25 Mar 2024
I'm developing a rather complex class hierarchy with a few GB of data embedded in its instances, which might get saved to mat files for later analysis.
I refactored a lot to improve the memory and CPU footprint (using dependent properties, customized loadobj and saveobj methods, etc.) and saw that the resulting mat file grew in size (using save() with v7.0 and compression enabled). I screwed something up.
I have some old reference mat files from the former versions that are about 30% smaller. However, if I load them using the current class definitions, the resulting objects in RAM are almost exactly the same size (less than 1% difference), measured with the great getArrayFromByteStream function (see Serializing/deserializing Matlab data - Undocumented Matlab). That means I can't infer from the instantiated objects what grew in size.
Question: How do I find out what really gets saved to the mat file, i.e. which variable/object is much larger compared to the old versions?
I can roll back to my former version via Git, but that does not really help me understand why exactly the mat files got bigger.
Any ideas?
Thanks,
Jan

Accepted Answer

Jan Kappen on 25 Mar 2024
Got it fixed.
I followed an approach similar to the one @Samay Sagar proposed, but ultimately used getArrayFromByteStream (see Serializing/deserializing Matlab data - Undocumented Matlab). I checked out the old version of my library in a second MATLAB session and compared all properties step by step, skipping Dependent properties via reflection.
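The property-by-property comparison described above could be sketched roughly as follows. This is only an illustration, not the code used in the thread: the function name propertySizes is hypothetical, it assumes a scalar object, and it relies on getByteStreamFromArray, the undocumented serializing counterpart of getArrayFromByteStream, whose behavior may change between releases. The byte counts are uncompressed, so they indicate relative size rather than the exact on-disk footprint.

```matlab
function report = propertySizes(obj)
%PROPERTYSIZES Serialized size of each stored, publicly readable property.
%   Hypothetical helper; assumes obj is a scalar object.
    mc = metaclass(obj);
    names = cell(0, 1);
    bytes = zeros(0, 1);
    for k = 1:numel(mc.PropertyList)
        p = mc.PropertyList(k);
        % Dependent, Transient, and Constant properties are not written by save()
        if p.Dependent || p.Transient || p.Constant
            continue
        end
        % Only publicly readable properties can be queried from outside the class
        if ~ischar(p.GetAccess) || ~strcmp(p.GetAccess, 'public')
            continue
        end
        % Undocumented function: serializes the value to a uint8 byte stream
        stream = getByteStreamFromArray(obj.(p.Name));
        names{end+1, 1} = p.Name;          %#ok<AGROW>
        bytes(end+1, 1) = numel(stream);   %#ok<AGROW>
    end
    report = table(names, bytes, 'VariableNames', {'Property', 'Bytes'});
    report = sortrows(report, 'Bytes', 'descend');
end
```

Calling the function on the same data in both the old and the new session and comparing the two tables should point to the property that grew. Private or protected properties are skipped here; inspecting them would require calling similar code from inside the class.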
Root cause: I've split a data table (class table) into two class objects which should've used dependent properties, and an internal table to store the data. Turned out I forgot to make one block of properties transient/dependent to avoid saving them.
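A minimal sketch of the kind of fix described above, assuming a hypothetical wrapper class SignalView around an internal table (the class, property, and variable names are invented for illustration and are not from the original library). Only the Data property is written by save(); the Dependent property is computed on access, and the Transient property exists in memory only.

```matlab
classdef SignalView
    % Hypothetical wrapper around an internal table (illustration only).
    properties (Access = private)
        Data table   % stored: this is what ends up in the MAT-file
    end
    properties (Dependent)
        Time         % computed on access, never saved
    end
    properties (Transient)
        Cache        % kept in memory only, skipped by save()
    end
    methods
        function obj = SignalView(data)
            if nargin > 0
                obj.Data = data;
            end
        end
        function t = get.Time(obj)
            % Forward to the internal table instead of storing a second copy
            if ismember('Time', obj.Data.Properties.VariableNames)
                t = obj.Data.Time;
            else
                t = [];
            end
        end
    end
end
```

If Time were declared as an ordinary property and filled from the table, the same data would be serialized twice, which matches the kind of growth reported in the question.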
Afterwards, the mat file sizes were basically the same. It is quite interesting that it makes no difference whether the table itself is saved or a class wrapping it: both can be compressed very efficiently. Very nice, MathWorks!
PS: I just found out that mat files can also be compared visually: Compare and Merge MAT-Files - MATLAB & Simulink (mathworks.com). The tool can even "look" into objects, though not arbitrarily nested ones, so it could be a good starting point.

More Answers (1)

Samay Sagar on 25 Mar 2024
You can use the "whos" command to inspect the sizes of the variables stored in each MAT-file and see which ones changed between versions.
Here is a sample script to identify changes between two MAT-files:
    % Extract variables of interest without loading the files
    oldVariables = whos('-file', 'old_version.mat');
    newVariables = whos('-file', 'new_version.mat');
    newNames = {newVariables.name};

    % Compare variable sizes
    for i = 1:numel(oldVariables)
        name = oldVariables(i).name;
        oldSize = oldVariables(i).bytes;
        fprintf('%s:\n', name);

        % Find corresponding variable in new version (by name, so that a
        % zero-byte variable is not mistaken for a missing one)
        idx = find(strcmp(newNames, name), 1);
        if isempty(idx)
            fprintf('  Variable not found in new version\n\n');
            continue
        end

        newSize = newVariables(idx).bytes;
        sizeChange = newSize - oldSize;
        percentageChange = (sizeChange / oldSize) * 100;  % Inf or NaN if oldSize is 0
        fprintf('  Old Size: %d bytes\n', oldSize);
        fprintf('  New Size: %d bytes\n', newSize);
        fprintf('  Size Change: %d bytes (%.2f%%)\n\n', sizeChange, percentageChange);
    end
Read more about "whos" in the MATLAB documentation.
1 Comment
Jan Kappen on 25 Mar 2024
Thank you very much for that approach. Unfortunately, it looks like it does not work with handle class objects. Plus, I had just one variable in that mat file: a big class object that encapsulates all the data.

Release

R2023b
