Using save with -v7.3 takes a long time and the mat file size is enormous

When I save with -v7 the file is 18 MB, but with -v7.3 it is 6 GB!
Adam on 10 Nov 2016 (edited by Adam on 10 Nov 2016)
Yes, but what are you saving in the MAT-file? The size of the MAT-file is directly proportional to the size of the variables you save in it, plus some overhead for the structure. That overhead should not be almost 6 GB, though.
Walter Roberson on 10 Nov 2016
Can you make the 18 megabyte version available through something like Google Drive?

Accepted Answer

George on 10 Nov 2016
I've run into this before too. The matfile documentation page has this note:
"Note: Version 7.3 MAT-files use an HDF5 based format that requires some overhead storage to describe the contents of the file. For cell arrays, structure arrays, or other containers that can store heterogeneous data types, Version 7.3 MAT-files are sometimes larger than Version 7 MAT-files."
Using the -v7 option was my remedy as well.
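The quoted note can be checked directly. This is a minimal sketch, assuming a cell array of 2000 scalar doubles is a fair stand-in for "container" data; the variable c and the temporary file names are illustrative only, and the sizes printed will vary with the MATLAB release. It saves the same variable in both formats and reports the file sizes with dir:

```matlab
% Compare MAT-file sizes for a cell array of many small elements.
c = num2cell(rand(1, 2000));          % 2000 cells, each holding one double
f7  = [tempname '.mat'];              % temporary file for the -v7 copy
f73 = [tempname '.mat'];              % temporary file for the -v7.3 copy
save(f7,  'c', '-v7');
save(f73, 'c', '-v7.3');
d7  = dir(f7);                        % dir reports the on-disk size in .bytes
d73 = dir(f73);
fprintf('-v7  : %d bytes\n', d7.bytes);
fprintf('-v7.3: %d bytes\n', d73.bytes);
```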
Omar Abdelkader on 10 Nov 2016
@George Yes, -v7 works fine, but only with small files. @Adam That's why I'm asking: the difference is far too large, and I don't understand how or why. Also, when I try to run it with 32-bit MATLAB, I get an out-of-memory error.
Mike on 27 Feb 2020
I have a scenario where saving with -v7.3 results in a 750 MB MAT-file, whereas saving with -v7 results in a 3.4 MB MAT-file. The data I was saving was an array of Simulink.SimulationOutput objects returned from a parsim command.

More Answers (1)

Rik van der Weij on 8 Jun 2020 (edited by Walter Roberson on 8 Jun 2020)
I tried the following:
a = ones(15000);                  % 15000x15000 doubles, about 1.8 GB in memory
save('a.mat', 'a');               % about 800 KB file
save('b.mat', 'a', '-v7.3');      % about 11 MB file
I have the same problem with real data. My file gets flagged for the 2 GB limit, even though every file I actually save is much smaller, so I'm forced to save with -v7.3, and then the file becomes really, really large.
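The experiment above can be scaled down so it runs in a few seconds. This sketch assumes ones(2000) (32 MB in memory) instead of ones(15000) and writes to temporary files; because the array is smaller, the ratio printed will not necessarily match the 800 KB vs 11 MB reported above:

```matlab
% Scaled-down version of the ones(15000) experiment.
a = ones(2000);                       % 2000x2000 doubles, 32 MB in memory
f7  = [tempname '.mat'];
f73 = [tempname '.mat'];
save(f7,  'a', '-v7');                % explicit, so saved preferences do not matter
save(f73, 'a', '-v7.3');
s7  = dir(f7);
s73 = dir(f73);
fprintf('in memory: %d bytes\n', numel(a) * 8);
fprintf('-v7      : %d bytes\n', s7.bytes);
fprintf('-v7.3    : %d bytes (%.1fx the -v7 file)\n', s73.bytes, s73.bytes / s7.bytes);
```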
Walter Roberson on 8 Jun 2020
-v7 MAT-files have 32-bit size counters. For any particular variable, the process is to generate the uncompressed serialized form of the variable (which must therefore stay within the limits of the 32-bit counters), then run a compression routine on it and store the compressed version. There is no clever algorithm that packs a variable piecewise into segments that each individually fit into 2 GB or 4 GB when compressed; in -v7 files there is just the raw (uncompressed, not-clever) serialized representation and the zlib-compressed version of that.
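Given that per-variable limit, one option is to check variable sizes before saving and fall back to -v7.3 only when some variable is too large for -v7. This is a sketch under two assumptions: the threshold is taken as 2^31 bytes (2 GB), and the in-memory size reported by whos is used as a proxy for the uncompressed serialized size, which it only approximates. The variables x and y and the temporary file name are placeholders:

```matlab
% Pick -v7 unless some variable is at or above the assumed 2 GB threshold.
x = rand(100);                        % placeholder variables to save
y = {'abc', 1:10};
vars  = {'x', 'y'};
limit = 2^31;                         % assumed -v7 per-variable limit, in bytes
S = whos(vars{:});                    % in-memory size of each listed variable
if any([S.bytes] >= limit)
    fmt = '-v7.3';                    % some variable is too large for -v7
else
    fmt = '-v7';                      % everything fits, keep the compact format
end
fname = [tempname '.mat'];
save(fname, vars{:}, fmt);
fprintf('Saved %s using %s\n', fname, fmt);
```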
Unfortunately, yes: -v7.3 (HDF5-based) files are not nearly as compact as one might hope.
Poking at b.mat with an HDF viewer, I see that it was created with GZIP level 3 compression and a 169.972:1 compression ratio, which is a 99.4% reduction. When I wrote those 1s out in binary with no overhead (just the double-precision numbers), I found that gzip -3 does indeed give a 99.4% reduction (though the result is smaller than the .mat file). Even gzip -9 only reaches 99.8%, leaving a file of over 2.5 megabytes.
Now, if I take that gzip -9 result and pass it through gzip -9 again, I get a very small file, only 8553 bytes. So there is still a lot of redundant information left after the 99.4% or 99.8% compression, but neither gzip -3 nor gzip -9 can find it in a single pass.
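The two-pass observation can be reproduced in memory from MATLAB. This sketch assumes MATLAB's built-in Java interface and uses java.util.zip.Deflater, which implements the same DEFLATE algorithm as gzip but writes a zlib header instead of a gzip header, so byte counts will differ slightly from gzip files. It uses ones(2000) rather than ones(15000) so it runs quickly, and reports the size after one and two passes at levels 3 and 9:

```matlab
% Compress raw doubles with DEFLATE, then compress the result again.
n   = 2000;                                        % scaled down from ones(15000)
raw = typecast(reshape(ones(n), [], 1), 'int8');   % raw bytes of the doubles, no MAT-file overhead
levels = [3 9];                                    % compression levels to compare
pass1  = zeros(size(levels));                      % bytes after one pass
pass2  = zeros(size(levels));                      % bytes after two passes
for k = 1:numel(levels)
    data = raw;
    for p = 1:2
        def  = java.util.zip.Deflater(levels(k));
        buf  = java.io.ByteArrayOutputStream();
        zout = java.util.zip.DeflaterOutputStream(buf, def);
        zout.write(data, 0, numel(data));          % compress the current bytes
        zout.close();                              % flush the remaining compressed output
        def.end();                                 % release the native zlib resources
        data = buf.toByteArray();                  % returned to MATLAB as an int8 column vector
        if p == 1
            pass1(k) = numel(data);
        else
            pass2(k) = numel(data);
        end
    end
    fprintf('level %d: %d raw -> %d (1 pass) -> %d (2 passes) bytes\n', ...
        levels(k), numel(raw), pass1(k), pass2(k));
end
```

The second pass works because the first pass's output for such uniform input is itself highly repetitive, which a single DEFLATE pass cannot exploit.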
It looks to me as if the HDF5 specification permits a couple of compression options that could sometimes be more effective, but what MATLAB invokes is not unreasonable: it isn't MathWorks' fault that zlib's gzip -3, or even gzip -9, does not do nearly as well as one might hope.
