How to speed up loading of .mat files
I have around 200K .mat files which I need to analyze. It will take me a lot of time if I load each file to access a particular field of interest. I'll highly appreciate your good advice.
Answers (3)
Jan
14 Mar 2017
Store the files on an SSD.
Comments: 7
Jan
15 Sep 2017
@Andre: What exactly does "Saving the file is just as bad" mean? Storing a single array might be more efficient with fwrite in a binary format.
I have thought about publishing an alternative save command that compresses the data with 7zip (optimized for output size and read time, but slow to write) or minilzo (fast, but with weak compression). Unfortunately the details are critical: nested struct arrays containing function handles and user-defined objects, brrr. I cannot decide whether I should implement a feature for extracting parts of the file (some variables or slices of large arrays). It is easy to optimize such a tool for a specific purpose, but then it can never compete with the established, flexible and massively tested MAT format. Therefore I still use binary files without compression.
x = rand(1, 2e8); % 1.6GB data
tic;
f = fopen('test.dat', 'w');
fwrite(f, x, 'double');
fclose(f);
toc
Elapsed time is 7.466022 seconds.
About 210 MB/s with an old hard disk.
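Reading such a raw file back is the mirror image of the snippet above. A minimal sketch, assuming the file holds nothing but doubles and that the reader already knows the array shape, since a raw binary file stores neither size nor class (a smaller array is used here so the example runs quickly):

```matlab
% Write a small test array first so the sketch is self-contained.
x = rand(1, 1e6);
f = fopen('test.dat', 'w');
fwrite(f, x, 'double');
fclose(f);

% Read it back. The shape [1, Inf] is an assumption the reader must supply,
% because the raw file contains no size or type metadata.
tic;
f = fopen('test.dat', 'r');
y = fread(f, [1, Inf], 'double=>double');
fclose(f);
toc

% Compare the round trip value by value.
sameData = isequal(x, y);
```

The trade-off is that all bookkeeping (shape, class, byte order) moves into your own code, which is what the MAT format otherwise handles for you.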
Jan
15 Sep 2017
See http://www.mathworks.com/matlabcentral/fileexchange/47698-savezip, which saves an array into a ZIP or GZIP file.
David
16 Aug 2023
Edited: David
16 Aug 2023
If the .mat files:
- are large,
- contain a lot of variables, nested variables or structs, most of which you don't need, and
- were saved as version 7.3 or later,
then it might be worth bypassing 'load' entirely and taking advantage of the fact that a v7.3 .mat file is just an HDF5 file.
Load the variable you want directly from the file, without bothering with the variables you don't need. It'll load insanely fast, regardless of how big the file is.
Say you have a .mat file at path/to/my/file.mat with variables 'var1', 'var2', 'var3.a.b.c.d', and you just want var2.
myVarName = 'var2';
myFile = fullfile('path','to','my','file.mat');
% Save the function below as quickLoad.m (or put it at the end of a script).
function argOut = quickLoad(myFile, myVarName)
% Get the location of the variable in the file using HDF5 syntax.
% / by itself is the root of the file, then variable names come after.
% Note: also works very nicely for nested structures where a.b.c.d.e would
% have myVarName as a/b/c/d/e.
h5loc = ['/' myVarName]; % Always /, not like Windows/Linux filesep
% Open the file using H5F (read-only by default).
fid = H5F.open(myFile);
% Open the dataset using H5D.
dsetid = H5D.open(fid, h5loc);
% Read the whole dataset
argOut = H5D.read(dsetid); % All done
% Clean up
H5D.close(dsetid);
H5F.close(fid);
end
This should include some input checking (for example, testing that the variable exists without using 'whos', which is very slow), but that can be another post.
Try it out... it'll make you happy.
myData = quickLoad(myFile, myVarName)
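For comparison, MATLAB's high-level h5read function wraps the same open/read/close sequence in a single call. A minimal sketch, assuming a v7.3 file; the file name demo_v73.mat and the variables stored in it are made up for the example. Only plain numeric arrays are covered here, because other classes (char, logical, cell and so on) are encoded differently inside the HDF5 layout and would need extra conversion:

```matlab
% Create a small v7.3 MAT file for the demonstration (hypothetical names).
var1 = rand(3);
var2 = magic(4);
var3.a.b = 1:5;
save('demo_v73.mat', 'var1', 'var2', 'var3', '-v7.3');

% Read only var2; the other variables are never touched.
v2 = h5read('demo_v73.mat', '/var2');

% Fields of a nested scalar struct are addressed with '/' instead of '.'.
b = h5read('demo_v73.mat', '/var3/a/b');
```

The low-level H5F/H5D calls give more control (for example, keeping a file open across several reads), while h5read is shorter when a single dataset is all you need.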
Comments: 2
David
16 Aug 2023
Check whether the file is actually v7.3 (that is, HDF5-based) using:
tf = H5F.is_hdf5(myFilePath);
Checking the existence of the variable or dataset can be covered in another thread.
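One possible way to do that existence check without 'whos' is H5L.exists from the low-level HDF5 interface. A minimal sketch, not a tested utility: the demo file and the variable var3.a.b are hypothetical, and the path is walked one level at a time because H5L.exists only checks the final element of a path and fails if an intermediate link is missing:

```matlab
% Demo file (hypothetical names) so the sketch is self-contained.
demoFile = [tempname '.mat'];
var3.a.b = 1:5;
save(demoFile, 'var3', '-v7.3');

varPath = 'var3/a/b';                  % path to test, '/'-separated
fid = H5F.open(demoFile);              % read-only by default
parts = strsplit(varPath, '/');
current = '';
found = true;
for k = 1:numel(parts)
    % Test each level in turn: '/var3', then '/var3/a', then '/var3/a/b'.
    current = [current '/' parts{k}]; %#ok<AGROW>
    if ~H5L.exists(fid, current, 'H5P_DEFAULT')
        found = false;
        break
    end
end
H5F.close(fid);
```

After the loop, found tells you whether it is safe to open the dataset at that path.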
David
16 Aug 2023
This will also work for nested structures, which can be handy:
myVarName = 'var3/a/b/c/d';
d = quickLoad(myFile, myVarName);
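Applied to a large batch of files, as in the original question, the pieces above could be combined roughly as follows. This is a sketch under assumptions: the folder and the variable name var2 are placeholders, the variable is taken to be a plain numeric array, and files that are not HDF5-based fall back to load with an explicit variable name:

```matlab
% Demo setup: two small files in different MAT formats (hypothetical data).
dataDir = tempname;
mkdir(dataDir);
var2 = magic(3);
save(fullfile(dataDir, 'a.mat'), 'var2', '-v7.3');
save(fullfile(dataDir, 'b.mat'), 'var2', '-v7');

varName = 'var2';                      % placeholder variable of interest
files   = dir(fullfile(dataDir, '*.mat'));
result  = cell(numel(files), 1);

for k = 1:numel(files)
    fname = fullfile(dataDir, files(k).name);
    if H5F.is_hdf5(fname) > 0
        % v7.3 file: read just the requested dataset.
        result{k} = h5read(fname, ['/' varName]);
    else
        % Older MAT format: load only the requested variable.
        s = load(fname, varName);
        result{k} = s.(varName);
    end
end
```

Since each iteration is independent, the loop is also a natural candidate for parfor if the Parallel Computing Toolbox is available, although disk throughput may remain the limiting factor.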