How to speed up loading of .mat files

Faisal Ahmed on 14 Mar 2017
Commented: David on 16 Aug 2023
I have around 200K .mat files which I need to analyze. It will take me a lot of time if I load each file to access a particular field of interest. I'll highly appreciate your good advice.

Answers (3)

Jan on 14 Mar 2017
Store the files on an SSD.
  7 Comments
Steven Lord on 15 Sep 2017
You may be able to use the matfile function to access your data on disk.
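A minimal sketch of that suggestion, assuming the files were saved with -v7.3 (as far as I know, matfile only reads part of a variable efficiently for that format). The demo file and the variable names var1 and var2 are made up for illustration:

```matlab
% Build a small demo file (hypothetical names) so the sketch can be run as is.
demoFile = [tempname '.mat'];
var1 = rand(1000, 10);
var2 = (1:5)';
save(demoFile, 'var1', 'var2', '-v7.3');

% matfile returns an object tied to the file; no variable data is read yet.
m = matfile(demoFile);

% Only var2 is read from disk here; var1 is left untouched.
x = m.var2;

% Partial reads are possible too: rows 1 to 3 of the first column of var1.
part = m.var1(1:3, 1);

delete(demoFile);
```

For many files, the same two lines (matfile, then indexing the one field of interest) would go inside the loop over file names.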
Jan on 15 Sep 2017
@Andre: What exactly does "Saving the file is just as bad" mean? Storing a single array might be more efficient with fwrite in a binary format.
I thought of publishing an alternative save command which uses 7zip (optimized for output size and reading time, but slow to write) or minilzo (fast, but without powerful compression) for the compression. Unfortunately the details are critical: nested struct arrays containing function handles and user-defined objects, brrr. I cannot decide whether I should implement a feature for extracting parts of the file (some variables or slices of large arrays). It is easy to optimize such a tool for a specific purpose, but then it can never compete with the established, flexible and massively tested MAT format. Therefore I still use binary files without compression.
x = rand(1, 2e8); % 1.6GB data
tic;
f = fopen('test.dat', 'w');
fwrite(f, x, 'double');
fclose(f);
toc
Elapsed time is 7.466022 seconds.
About 210 MB/s with an old hard disk.
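For completeness, a small sketch of reading such a raw file back with fread. A raw binary file stores neither the size nor the class of the array, so the reader has to know both; the file name and the reduced array size here are only for illustration:

```matlab
% Write a (smaller) array the same way as above, into a temporary file.
x = rand(1, 1e6);
fname = [tempname '.dat'];
f = fopen(fname, 'w');
fwrite(f, x, 'double');
fclose(f);

% Read it back. The shape [1, Inf] and the class 'double' must be known
% in advance, because the file contains only the raw bytes.
f = fopen(fname, 'r');
y = fread(f, [1, Inf], 'double');
fclose(f);

delete(fname);
```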


Jan on 15 Sep 2017
See: http://www.mathworks.com/matlabcentral/fileexchange/47698-savezip . This saves an array into a ZIP or GZIP file.

David on 16 Aug 2023
Edited: David on 16 Aug 2023
If the mat files are:
  • large
  • have a lot of variables or nested variables or structs, most of which you don't need
  • saved as version 7.3 or later
Then it might be worth bypassing 'load' entirely and taking advantage of the fact that a v7.3 .mat file is just an HDF5 file.
Load the variable you want from the file directly, without bothering with the variables you don't want. It'll load very fast, regardless of file size.
Say you have a .mat file at path/to/my/file.mat with variables 'var1', 'var2', 'var3.a.b.c.d', and you just want var2.
myVarName = 'var2';
myFile = fullfile('path','to','my','file.mat');
function argOut = quickLoad(myFile, myVarName)
% Read one variable from a v7.3 MAT-file through the low-level HDF5 interface.
% Get the location of the variable in the file using HDF5 syntax.
% / by itself is the root of the file; variable names come after it.
% Note: also works very nicely for nested structures, where a.b.c.d.e is
% passed as 'a/b/c/d/e' and becomes /a/b/c/d/e.
h5loc = ['/' myVarName]; % Always /, unlike the Windows/Linux filesep
% Open the file using H5F (read-only by default).
fid = H5F.open(myFile);
% Open the dataset using H5D.
dsetid = H5D.open(fid, h5loc);
% Read the whole dataset
argOut = H5D.read(dsetid); % All done
% Clean up
H5D.close(dsetid);
H5F.close(fid);
end
The function should include some input checking (for example, checking that the variable exists without using 'whos', which is very slow), but that can be another post.
Try it out... it'll make you happy.
myData = quickLoad(myFile, myVarName)
  2 Comments
David on 16 Aug 2023
Check whether the file is actually v7.3 (that is, HDF5) using:
tf = H5F.is_hdf5(myFile);
Checking for the existence of the variable or dataset can be covered in another thread.
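One possible way to do that existence check without 'whos', sketched with H5L.exists. This is my own assumption about how it could be done, not something from the posts above. The path is tested one level at a time, since the intermediate groups have to exist before a deeper link can be looked up. The demo file and the name var3/a/b are hypothetical:

```matlab
% Demo file with a nested scalar struct (hypothetical names).
demoFile = [tempname '.mat'];
var3.a.b = magic(3);
save(demoFile, 'var3', '-v7.3');

wanted = 'var3/a/b';                    % variable path in HDF5 style
found = H5F.is_hdf5(demoFile) > 0;      % false for non-v7.3 MAT-files
if found
    fid = H5F.open(demoFile);
    parts = strsplit(wanted, '/');
    h5loc = '';
    for k = 1:numel(parts)
        % Extend the path by one level and test that link.
        h5loc = [h5loc '/' parts{k}]; %#ok<AGROW>
        if ~H5L.exists(fid, h5loc, 'H5P_DEFAULT')
            found = false;
            break;
        end
    end
    H5F.close(fid);
end

delete(demoFile);
```

After this runs, found says whether it is safe to open the dataset at that path.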
David on 16 Aug 2023
This will also work for nested structures, which can be handy:
myVarName = 'var3/a/b/c/d';
d = quickLoad(myFile, myVarName);
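As an aside, the high-level h5read function takes the same kind of HDF5 path and might be a shorter alternative to the H5F/H5D calls. This is a different function from the one used in the answer above, shown here only as a sketch on a made-up demo file; I have not compared its speed against the low-level version:

```matlab
% Demo file (hypothetical names) with a plain variable and a nested struct.
demoFile = [tempname '.mat'];
var2 = (1:5)';
var3.a.b = magic(3);
save(demoFile, 'var2', 'var3', '-v7.3');

% Read single datasets by their HDF5 path, nothing else is loaded.
v2 = h5read(demoFile, '/var2');
b  = h5read(demoFile, '/var3/a/b');

delete(demoFile);
```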
