Speed up loading struct from file.

Mitchell Tillman, 27 Aug 2021
Commented: Walter Roberson, 28 Aug 2021
Hi,
I am looking for a way to speed up saving & loading ~8GB of data. Currently, it is all contained within one structure. The structure has a format similar to the code below - there is also some metadata at each level of the struct not shown here.
for subNum = 1:10                      % 10 subjects
    for trialNum = 1:50                % 50 trials per subject
        for dataStreamNum = 1:50       % 50 data streams per trial
            dataMatrix = rand(3,3000); % Each data stream is 3x3000
            structName.Subject(subNum).Trial(trialNum).Data(dataStreamNum).Matrix = dataMatrix; % Data in matrix form
        end
    end
end
I looked into matfile to be able to load just part of the structure, but found that matfile doesn't allow for accessing specific fields. This post made me start thinking about splitting up each trial into its own separate .mat file (in this example there would be 500 .mat files, each of which is a smaller struct). So, I have two questions in total:
1. Is there an alternative to matfile that would let me load just one trial at a time from an 8 GB struct, such as:
structName.Subject(4).Trial(15);
2. If there is no such alternative, and I call load() on 500 .mat files one at a time (8 GB of data in total), would that be noticeably slower or faster than calling load() on a single 8 GB .mat file?
Thank you!
Comments: 1
Walter Roberson, 28 Aug 2021
For files over 2 GB, saving as a .mat file requires the -v7.3 flag, which writes the data in HDF5 format. HDF5 is not all that efficient for arrays of struct: it more or less requires each array member to be stored as a sub-dataset, with the struct array itself stored internally as an array of references to those sub-datasets.
Because of this, you might want to experiment to see what you can do with NetCDF 3; version 3.6 and later has large file support. But beware that NetCDF 4 is HDF5 underneath...
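A minimal sketch of what the NetCDF 3 experiment could look like, using MATLAB's built-in nccreate, ncwrite, and ncread. The file name, variable name, and dimension names are hypothetical, and whether this is faster than a -v7.3 MAT-file for the real data would have to be measured.

```matlab
% Sketch: store one 3x3000 data stream in a NetCDF 3 file with 64-bit
% offsets and read back only part of it.
% ncFile, 'S04_T15_D01', 'axis' and 'sample' are hypothetical names.
ncFile = [tempname '.nc'];
stream = rand(3, 3000);

% 'Format','64bit' selects the classic (NetCDF 3) format with
% large-file support, i.e. not the HDF5-based NetCDF 4 format.
nccreate(ncFile, 'S04_T15_D01', ...
    'Dimensions', {'axis', 3, 'sample', 3000}, ...
    'Datatype', 'double', ...
    'Format', '64bit');
ncwrite(ncFile, 'S04_T15_D01', stream);

% Partial read: all 3 rows, samples 101 through 200 (start, count).
part = ncread(ncFile, 'S04_T15_D01', [1 101], [3 100]);
```

NetCDF variables are plain arrays, so metadata and nesting would need to be encoded separately, for example in variable names or attributes.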


Answers (1)

Chunru, 28 Aug 2021
It seems that your data is very regular. Instead of a struct, you can simply use an N-D numeric array, which is faster and more efficient. With matfile, you can then easily read a small portion of the data.
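% Original nested-struct layout, kept for reference:
% for subNum = 1:10                      % 10 subjects
%     for trialNum = 1:50                % 50 trials per subject
%         for dataStreamNum = 1:50       % 50 data streams per trial
%             dataMatrix = rand(3,3000); % Each data stream is 3x3000
%             structName.Subject(subNum).Trial(trialNum).Data(dataStreamNum).Matrix = dataMatrix;
%         end
%     end
% end
% Same data as one 5-D array: 3 x samples x data streams x trials x subjects
% (zeros is assumed here as the preallocation; about 1.8 GB as double)
Data = zeros(3, 3000, 50, 50, 10);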
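A minimal sketch of reading a small portion of such an array with matfile. The array is deliberately much smaller than the one above so it runs quickly, and the file name is hypothetical. Partial loading is only efficient for files saved with -v7.3.

```matlab
% Sketch: partial read of an N-D array through matfile.
matName = [tempname '.mat'];       % hypothetical file name
Data = rand(3, 300, 5, 6, 4);      % 3 x samples x streams x trials x subjects
save(matName, 'Data', '-v7.3');    % -v7.3 is needed for efficient partial reads

m = matfile(matName);              % creates a handle; no data is read yet
block = m.Data(:, :, 2, 3, 4);     % stream 2 of trial 3 of subject 4 only
```

Only the requested 3x300 slice is read from disk; the rest of Data stays in the file.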
Comments: 1
Mitchell Tillman, 28 Aug 2021
Thanks for the suggestion, but I actually do need the structure format. This was just a rough outline of my data format. The lengths of the data streams actually vary significantly, as do the number of data streams per trial and the number of trials per subject. Because of that, and because I have lots of associated metadata too, I need the structure format.
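For reference, the one-file-per-trial idea from the question keeps the struct format and handles varying stream lengths. A minimal sketch follows; the folder, the file-name pattern, and the Meta field are hypothetical, and the sizes are tiny for illustration. Whether 500 small loads beat one 8 GB load would need to be timed on the real data, for example with tic/toc.

```matlab
% Sketch: write each trial to its own .mat file, then load one trial.
outDir = tempname;                 % hypothetical output folder
mkdir(outDir);

nSubjects = 2;                     % small counts, for illustration only
nTrials = 3;
for subNum = 1:nSubjects
    for trialNum = 1:nTrials
        trial = struct();
        trial.Meta.Subject = subNum;          % hypothetical metadata fields
        trial.Meta.Trial = trialNum;
        nStreams = randi(4);                  % stream count varies per trial
        for k = 1:nStreams
            % stream length varies between 100 and 300 samples
            trial.Data(k).Matrix = rand(3, randi([100 300]));
        end
        fname = fullfile(outDir, sprintf('S%02d_T%03d.mat', subNum, trialNum));
        save(fname, 'trial');
    end
end

% Load only subject 2, trial 3, without touching the other files.
s = load(fullfile(outDir, 'S02_T003.mat'), 'trial');
oneTrial = s.trial;
```

Each file here is far below 2 GB, so the default MAT-file format is used and the -v7.3 flag is not needed.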

Release

R2020b
