Dealing with Large training datasets saved in a number of .mat files

Ruohao Zhang on 10 Aug 2020
Commented: Ruohao Zhang on 21 Aug 2020
Hello all,
I have run into a problem where I need to train an LSTM signal classifier with a huge amount of data.
Each 1D signal is around 100k samples long, and every 48 signals are saved in one .mat file. There are around 2000 .mat files in total.
The labels are similarly saved in corresponding .mat files in a different folder.
I would like to know if there is a way to train the network without loading the whole dataset into memory. (With 64 GB of RAM I can only load ~1300 files at once.)
Your help will be very much appreciated.

Accepted Answer

Divya Gaddipati on 13 Aug 2020
You can use fileDatastore for this purpose.
trainData = fileDatastore('/path/to/data', 'ReadFcn', @load, 'FileExtensions', '.mat');
For the 'ReadFcn' argument you can either use "load" or your own custom function that defines how to load the data.
You can also refer to this link for more information on training an LSTM while loading data using fileDatastore.
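As a minimal sketch of how reading through fileDatastore avoids holding everything in memory: the example below writes two tiny stand-in files and then reads them one at a time. The variable name "signals", the file names, and the temporary folder are assumptions made for illustration and are not taken from the question.

```matlab
% Sketch: stream .mat files one at a time instead of loading them all.
% Assumption (hypothetical): each file stores its signals in a cell
% array variable named "signals".

% Create two small stand-in files so the example runs on its own.
dataDir = tempname;
mkdir(dataDir);
for k = 1:2
    signals = {randn(1, 100), randn(1, 100)};   % tiny placeholder signals
    save(fullfile(dataDir, sprintf('batch%02d.mat', k)), 'signals');
end

% Creating the datastore only records the file names; no data is loaded yet.
trainData = fileDatastore(dataDir, 'ReadFcn', @load, 'FileExtensions', '.mat');

% Each call to read loads exactly one file. With @load as the read
% function, the result is a struct whose fields are the stored variables.
numFilesRead = 0;
while hasdata(trainData)
    s = read(trainData);
    numFilesRead = numFilesRead + 1;
    fprintf('File %d holds %d signals\n', numFilesRead, numel(s.signals));
end
```

Because load returns a struct here, a custom read function is usually needed to hand back the data in the shape the training function expects.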
  1 Comment
Ruohao Zhang on 18 Aug 2020
Thank you for this answer. What I ended up doing was splitting all my existing files into smaller files, each containing only one matrix, and creating a datastore from those. The problem I encountered is that loading behaves differently when it goes through fileDatastore: when I load a .mat file at the command line I get a matrix, but when the file is loaded through fileDatastore and combine, it comes back as a 1x1 cell. So I had to write my own load function to make sure the data comes out in the correct format.
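A custom read function of the kind described here might look like the sketch below. It is not the poster's actual function. The variable names "x" and "y", the file names, and the two temporary folders are hypothetical stand-ins for one-signal-per-file data and its labels.

```matlab
% Sketch of a custom read function for fileDatastore.
% Assumptions (hypothetical): each signal file holds one matrix saved as
% "x", and the label file with the same name holds a categorical "y".

sigDir = tempname; mkdir(sigDir);
lblDir = tempname; mkdir(lblDir);
x = randn(1, 100);                       % placeholder for one signal
y = categorical({'classA'});             % placeholder label
save(fullfile(sigDir, 'signal001.mat'), 'x');
save(fullfile(lblDir, 'signal001.mat'), 'y');

% load returns a struct, so extract the named field and return it directly.
readVar = @(filename, name) getfield(load(filename, name), name);

sigDs = fileDatastore(sigDir, 'ReadFcn', @(f) readVar(f, 'x'), ...
    'FileExtensions', '.mat');
lblDs = fileDatastore(lblDir, 'ReadFcn', @(f) readVar(f, 'y'), ...
    'FileExtensions', '.mat');

% Pair each signal with its label. Reading the combined datastore
% returns a 1-by-2 cell array: {signal, label}.
cds = combine(sigDs, lblDs);
pair = read(cds);
sig = pair{1};                           % the numeric matrix itself
lbl = pair{2};                           % the categorical label
```

The pairing relies on both folders listing their files in the same order, which is why the sketch uses identical file names in each folder.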


More Answers (1)

Frantz Bouchereau on 20 Aug 2020
Edited: Frantz Bouchereau on 20 Aug 2020
You can use two signalDatastores: one to read your signal files and another to read your labels. You can then combine them using combine(), split the combined datastore into training and test sets using subset(), and then feed the resulting datastores into the training function of the LSTM network.
With signalDatastore you do not need to write a load function. You specify the variable names you want read from the .mat file, and those are returned at every read.
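A rough sketch of this workflow follows. It needs Signal Processing Toolbox, R2020a or later. The variable names "x" and "y", the file names, and the 3/1 split are assumptions for illustration. I am not certain which releases support subset on a combined datastore, so this sketch applies subset to each signalDatastore first and combines afterwards, which should give the same pairs.

```matlab
% Sketch only: requires Signal Processing Toolbox, R2020a or later.
% Assumptions (hypothetical): signals are saved as variable "x" and
% labels as variable "y", in two folders with matching file names.

sigDir = tempname; mkdir(sigDir);
lblDir = tempname; mkdir(lblDir);
for k = 1:4
    x = randn(1, 100);                                    % placeholder signal
    y = categorical(mod(k, 2), [0 1], {'classA', 'classB'});
    name = sprintf('rec%02d.mat', k);
    save(fullfile(sigDir, name), 'x');
    save(fullfile(lblDir, name), 'y');
end

% No read function needed: name the variable to pull from each file.
sigDs = signalDatastore(sigDir, 'SignalVariableNames', 'x');
lblDs = signalDatastore(lblDir, 'SignalVariableNames', 'y');

% Split by file index, using the same indices for signals and labels
% so that the pairs stay aligned.
trainIdx = 1:3;
testIdx  = 4;
trainDs = combine(subset(sigDs, trainIdx), subset(lblDs, trainIdx));
testDs  = combine(subset(sigDs, testIdx),  subset(lblDs, testIdx));

% Each read returns one {signal, label} pair.
pair = read(trainDs);
```

The combined training datastore is what would then be passed to the training function in place of in-memory arrays.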
  1 Comment
Ruohao Zhang on 21 Aug 2020
Thank you for your reply. Indeed, signalDatastore could be really helpful, but sadly it was only introduced in R2020a and I am still using R2019b. This could be good motivation to update my software now :)
