Deep Learning Datastores causing errors with size/length

Matthew Miller, 8 July 2022
Answered: Matthew Miller, 14 July 2022

Please help me use datastores to train a neural network.
I have two 259×1 cell arrays: A holds the sequences and B holds the corresponding sequence responses. They are saved as A.mat and B.mat, respectively. I want to put these arrays into a datastore and use it to train a network. When I pass the cell arrays to trainNetwork directly, without a datastore, training works. A has been padded so that each cell is exactly 10×10000, and each cell of B is 1×10000.
I have tried the following things:
1)
AData = datastore('A.mat','type','file','ReadFcn',@load); %sequences
BData = datastore('B.mat','type','file','ReadFcn',@load); %responses
CData = combine(AData, BData); %combination
... %layers, options, hyperparameters, etc.
[test.net, test.info] = trainNetwork(CData, layers, options); %train network
Error:
Error using trainNetwork (line 184)
Invalid training data. Predictors must be a N-by-1 cell array of sequences, where N is the number of sequences. All sequences must have the same feature dimension and at least one time step.

Error in DL_T3_ds (line 96)
[test.net, test.info] = trainNetwork(CData, layers, options);

>> preview(CData) %for clarity

ans =

1×2 cell array

{1×1 struct} {1×1 struct}
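A likely reason for the struct output in approach 1: when load is called with an output argument, it returns a struct with one field per saved variable, and a file datastore returns whatever its ReadFcn returns for the whole file. A minimal sketch, assuming a hypothetical three-sequence stand-in for A.mat saved as A_demo.mat in tempdir (both the file name and the sizes are made up for illustration):

```matlab
% Hypothetical stand-in for A.mat: 3 sequences of size 10x20
% instead of 259 sequences of size 10x10000.
A = {randn(10,20); randn(10,20); randn(10,20)};
demoFile = fullfile(tempdir,'A_demo.mat');
save(demoFile,'A');

% With an output argument, load returns a struct whose fields are the saved variables.
s = load(demoFile);
disp(class(s))
disp(fieldnames(s))

% A file datastore with ReadFcn = @load returns that same struct for the file,
% so a single read holds every sequence in the file rather than one sequence.
fds = fileDatastore(demoFile,'ReadFcn',@load);
firstRead = read(fds);
disp(numel(firstRead.A))
```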

2) C3_data.mat is a file that contains only A and B arrays.

sequenceData = datastore('C3_data.mat','type','file','ReadFcn',@load); %C3_data is A and B combined into one file
... %layers, options, and hyperparameters
[test.net, test.info] = trainNetwork(sequenceData, layers, options); %train network
Error:
Error using trainNetwork (line 184)
Invalid training data. For a network with 1 inputs and 1 output, the datastore read function must return a cell array with 2 columns, but it returns an cell array with 1 columns.

Error in DL_T3_ds (line 99)
[test.net, test.info] = trainNetwork(sequenceData, layers, options);

3) Using a function to avoid load returning a struct

function varargout = loadStructFromFile(fileName)
varargout = struct2cell(load(fileName));
end

AData = datastore('A.mat','type','file','ReadFcn',@loadStructFromFile);
BData = datastore('B.mat','type','file','ReadFcn',@loadStructFromFile);
CData = combine(AData, BData);

[test.net, test.info] = trainNetwork(CData, layers, options);

Error using trainNetwork (line 184)
Unexpected input size: The input layer expects sequences with the same sequence length and channel dimension 10.

Error in DL_T3_ds (line 99)
[test.net, test.info] = trainNetwork(CData, layers, options);


>> preview(CData) %for clarity

ans =

1×2 cell array

{259×1 cell} {259×1 cell}
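The preview above points to what goes wrong in approach 3: each file datastore reads its entire file as one item, so one read hands trainNetwork a 259×1 cell where it expects a single 10-by-T numeric sequence. A minimal sketch reproducing this with hypothetical three-observation stand-ins (A_demo.mat and B_demo.mat in tempdir are made-up names); getfield stands in for loadStructFromFile so the sketch fits in one script:

```matlab
% Hypothetical stand-ins for A.mat and B.mat: 3 observations instead of 259.
A = {randn(10,20); randn(10,20); randn(10,20)};
B = {randn(1,20); randn(1,20); randn(1,20)};
aFile = fullfile(tempdir,'A_demo.mat');
bFile = fullfile(tempdir,'B_demo.mat');
save(aFile,'A');
save(bFile,'B');

% Like loadStructFromFile: return the saved variable instead of the struct.
adsA = fileDatastore(aFile,'ReadFcn',@(f) getfield(load(f),'A'));
adsB = fileDatastore(bFile,'ReadFcn',@(f) getfield(load(f),'B'));
cds = combine(adsA,adsB);

% One read returns each file's whole cell array as a single item, so the
% predictor column holds a cell of sequences, not one numeric sequence.
sample = read(cds);
disp(size(sample))
disp(class(sample{1}))
```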

I would appreciate any help.

Accepted Answer

Ben, 12 July 2022
I think the issue here is just wrangling the datastores so that they read out a BatchSize×2 cell array, where each cell in the first column contains only the numeric input data and each cell in the second column contains the response data (numeric or categorical).
This gets confusing because the combine method for datastores wraps data in an additional cell.
Here's an example to show how to make this work with dummy data:
% Setup fake data - I'll use a sequence length of 100 rather than 10,000.
x = randn(10,100);
save('x1','x');
y = randn(1,100);
save('y1','y');
x = randn(10,100);
save('x2','x');
y = randn(1,100);
save('y2','y');
% Create datastores to read in data.
% You want to just get the data out of the struct that is loaded. I found it easiest to write a simple function:
getVarFromStruct = @(strct,varName) strct.(varName);
xds = fileDatastore("x*","ReadFcn",@(fname) getVarFromStruct(load(fname),"x"),"FileExtensions",".mat");
yds = fileDatastore("y*","ReadFcn",@(fname) getVarFromStruct(load(fname),"y"),"FileExtensions",".mat");
% Combine
cds = combine(xds,yds);
% Note that cds.read now returns a 1x2 cell array, and each cell contains numeric data (not another cell!).
% Dummy network training
layers = [sequenceInputLayer(10);lstmLayer(1);regressionLayer];
opts = trainingOptions("adam","MaxEpochs",1);
net = trainNetwork(cds,layers,opts);
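Ben's example stores one observation per file, which matches how a file datastore reads: one file per read. A hypothetical sketch for converting in-memory A and B to that layout, assuming a folder named seqdata_demo and file names x001.mat, y001.mat, and so on (none of these names come from the thread); zero-padded names keep the x and y file lists in the same order:

```matlab
% Hypothetical stand-ins for the real A and B (3 observations, 20 time steps).
A = {randn(10,20); randn(10,20); randn(10,20)};
B = {randn(1,20); randn(1,20); randn(1,20)};

% Write each observation to its own file in a dedicated folder.
outDir = fullfile(tempdir,'seqdata_demo');
if ~isfolder(outDir)
    mkdir(outDir);
end
for i = 1:numel(A)
    x = A{i};   % predictor sequence, saved under the variable name x
    y = B{i};   % response sequence, saved under the variable name y
    save(fullfile(outDir,sprintf('x%03d.mat',i)),'x');
    save(fullfile(outDir,sprintf('y%03d.mat',i)),'y');
end

% Same datastore pattern as in the answer, pointed at the new folder.
getVarFromStruct = @(strct,varName) strct.(varName);
xds = fileDatastore(fullfile(outDir,'x*.mat'),'ReadFcn',@(f) getVarFromStruct(load(f),'x'));
yds = fileDatastore(fullfile(outDir,'y*.mat'),'ReadFcn',@(f) getVarFromStruct(load(f),'y'));
cds = combine(xds,yds);

% Each read should pair one predictor sequence with its response.
sample = read(cds);
```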

More Answers (1)

Matthew Miller, 14 July 2022
You are exactly correct.
There was another layer of cells between the data and the datastore, and that was causing the error. The trainNetwork function was reading this second layer and throwing the error. I believe that's what you meant by 'additional cell.'
I saved each array as a single-variable file and used the fileDatastore implementation you demonstrated. It worked perfectly. Thank you very much for your help.
MM
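An alternative not raised in the thread, assuming R2020b or later: when A and B already fit in memory, arrayDatastore can skip the files entirely. With OutputType 'same', each read of a cell array returns a 1×1 cell (one observation), and combine concatenates the two into the 1×2 cell of numeric arrays that trainNetwork expects. A minimal sketch with random stand-in data (100 time steps instead of 10000):

```matlab
% Random stand-ins for the real data: 259 sequences with 10 channels
% and 100 time steps (the real A uses 10000 time steps).
numObs = 259;
A = cell(numObs,1);
B = cell(numObs,1);
for i = 1:numObs
    A{i} = randn(10,100);   % predictors: channels x time steps
    B{i} = randn(1,100);    % responses: 1 x time steps
end

% OutputType 'same' returns A(i), a 1x1 cell, rather than wrapping it again.
adsA = arrayDatastore(A,'OutputType','same');
adsB = arrayDatastore(B,'OutputType','same');
cds = combine(adsA,adsB);

% Each read should pair one predictor sequence with its response.
sample = read(cds);

% Same dummy network as in the accepted answer.
layers = [sequenceInputLayer(10); lstmLayer(1); regressionLayer];
opts = trainingOptions('adam','MaxEpochs',1);
net = trainNetwork(cds,layers,opts);
```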

Category: Image Data Workflows

Release: R2021b
