trainNetwork error unable to read file

Fred on 4 Apr 2020
Commented: Daniel Csata on 29 Oct 2022
Hi all,
I am learning to train a convolutional network for image classification in the cloud. As a first step, I am following the MathWorks example "Train Network in the Cloud Using Automatic Parallel Support".
I have started my cluster successfully and uploaded the CIFAR-10 image library to my Amazon S3 bucket.
I then successfully create the datastore using:
imdsTrain = imageDatastore('s3://mybucket/cifar10/train', ...
'IncludeSubfolders',true, ...
'LabelSource','foldernames');
My problem comes at the training level, where I use:
options = trainingOptions('sgdm', ...
'ExecutionEnvironment','parallel', ... % Turn on automatic parallel support.
'InitialLearnRate',initialLearnRate, ... % Set the initial learning rate.
'MiniBatchSize',miniBatchSize, ... % Set the MiniBatchSize.
'Verbose',true, ... % Print training progress to the command window.
'Plots','training-progress', ... % Turn on the training progress plot.
'L2Regularization',1e-10, ...
'MaxEpochs',50, ...
'Shuffle','every-epoch', ...
'ValidationData',imdsTest, ...
'ValidationFrequency',floor(numel(imdsTrain.Files)/miniBatchSize), ...
'LearnRateSchedule','piecewise', ...
'LearnRateDropFactor',0.1, ...
'LearnRateDropPeriod',45);
net = trainNetwork(augmentedImdsTrain,layers,options);
Training starts and the display shows the message "initializing input data normalization".
However, it stops shortly afterwards with the error message:
Error in test_parallel_cloud (line 77)
net = trainNetwork(augmentedImdsTrain,layers,options);
Caused by:
Error using nnet.internal.cnn.DistributedDispatcher/computeInParallel (line 193)
Error detected on worker 1.
Error using matlab.io.datastore.ImageDatastore/read (line 77)
Unable to read file: 's3://mybucket/cifar10/train/deer/image35398.png'.
Error using matlab.io.datastore/DsFileReader (line 113)
Could not find file : s3://mybucket/cifar10/train/deer/image35398.png
Every time I rerun the code, it seems to stop on a different image that it cannot read. However, the image is always in the bucket and does not seem to be corrupt when I check it using imshow.
Can you see where the problem is?
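One way to narrow this down, as a minimal sketch rather than a confirmed fix: read every file through the datastore itself and record the ones that fail, instead of spot-checking single images with imshow. The temporary folder and the files ok.png and bad.png below are made-up stand-ins so the snippet runs anywhere; with the setup above, imds would be imdsTrain.

```matlab
% Stand-in data (hypothetical): one valid PNG and one deliberately broken file.
folder = tempname;
mkdir(folder);
imwrite(zeros(4, 4, 3, 'uint8'), fullfile(folder, 'ok.png'));
fid = fopen(fullfile(folder, 'bad.png'), 'w');
fwrite(fid, 'not an image');
fclose(fid);

% With the datastore from the question this would simply be: imds = imdsTrain;
imds = imageDatastore(folder);

% Try to read each file once and collect the paths that raise an error.
unreadable = {};
for k = 1:numel(imds.Files)
    try
        readimage(imds, k);
    catch err
        unreadable{end+1, 1} = imds.Files{k}; %#ok<AGROW>
        fprintf('%s -> %s\n', imds.Files{k}, err.message);
    end
end
fprintf('%d of %d files could not be read.\n', numel(unreadable), numel(imds.Files));
```

This loop runs on the client only. Since the error is reported on worker 1, a closer test would be to run the same reads on the cluster pool; whether that reproduces the failure is an assumption, not something verified in this thread.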
7 Comments
Fouzia Adjailia on 1 May 2020
Hello,
I'm having a similar problem and would highly appreciate it if you could help me.
I created an image datastore with a customized read function, @formoccupancygrid. When I run my code in parallel, I get this error:
Error using classifyData (line 33)
Error detected on worker 1.
Caused by:
Error using matlab.io.datastore.ImageDatastore/readall (line 42)
Error using ReadFcn @UNKNOWN Function for file
D:\--*******************************
Undefined function handle.
I solved this problem using parfevalOnAll, which executes the function on all the workers. After that I got another error, which states that the files don't exist. I added the files to the attached files and to the additional paths in the Cluster Profile Manager, but with no luck.
Looking forward to your reply.
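The "@UNKNOWN Function" and "Undefined function handle" messages suggest that the workers cannot find the function behind the ReadFcn handle. Below is a small sketch of how that can be checked in any MATLAB session. It assumes the function lives in a file named after the handle (formoccupancygrid.m), which the post does not state.

```matlab
% True when the named function behind a handle resolves to a file on the path
% of the MATLAB session that evaluates this check.
isOnPath = @(fh) exist(func2str(fh), 'file') == 2;

knownOk  = isOnPath(@imread);             % imread ships with MATLAB
customOk = isOnPath(@formoccupancygrid);  % true only if formoccupancygrid.m is on the path

fprintf('imread resolvable: %d\n', knownOk);
fprintf('formoccupancygrid resolvable: %d\n', customOk);
```

On a pool, the same check could be run on every worker with parfevalOnAll(@exist, 1, 'formoccupancygrid', 'file') followed by fetchOutputs, and addAttachedFiles(gcp, {'formoccupancygrid.m'}) is one way to send a code file to the workers. As far as I understand, attached files are copied to a worker-side folder, so absolute paths such as D:\... stored in the datastore would still not exist there, which may explain the second error.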
Daniel Csata on 29 Oct 2022
Hi!
I just ran into this exact same problem. Could you please tell me exactly how you solved it with the parpool function? It seems like that didn't work for me, or I did something wrong.
Thank you,
Daniel

Answers (1)

Harsha Priya Daggubati on 7 Apr 2020
1 Comment
Fred on 7 Apr 2020
Hi,
Thanks for the help!
Yes, I carefully followed all the steps mentioned, one by one.
The only deviation is that I had to set the number of workers to 1 instead of 8. That is because AWS limits the number of vCPUs I can use, and the instance I am using (p2.xlarge) has only one GPU.
The problem occurs when running the trainNetwork call from the "Train Network in the Cloud Using Automatic Parallel Support" page.
Fred