NaN problem (validation loss and mini-batch loss) in transfer learning with SSD ResNet50
I am trying to use SSD ResNet50 for transfer learning on an image dataset with a resolution of 640x360 and a single output class. I followed the MATLAB example for vehicle detection.
I set the network input size to [300 300] and kept the same training options.
However, when training starts, both the mini-batch loss and the validation loss become NaN at the first iteration.
Following suggestions and answers on this forum, I started by lowering the learning rate and tested several values (1e-1, 1e-3, 1e-5, 1e-15). I also changed VerboseFrequency to 50, 10, and 1, but I get the same result: the mini-batch loss and the validation loss both become NaN.
I also tried initializing the weights and bias of the first convolution layer with smaller values, but I get the same error.
conv01 = convolution2dLayer([7,7],64,'Stride',2,'Padding',[3,3,3,3],'BiasLearnRateFactor',1,'name','conv1');
conv01.Weights = gpuArray(single(randn([7 7 3 64])*1e-15));
conv01.Bias = gpuArray(single(randn([1 1 64])*0.00001+1));
I ran the vehicle detection example and it runs perfectly, so I double-checked my data: the images in my dataset are 8-bit JPG files, as in the vehicle dataset.
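Since the loss at iteration 1 is computed from the forward pass before any weight update, a NaN that appears there for every learning rate tried may point at the inputs rather than the optimizer. Beyond the image format, one thing that might be worth checking is the bounding boxes themselves. The following is a minimal sketch on toy data, assuming boxes are stored as M-by-4 `[x y width height]` rows in 1-based pixel coordinates; the variable names and the exact bounds rule are illustrative assumptions, not the toolbox's own validation.

```matlab
% Toy stand-in for one image's labels; replace with real boxes.
imageSize = [360 640];           % [rows cols] for a 640x360 image
boxes = [ 10  20  50  80;        % inside the image
         600 300  60  80;        % runs past the right and bottom edges
          15  15   0  40;        % zero width
         NaN  10  20  20];       % non-finite entry

x = boxes(:,1); y = boxes(:,2);
w = boxes(:,3); h = boxes(:,4);

% Flag a row if any entry is NaN/Inf, the size is not positive,
% or the box extends outside the image (assumed rule: x + w - 1 <= cols).
bad = any(~isfinite(boxes), 2) ...
    | w <= 0 | h <= 0 ...
    | x < 1 | y < 1 ...
    | (x + w - 1) > imageSize(2) ...
    | (y + h - 1) > imageSize(1);

disp(find(bad)');                % row indices of suspicious boxes
```

The same mask could be computed for each image's boxes in the label table, and again on the output of the augmentation and preprocessing steps, since resizing to 300x300 also rescales the boxes.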
I think I am missing something here. I have attached the script, plus a screenshot of the output showing the NaN values, below.
Any help is much appreciated.
addpath('C:\dataset');
%%
%Load the pedestrian ground truth data.
data = load('labelling640360.mat');
gTruth = data.gTruth;
pedestriandataset=[gTruth.DataSource.Source data.gTruth.LabelData];
pedestriandataset.Properties.VariableNames([1])={'imageFilename'};
pedestriandataset(1:4,:)
summary(pedestriandataset)
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Split the data
%Use 60% of the data for training set and the rest for the test set.
rng(0);
shuffledIndices = randperm(height(pedestriandataset));
idx = floor(0.6 * length(shuffledIndices));
trainingDataTbl = pedestriandataset(shuffledIndices(1:idx), :);
testDataTbl = pedestriandataset(shuffledIndices(idx+1:end), :);
%Create an image datastore for loading the images.
imdsTrain = imageDatastore(trainingDataTbl.imageFilename);
imdsTest = imageDatastore(testDataTbl.imageFilename);
% Create a datastore for the ground truth bounding boxes.
bldsTrain = boxLabelDatastore(trainingDataTbl(:, 2:end));
bldsTest = boxLabelDatastore(testDataTbl(:, 2:end));
% Combine the image and box label datastores.
trainingData = combine(imdsTrain, bldsTrain);
testData = combine(imdsTest, bldsTest);
%%
%%%%%%%%%%%%%%%%%%%%%% SSD %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
inputSize = [300 300 3];
%Define number of object classes to detect.
numClasses = width(pedestriandataset)-1;
%Create the SSD object detection network.
lgraph = ssdLayers(inputSize, numClasses, 'resnet50'); %'vgg16'
analyzeNetwork(lgraph);
% plot(lgraph)
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
augmentedTrainingData = transform(trainingData,@augmentData);
augmentedData = cell(4,1);
for k = 1:4
    data = read(augmentedTrainingData);
    augmentedData{k} = insertShape(data{1},'Rectangle',data{2});
    reset(augmentedTrainingData);
end
figure
montage(augmentedData,'BorderSize',10)
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Preprocess the augmented training data to prepare for training
preprocessedTrainingData = transform(augmentedTrainingData,@(data)preprocessData(data,inputSize));
% Read the preprocessed training data.
data = read(preprocessedTrainingData);
%Display the image and bounding boxes.
I = data{1};
bbox = data{2};
annotatedImage = insertShape(I,'Rectangle',bbox);
annotatedImage = imresize(annotatedImage,2);
figure
imshow(annotatedImage)
%%
%%%%%%%%%%%%%%%%%%%%%% Train SSD Object Detector %%%%%%%%%%%%%%%%%%%
options = trainingOptions('sgdm', 'MiniBatchSize', 16, ...
    'InitialLearnRate', 1e-1, 'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropPeriod', 30, 'LearnRateDropFactor', 0.8, ...
    'MaxEpochs', 300, 'VerboseFrequency', 50, ...
    'CheckpointPath', tempdir, 'Shuffle', 'every-epoch'); % 'ExecutionEnvironment','cpu'
[detector, info] = trainSSDObjectDetector(preprocessedTrainingData,lgraph,options);
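If the loss were diverging after some iterations rather than at the very first one, gradient clipping would be one option to try alongside a lower learning rate. The sketch below assumes Deep Learning Toolbox is installed; the values are illustrative and untuned, and whether `trainSSDObjectDetector` applies these options exactly as `trainNetwork` does is an assumption. Clipping is not expected to help when the loss is already NaN at iteration 1.

```matlab
% Variant of the training options with a smaller learning rate and
% gradient clipping. All numeric values here are illustrative.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 16, ...
    'InitialLearnRate', 1e-3, ...
    'GradientThreshold', 1, ...              % clip when the gradient norm exceeds 1
    'GradientThresholdMethod', 'l2norm', ... % per-parameter L2 norm clipping
    'MaxEpochs', 300, ...
    'VerboseFrequency', 1, ...               % print every iteration while debugging
    'CheckpointPath', tempdir, ...
    'Shuffle', 'every-epoch');
```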
Answers (1)
Benjamin
20 Nov 2024, 7:32
I'm having the same problem. I believe it is a problem with the MATLAB SSD network itself: after looking at the network structure, I found that the SSD network has no batch normalization layers, which can easily cause training to diverge. If you can add batch normalization layers to the network, it may be worth a try. (I haven't been able to do this myself, but I hope someone more experienced can solve the problem.)
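For anyone who wants to try this, a batch normalization layer can be spliced into a layer graph with `addLayers`, `disconnectLayers`, and `connectLayers`. The following is only a sketch, assuming Deep Learning Toolbox: the tiny graph is a stand-in for the output of `ssdLayers`, and `'conv_a'` and `'relu_a'` are hypothetical names, so the real target names would need to be read from `analyzeNetwork`. ResNet-50 itself already contains batch normalization layers, so this would presumably apply only to convolution layers that lack one. Whether it actually cures the NaN loss is untested here.

```matlab
% Stand-in graph; in practice lgraph would come from ssdLayers.
layers = [
    imageInputLayer([32 32 3], 'Name', 'input', 'Normalization', 'none')
    convolution2dLayer(3, 8, 'Padding', 'same', 'Name', 'conv_a')
    reluLayer('Name', 'relu_a')];
lgraph = layerGraph(layers);

% Hypothetical list of convolution layers to follow with batch normalization.
targetNames = {'conv_a'};

for k = 1:numel(targetNames)
    src = targetNames{k};
    bnName = [src '_bn'];

    % Record every destination currently fed by this convolution layer.
    isFromSrc = strcmp(lgraph.Connections.Source, src);
    dests = lgraph.Connections.Destination(isFromSrc);

    % Add the new layer, then reroute conv -> dest into conv -> bn -> dest.
    lgraph = addLayers(lgraph, batchNormalizationLayer('Name', bnName));
    for d = 1:numel(dests)
        lgraph = disconnectLayers(lgraph, src, dests{d});
        lgraph = connectLayers(lgraph, bnName, dests{d});
    end
    lgraph = connectLayers(lgraph, src, bnName);
end
```

The new layers start untrained, so they change the activations that the later layers see; checking the result with `analyzeNetwork(lgraph)` before training would be a sensible first step.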