Why are the results of forward and predict very different in deep learning?

58 views (last 30 days)
When I use a "dlnetwork" deep neural network model to make predictions, the results of the two functions are very different. The predict function freezes the batchNormalizationLayer and dropout layers, while forward does not freeze those parameters; forward is the forward-pass function used during the training phase.
From the two pictures above, the first 10 outputs differ by orders of magnitude. Where does the problem come from?
All my data is here.

Accepted Answer

Daniel Vieira
Daniel Vieira on 5 Aug 2021
Edited: Daniel Vieira on 5 Aug 2021
I ran into this exact problem, and I think I found a solution; I'll confirm it once my model finishes training...
As others said before, the problem occurs because batchNorms behave differently in forward() and predict(). But there is still a problem here: if you trained your model (forward), it should have converged to a solution that also works well in inference (predict), but it doesn't. Something is wrong in the training too.
What is wrong is that batchNorms don't update their parameters the same way other layers do through the update functions (adamupdate/rmspropupdate/sgdmupdate). They update through the State property of the dlnetwork object. Consider the code:
[gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,Ylabel);
[dlnet,otherOutputs]=rmspropupdate(dlnet,gradients,otherInputs);
function [gradients,loss] = modelGradients(dlnet,dlX,Ylabel)
Y=forward(dlnet,dlX);
loss=myLoss(Y,Ylabel);
gradients=dlgradient(loss,dlnet.Learnables);
end
The code above is wrong if you have batchNorms: it won't update them. The batchNorms are updated through the State property returned by forward and assigned back to dlnet:
[gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX,Ylabel);
dlnet.State=state; % THIS!!!
[dlnet,otherOutputs]=rmspropupdate(dlnet,gradients,otherInputs);
function [gradients,state,loss] = modelGradients(dlnet,dlX,Ylabel)
[Y,state]=forward(dlnet,dlX); % THIS!!!
loss=myLoss(Y,Ylabel);
gradients=dlgradient(loss,dlnet.Learnables);
end
Now that dlnet has a State property updated at every forward() call, the batchNorms are updated and your model should converge to a solution that works for predict().
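Putting the pieces together, here is a minimal end-to-end sketch of such a training loop. This is an illustration only: the toy network, random data, iteration count, and the mse loss are placeholder assumptions standing in for the real model and myLoss above.

```matlab
% Sketch: custom training loop that commits batchNorm state on every iteration.
layers = [
    featureInputLayer(10)
    fullyConnectedLayer(16)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(1)];
dlnet = dlnetwork(layers);

dlX = dlarray(randn(10, 32, 'single'), 'CB');   % placeholder mini-batch
Ylabel = dlarray(randn(1, 32, 'single'), 'CB'); % placeholder targets
avgSqG = [];                                    % rmspropupdate accumulator

for iteration = 1:100
    [gradients, state, loss] = dlfeval(@modelGradients, dlnet, dlX, Ylabel);
    dlnet.State = state;   % commit the batchNorm running statistics
    [dlnet, avgSqG] = rmspropupdate(dlnet, gradients, avgSqG);
end

function [gradients, state, loss] = modelGradients(dlnet, dlX, Ylabel)
    [Y, state] = forward(dlnet, dlX);
    loss = mse(Y, Ylabel);                      % placeholder loss
    gradients = dlgradient(loss, dlnet.Learnables);
end
```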
I would also like to call MathWorks' attention to the fact that this detail appears in the documentation in only ONE example, of GAN networks (despite the omnipresence of batchNorm layers in deep learning models), and is never mentioned explicitly.
8 comments
Filippo Vascellari
Filippo Vascellari on 28 Oct 2022
In my training I use batches of 64 images for the contrastive loss and batches of 32 for the triplet loss, which fill the entire RAM and GPU, so I can't use larger batch sizes. The images of the dataset are taken from CasiaWebface and resized to 224x224.
I understand that the noise in the curve (the fluctuations) is due to both the margin and the triplets chosen in each batch; if the triplets are hard, the loss will increase during the iteration.
In the end I discovered that the flat behaviour of the triplet loss with my resnet18 pretrained on the dataset is due to the fact that the feature extraction of the pretrained resnet18 is already "good": in testing it produces a small number of false positives and false negatives even without training on the triplet loss. The expected curve behaviour appears only if I use a resnet18 that was not pretrained on my dataset, i.e. the one trained on ImageNet.
I will try L2 regularization to reduce the noise in the triplet loss, because I already tested the contrastive loss and obtained good results.
As for the batchNorm State update, I need it because in my tests I train different networks: cross-entropy, triplet, and contrastive. The last two come in two versions: one using only the triplet/contrastive loss, and another combining a classification loss with the triplet/contrastive loss. For that second version the network must be updated entirely, including the batchNorm layers, which is why I need the State update. Consider, for example, that the contrastive loss needs the loss itself plus the sum of the classification losses of the pair images (anchor-positive or anchor-negative): in this case, with which state should I update the batchNorm layers, the anchor's or the positive/negative's?


More Answers (3)

vaibhav mishra
vaibhav mishra on 30 Jun 2020
Hi there,
In my opinion you are using batchNorm in training but not in testing, so you cannot expect the same results from both. You need to use batchNorm in testing as well, with the same parameters as in training.
1 comment
cui
cui on 7 Jul 2020
Edited: cui on 7 Jul 2020
Thank you for your reply! But doesn't the predict method of dlnetwork freeze the batchNorm mean and variance during model inference?
1. If predict freezes the BN statistics, why is the second output argument (state) returned by predict empty?
2. In testing, if I want to use the batchNorm statistics from the training phase, how should the inference code be modified?
Sincerely hoping for your reply, thank you!
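One way to see what predict actually uses (a sketch; dlnet and dlX stand for your own trained network and input): the running mean and variance live in dlnet.State, and predict normalizes with those frozen values, which is also why it has no updated state to return.

```matlab
% Sketch: the statistics predict() freezes are visible in dlnet.State.
disp(dlnet.State)                  % table: Layer | Parameter | Value
                                   % (TrainedMean / TrainedVariance per batchNorm)
[~, state] = forward(dlnet, dlX);  % forward computes batch stats and returns
dlnet.State = state;               % updated running stats; assigning them here
Y = predict(dlnet, dlX);           % makes predict use the trained statistics
```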



cui
cui on 12 Jul 2020
I wrote an analysis blog on this issue; see the attached link. The question that still bothers me is how batchnorm() behaves in forward and predict.
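The difference can be illustrated with plain arrays (a sketch: epsilon = 1e-5 is an assumption, and the learnable scale/offset are omitted). forward normalizes with the current batch's statistics, while predict normalizes with the stored running statistics:

```matlab
x = randn(1, 1000);
muB = mean(x);  varB = var(x, 1);         % batch statistics  -> forward()
muR = 0;        varR = 1;                 % running statistics -> predict()
yTrain = (x - muB) ./ sqrt(varB + 1e-5);  % training-mode normalization
yInfer = (x - muR) ./ sqrt(varR + 1e-5);  % inference-mode normalization
% If muR/varR were never updated from their defaults, yTrain and yInfer
% can differ by orders of magnitude, as in the question above.
```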

Luc VIGNAUD
Luc VIGNAUD on 29 Jun 2021
Thank you for raising this question. I observed this issue while playing with GANs, and the difference indeed comes from the batchNorm. I ended up using instanceNorm instead, but the question remains and should be answered by the MATLAB team...
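The instanceNorm swap can be sketched as follows (an illustration only: the toy architecture is an assumption, not the poster's GAN, and instanceNormalizationLayer requires R2021a or later). Instance normalization normalizes each observation independently, so there are no batch-dependent running statistics to diverge between forward and predict:

```matlab
% Sketch: replacing batchNormalizationLayer with instanceNormalizationLayer.
layers = [
    imageInputLayer([64 64 3], 'Normalization', 'none')
    convolution2dLayer(3, 16, 'Padding', 'same')
    instanceNormalizationLayer    % per-observation stats: no State to sync
    reluLayer
    convolution2dLayer(3, 3, 'Padding', 'same')];
dlnet = dlnetwork(layers);
```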

Release

R2020a
