Early Stopping for Deep Networks

19 views (last 30 days)
Roberto on 15 Jan 2019
Edited: Greg Heath on 19 Jan 2019
Hi everyone,
just a quick question.
How can I stop the training of a deep network (an LSTM, for instance) so that the weights and biases are the ones corresponding to the minimum of the validation loss?
In other words, what is the point of having a validation set if the final network is NOT the one that minimizes the validation loss, because it is overtrained in any case?
The ValidationPatience parameter does not help here: it stops the training when it is already too late, and setting it too small risks getting stuck in a local minimum.
The only way I have found is to repeat the training with MaxEpochs set to the epoch where the validation loss reached its minimum in the first run, but that is a crazy solution...
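For example, something along these lines (just a rough sketch with placeholder settings, assuming the usual trainingOptions / trainNetwork workflow and that layers, XTrain/YTrain and XVal/YVal are already defined):

rng(0);                                    % fix the seed so both passes start from the same initial weights
miniBatchSize = 32;                        % placeholder settings throughout

opts1 = trainingOptions('adam', ...
    'MaxEpochs',           60, ...
    'MiniBatchSize',       miniBatchSize, ...
    'Shuffle',             'every-epoch', ...
    'ValidationData',      {XVal, YVal}, ...
    'ValidationFrequency', 30, ...
    'Verbose',             false);

% First pass: train long enough for the validation loss to bottom out.
[~, info1] = trainNetwork(XTrain, YTrain, layers, opts1);

% info1.ValidationLoss is recorded per iteration (NaN except at validation
% points); min ignores the NaNs.
[~, bestIter] = min(info1.ValidationLoss);
itersPerEpoch = floor(numel(YTrain) / miniBatchSize);
bestEpoch     = max(1, ceil(bestIter / itersPerEpoch));

% Second pass: retrain from scratch and stop at the epoch where the
% validation loss was lowest in the first pass.
rng(0);
opts2 = trainingOptions('adam', ...
    'MaxEpochs',     bestEpoch, ...
    'MiniBatchSize', miniBatchSize, ...
    'Shuffle',       'every-epoch', ...
    'Verbose',       false);
netBest = trainNetwork(XTrain, YTrain, layers, opts2);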
Any ideas?
Thanks

1 Answer

Greg Heath on 16 Jan 2019
Edited: Greg Heath on 16 Jan 2019
It is not clear to me that stopping at the minimum of the loss on a random 15% validation subset is the better choice. It would be interesting to make a formal comparison based on multiple runs with different random number seeds, using multiple data sets.
I believe that a more important point is to try to minimize the number of hidden nodes subject to an upper bound on the training set error rate.
Hope this helps
Thank you for formally accepting my answer
Greg
  2 Comments
Roberto on 17 Jan 2019
I'm not sure I understand. Do you mean that L2 regularization could outperform early stopping?
In my opinion the outcome depends too much on the data set to make a formal comparison, but in order to compare the two methods we would still need a way to stop training early in Deep Learning Toolbox...
Greg Heath on 19 Jan 2019
Edited: Greg Heath on 19 Jan 2019
No. That is not what I meant.
HOWEVER
Any decent method will outperform others depending on the data set.
My shallow-net double-loop procedure (MANY examples in NEWSGROUP and ANSWERS) has been successful for decades:
  1. Single hidden layer
  2. Outer loop over number of hidden nodes H = 0:dH:Hmax
  3. Inner loop over random initial weights
I have not tried it on deep nets but am interested if anyone else has.
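Roughly, for inputs x (I-by-N) and targets t (O-by-N), the loop looks like this (an untested sketch: Hmax, dH, Ntrials and the NMSE goal are just placeholders, and the H = 0 linear-model case is omitted):

MSE00    = mean(var(t', 1));              % reference MSE of the naive constant-output model
Hmax     = 10;  dH = 1;  Ntrials = 10;    % placeholder search ranges
NMSEgoal = 0.005;                         % placeholder upper bound on the normalized error
found    = false;

for H = dH:dH:Hmax                        % outer loop over number of hidden nodes
    for trial = 1:Ntrials                 % inner loop over random initial weights
        rng(trial);                       % reproducible random initialization
        net = fitnet(H);                  % single hidden layer with H nodes
        net.trainParam.showWindow = false;
        net  = train(net, x, t);
        y    = net(x);
        NMSE = mean((t(:) - y(:)).^2) / MSE00;   % normalized MSE (over all of x, for brevity)
        if NMSE <= NMSEgoal               % smallest H with a trial that meets the goal
            bestNet = net;  bestH = H;  bestNMSE = NMSE;
            found = true;
            break
        end
    end
    if found
        break
    end
end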
Greg.
