
Criteria for judging overfitting

5 views (last 30 days)
정민 이 on 23 June 2022
Answered: John D'Errico on 30 June 2022
I'm making a model using neural network fitting in MATLAB. I can check the training, validation, and test R values. However, the model I created has a high training R value but low validation and test R values.
Can I conclude that overfitting has occurred? How large does the difference in R values have to be before it counts as overfitting?

Answers (2)

AMIT POTE on 30 June 2022
There is no rule of thumb that a particular difference in R values means overfitting. Typically, if the R value for the training set is noticeably higher than for the validation and test sets, it is likely that your model is overfitting. To confirm this, you can use other metrics, such as the validation loss, to check how your model performs on unseen data.
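As a rough illustration, here is a minimal MATLAB sketch (assuming the Deep Learning Toolbox) of how to compute the R value on each split after training. The bodyfat_dataset example and the network size are just placeholders; substitute your own inputs x and targets t.

[x, t] = bodyfat_dataset;            % small example dataset shipped with MATLAB
net = fitnet(10);                    % feedforward fitting network, 10 hidden neurons
net.divideParam.trainRatio = 0.70;   % 70/15/15 train/validation/test split
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
[net, tr] = train(net, x, t);        % tr records which samples landed in each split
y = net(x);                          % network outputs for all samples
% R (correlation between targets and outputs) on each split
Rtrain = regression(t(tr.trainInd), y(tr.trainInd));
Rval   = regression(t(tr.valInd),   y(tr.valInd));
Rtest  = regression(t(tr.testInd),  y(tr.testInd));
fprintf('R: train %.3f, validation %.3f, test %.3f\n', Rtrain, Rval, Rtest)

A large gap between Rtrain and Rval (or Rtest) is the symptom being discussed here.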

John D'Errico on 30 June 2022
If there were some clear and simple rule, then the code would be written to recognize that, and alert you to the problem. But the real world is never so clear and simple, else we might all be doing something more interesting. (Certainly true for me.)
You should recognize that in virtually any case, a model will have better capability to fit the training data than it will have to predict validation data. Surely you cannot expect it to go the other way? And while it would be nice if the model does exactly as well on the training data as the validation data, life is never perfect. So it is perfectly normal for the model to fit the training data a little better. The question is, how much better? And that really has no exact answer. So what can you do?
Very often all of this indicates your data may be noisier than you think, i.e., a lower signal-to-noise ratio than you expected. And you don't want your model to be chasing noise in the data.
A simple idea is to reduce the complexity of your model, by just a bit. One would expect this to reduce the ability of your model to represent the training data. But if it is chasing noise, then it really costs you nothing. If you do reduce the model complexity, and it has no effective impact on the ability of your model to predict the validation set, then you are going in the right direction. Continue to do so, reducing the complexity of your model, until just before it starts to significantly impact the ability of the model to predict the validation set. Somewhere around that point should be the sweet spot. At this point, you might hope that the model is predicting the training set just a little better than it is predicting the validation set. That will be a good place to live.
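A minimal sketch of that loop, assuming the Deep Learning Toolbox's fitnet (x and t stand in for your own inputs and targets, and the hidden-layer sizes are arbitrary placeholders):

hiddenSizes = [20 15 10 7 5 3 2];    % candidate complexities, largest first
Rval = zeros(size(hiddenSizes));
for k = 1:numel(hiddenSizes)
    rng(0)                               % same random data split each run, for comparability
    net = fitnet(hiddenSizes(k));
    net.trainParam.showWindow = false;   % suppress the training GUI
    [net, tr] = train(net, x, t);
    y = net(x);
    Rval(k) = regression(t(tr.valInd), y(tr.valInd));
end
disp([hiddenSizes; Rval])                % pick the smallest size before validation R drops

Note that resetting the random seed keeps the data division identical across runs; without it, run-to-run variation in the split can mask the trend you are looking for.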
In the end, the best solution is to GET BETTER DATA. And always you want MORE data.
