I obtain different test success than predicted from SVM training on similar datasets.

Question

Marco Tremblay on 22 Sep 2019

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/481567-i-obtain-different-test-success-than-predicted-from-svm-training-on-similar-datasets

Answered: Shishir Singhal on 28 Jul 2020

I used a quadratic SVM or an ensemble to train a classifier with MATLAB's app. For training, I used a 250 x 15 dataset (Case_Num_Train) with the default validation settings.

The Ensemble gave me a 76.3% accuracy on the test set it extracted from the training data.

I then produced a smaller set (124 cases) of slightly different test cases (Case_Num_Real), using the same technique I had used to produce the training set. With the "export compact model" tool, I obtained a trained model that can run in a script (Test_ANN). The script feeds the test data into the trained model and compares each prediction with the real case.

This gave 38 errors out of 124 test cases, an error rate of about 31% (roughly 69% accuracy). That is close, but something is wrong, as repeating the training gives a fairly consistent 76%.

No obvious difference is seen when looking at the data from the training and test sets.

The problem worsened when I used a larger training set of 2500 cases. There, a quadratic SVM gives a training accuracy of 94.6%, but the test with 250 cases produces 102 errors, or about 41%. Not good enough!

I considered overfitting and incrementally reduced the training set down to the 250 cases presented above. The training accuracy and the test accuracy do converge with a smaller set, but mostly at the cost of degraded precision.

I cannot believe that this is the best we can achieve. What is wrong?


Answer 1

Shishir Singhal on 28 Jul 2020

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/481567-i-obtain-different-test-success-than-predicted-from-svm-training-on-similar-datasets#answer_471706

Hi,

There can be multiple reasons for low test accuracy when using an SVM.
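One reason worth ruling out first is plain sampling noise on a small test set. The sketch below is only a rough check, written in Python as a language-neutral illustration: it takes the error counts quoted in the question and computes a 95% normal-approximation (Wald) interval for each test accuracy. It assumes that the 38-error result belongs to the model that scored 76.3% and that the 102-error result belongs to the model that scored 94.6%; the function name `accuracy_interval` is made up for this example.

```python
import math

def accuracy_interval(errors, n, z=1.96):
    """Return (accuracy, low, high) using the normal-approximation (Wald) interval."""
    acc = 1.0 - errors / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc, acc - half_width, acc + half_width

# Error counts and test-set sizes quoted in the question.
small = accuracy_interval(errors=38, n=124)    # model trained on 250 cases
large = accuracy_interval(errors=102, n=250)   # model trained on 2500 cases

# Compare each interval with the accuracy reported during training
# (assumed pairing of reported accuracy and test result, see lead-in).
for label, (acc, low, high), reported in [
    ("250-case model", small, 0.763),
    ("2500-case model", large, 0.946),
]:
    inside = low <= reported <= high
    print(f"{label}: test accuracy {acc:.1%}, "
          f"95% interval [{low:.1%}, {high:.1%}], "
          f"reported {reported:.1%} inside interval: {inside}")
```

Under these assumptions, the 76.3% figure falls inside the interval for the 124-case test, so that gap could be sampling noise alone, while 94.6% lies far outside the interval for the 250-case test, which points to a real problem such as overfitting or a difference between the two datasets. The Wald interval is approximate, so treat this as a sanity check rather than a formal test.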

In your case, please check whether you are splitting the data correctly.

Since you are using an SVM as a classifier, use a stratified split to divide your data. A stratified split maintains the class distribution across the training, validation, and test sets.

Please refer to the documentation at https://in.mathworks.com/help/stats/cvpartition.html to learn more about partitioning data in MATLAB.
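To make the idea of a stratified split concrete, here is a minimal sketch in plain Python. It is not MATLAB code and not how cvpartition is implemented; it only illustrates the principle of splitting each class separately so that both parts keep roughly the same class proportions. The function name `stratified_holdout`, the 20% test fraction, and the toy labels are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps roughly the same share in both."""
    rng = random.Random(seed)

    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Shuffle within each class, then hold out the same fraction of every class.
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels: 200 cases of class "A", 50 of class "B".
labels = ["A"] * 200 + ["B"] * 50
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.2)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

The same comparison of class counts is also a useful check on two datasets that were generated separately, such as Case_Num_Train and Case_Num_Real in the question: if their class proportions differ noticeably, the validation accuracy from training will not predict the accuracy on the second set.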

Moreover, overfitting can occur for various reasons:

Too little training data.
Bad feature selection.
Redundant features.
Noise in the data.

In this case, I would recommend doing some feature analysis of the data before modelling.
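As one possible starting point for that feature analysis, the sketch below flags pairs of features that are almost perfectly correlated, which is a simple sign of redundancy. It is a plain-Python illustration under stated assumptions: the helper names `pearson` and `redundant_pairs`, the 0.95 threshold, and the toy feature columns are all hypothetical, and Pearson correlation only detects linear redundancy.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)

def redundant_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs

# Hypothetical toy data: f2 is nearly a scaled copy of f1, f3 is unrelated.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

pairs = redundant_pairs(features)
for first, second, r in pairs:
    print(f"{first} and {second}: r = {r:.3f}")
```

For a 15-feature dataset like the one in the question, this means checking 105 pairs. A flagged pair is a candidate for dropping one of the two features and retraining to see whether the gap between validation and test accuracy narrows.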

I hope the points above help!

Thanks
