How to deal with imbalanced dataset classification by support vector machine

조회 수: 38 (최근 30일)
I have a dataset that is heavily skewed in one class. The training with support vector machine (SVM), by either fitcsvm.m or fitcecoc.m, cannot give desirable results. The accuracy for the class that has more samples is more than 90%, but for the class with much fewer samples is barely 70%. Is there any way to improve the training by SVM? or other methods that can be used to tackle the umbablanced data training?

채택된 답변

Aditya Mittal
Aditya Mittal 2020년 4월 21일
Hi,
There are some ways which can be used to balance the dataset before fitting to the classifier to get the better result. These methods are as follows:
  • Under Sampling- Removing the unwanted or repeated data from the majority class and keep only a part of these useful points. In this way, there can be some balance in the data.
  • Over Sampling- Try to get more data points for the minority class. Or try to replicate some of the data points of the minority class in order to increase cardinality.
  • Generate Data- You can decide to generate synthetic data for the minority class for balancing the data. This can be done using SMOTE method. Below is the link to use SMOTE method-
  • https://www.mathworks.com/matlabcentral/fileexchange/38830-smote-synthetic-minority-over-sampling-technique
The results vary according to the problem. And accuracy is not always the best performance matric when evaluating imbalanced data. Therefore you should try different performance metrics which can give better insight.
  • Confusion matrix
  • Precision
  • Recall
  • F1 score
Try fitting the data to various machine learning models like hybrid or ensemble machine learning algorithms (e.g. Adaboost), or deep learning models can be used in order to receive better results.
  댓글 수: 4
Kenta
Kenta 2020년 7월 11일
The answer from Dr. Aditya Mittal is very informative. The example of oversampling is posted here. I hope it helps you.
Esmeralda Ruiz Pujadas
Esmeralda Ruiz Pujadas 2023년 3월 22일
You cannot use those methods directly, you are touching the validation. And SVM is different than deep learning. You cannot especify directly the validation in svm....

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Statistics and Machine Learning Toolbox에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by