Dealing with imbalanced data in a neural network

I want to use a deep learning network for a classification problem. My data is imbalanced, meaning that one of the classes has fewer training examples than the others.
I know there is an option to remove training data from the other classes, but I wonder whether there is another solution. For example, is it possible to modify the cost layer so that the cost of misclassifying a specific class is larger? Thanks,

Accepted Answer

Greg Heath on 12 Jun 2018


There are many ways to deal with unbalanced classes when no more real data is available. Over the decades I have used the following:
1. Use the summary statistics of the small classes to simulate more data.
2. Design multiple nets using the smaller classes and subsets of the larger classes, then combine the answers.
3. Use a cost matrix to enhance the influence of the small subsets and/or reduce the influence of the larger subsets.
4. A combination of the above.
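As a rough illustration of options 1 and 2, here is a minimal Python sketch (not MATLAB toolbox code). It assumes the features of the small class can be approximated by independent per-feature Gaussians, which is a strong simplification, and the helper names `simulate_minority` and `balanced_subsets` are hypothetical, made up for this example.

```python
import random
import statistics

def simulate_minority(samples, n_new, seed=0):
    """Option 1 (sketch): draw n_new synthetic rows from independent
    per-feature Gaussians fitted to the small class (mean and std).
    Assumption: features are roughly Gaussian and uncorrelated."""
    rng = random.Random(seed)
    columns = list(zip(*samples))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # needs >= 2 rows
    return [[rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(n_new)]

def balanced_subsets(majority, minority, seed=0):
    """Option 2 (sketch): split the large class into chunks of about the
    small-class size and pair each chunk with all small-class rows.
    One net would be trained per pair and the outputs combined
    (for example, averaged). The last chunk may be shorter."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    k = len(minority)
    return [(shuffled[i:i + k], minority)
            for i in range(0, len(shuffled), k)]

# Toy data: 4 small-class rows, 10 large-class rows.
minority = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.5], [1.1, 10.5]]
majority = [[float(i), float(2 * i)] for i in range(10)]

synthetic = simulate_minority(minority, n_new=6)
augmented_minority = minority + synthetic
subsets = balanced_subsets(majority, minority)
```

Training and combining the individual nets is left out on purpose, since that part depends on the framework in use.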
The basis of these techniques can be understood by examining the following term in the Bayesian risk:
Cij * Pi * p(x|i)
which involves the classification cost Cij, the a priori probability Pi, and the class-conditional probability density p(x|i).
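To make the role of that term concrete, here is a small Python sketch of a minimum-risk decision. It assumes the usual reading: the cost entry `cost[i][j]` is the cost of deciding class j when the true class is i, each class has a prior, and each class has a density p(x|i). The 1-D Gaussian densities, the numbers, and the function names are all made up for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, used here as a stand-in for p(x|i)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def min_risk_class(x, cost, priors, densities):
    """Return the decision j that minimizes
    sum_i cost[i][j] * priors[i] * p(x|i).
    Assumption: cost[i][j] = cost of deciding j when the truth is i."""
    n = len(priors)
    joint = [priors[i] * densities[i](x) for i in range(n)]
    risks = [sum(cost[i][j] * joint[i] for i in range(n))
             for j in range(n)]
    return risks.index(min(risks))

# Class 0 is common, class 1 is rare (illustrative values only).
priors = [0.95, 0.05]
densities = [lambda x: gaussian_pdf(x, 0.0, 1.0),
             lambda x: gaussian_pdf(x, 2.0, 1.0)]

equal_cost = [[0, 1], [1, 0]]    # every mistake costs the same
rare_cost = [[0, 1], [20, 0]]    # missing the rare class costs 20x

x = 1.5
decision_equal = min_risk_class(x, equal_cost, priors, densities)
decision_rare = min_risk_class(x, rare_cost, priors, densities)
```

With equal costs the small prior dominates and the point x = 1.5 goes to the common class; raising the cost of missing the rare class flips the decision to the rare class. Replicating small-class data raises the effective prior instead, which changes the same product.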
Hope this helps.
Thank you for formally accepting my answer
Greg

Comments: 3

Tally on 14 Jun 2018
Edited: Tally on 14 Jun 2018
Thanks Greg.
Regarding option 3 (use a cost matrix), is it possible to do this with the MATLAB Neural Network Toolbox? The toolbox is very convenient and lets me define layers easily, but those layers seem like black boxes that cannot be modified. I can define a loss function using the built-in softmaxLayer and classificationLayer, but I don't see how to modify it so that different classes get different costs. Does the NN toolbox allow a custom loss function?
I am also trying to find out how to change the classification cost matrix for a MATLAB shallow NN. I saw in another post that you mentioned you answered this on Usenet, but I don't know what's going on with Usenet these days. It seems very complicated to get on and search for something! I haven't used it for 15 years. It is much harder now :-)
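For reference, the kind of loss being asked about can be sketched independently of any toolbox. The Python below is a language-neutral illustration of a class-weighted cross-entropy, not MATLAB toolbox API; the function names and the inverse-frequency weighting rule are assumptions chosen for the example, and it assumes every class appears at least once in the labels.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_frequency_weights(labels, num_classes):
    """One common heuristic: weight_c = N / (K * count_c).
    Assumption: every class occurs at least once in labels."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * c) for c in counts]

def weighted_cross_entropy(logits_batch, labels, class_weights):
    """Cross-entropy where each sample is scaled by the weight of its
    true class, normalized by the sum of the weights used."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, label in zip(logits_batch, labels):
        w = class_weights[label]
        total_loss += -w * math.log(softmax(logits)[label])
        total_weight += w
    return total_loss / total_weight

# Toy batch: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
logits_batch = [[2.0, 0.0], [1.5, 0.5], [1.0, 0.0], [1.0, 0.0]]

weights = inverse_frequency_weights(labels, num_classes=2)
loss_plain = weighted_cross_entropy(logits_batch, labels, [1.0, 1.0])
loss_weighted = weighted_cross_entropy(logits_batch, labels, weights)
```

In this toy batch the single class-1 sample is misclassified, so the weighted loss comes out larger than the plain one: the mistake on the rare class counts for more. A per-class weight vector like this is a diagonal special case of a full cost matrix.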
Kenta on 11 Jul 2020
For an imbalanced dataset, oversampling is also effective. The demo is posted below. I hope it helps you.
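The following is not the demo referred to in the comment, just a minimal Python sketch of plain random oversampling under simple assumptions: rows of the smaller classes are duplicated at random until every class matches the largest one. The function name `random_oversample` is hypothetical.

```python
import random
from collections import Counter

def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy data: five rows of class "a", two rows of class "b".
features = [[0.1], [0.2], [0.3], [0.4], [0.5], [9.0], [9.5]]
labels = ["a", "a", "a", "a", "a", "b", "b"]

balanced_x, balanced_y = random_oversample(features, labels)
class_counts = Counter(balanced_y)
```

Oversampling is normally applied to the training split only; duplicating rows before splitting would leak copies of the same sample into validation or test data.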

Asked: 12 Jun 2018

Commented: 11 Jul 2020
