Hello
I'm using classifiers in Matlab (e.g. [fitcsvm](<http://ch.mathworks.com/help/stats/fitcsvm.html>) or [fitcknn](<http://ch.mathworks.com/help/stats/classificationknn-class.html>)). Because I have highly unbalanced classes (10% negative class and 90% positive class), I would like to use weighting. Usually I calculate the weight for class i as follows:
weight_i = numSamples / (numClasses * numSamplesClass_i)
That means the total number of observations divided by the product of the number of classes and the number of samples for class i.
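As a minimal sketch, the formula above can be evaluated in plain Matlab as follows. The label vector `Y` and the variable names `classWeights` and `obsWeights` are hypothetical and chosen only for illustration; the 10/90 split mirrors the class ratio described in the question.

```matlab
% Hypothetical toy labels: 10 negatives (-1) and 90 positives (+1).
Y = [-ones(10,1); ones(90,1)];

[classes, ~, classIdx] = unique(Y);   % classIdx maps each sample to its class
numSamples = numel(Y);
numClasses = numel(classes);
counts = accumarray(classIdx, 1);     % number of samples in each class

% weight_i = numSamples / (numClasses * numSamplesClass_i)
classWeights = numSamples ./ (numClasses .* counts);

% Expand to one weight per observation.
obsWeights = classWeights(classIdx);
```

For this split the minority class gets weight 5 and the majority class gets weight 5/9, so the weighted totals of the two classes are equal.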
Matlab offers the 'Weights' flag to set weights for each observation. But in the description the following is written:
The software normalizes Weights to sum up to the value of the prior probability in the respective class.
I'm completely unsure how I should now use the weights. Can I just assign each data point the weight calculated from the above formula for its class?

Accepted Answer

MHN on 21 Apr 2016


You can simply set 'prior' to 'uniform', which makes all class probabilities equal. The default value is 'empirical', which determines class probabilities from the class frequencies in Y. For example, if you are using a decision tree as the classifier:
tree = fitctree(X,Y, 'prior', 'uniform')

3 Comments

MHN on 21 Apr 2016
Edited: MHN on 21 Apr 2016
There are many tricks for handling unbalanced data. For example, you can also define a cost matrix so that misclassifying your minority class costs much more than misclassifying the other class, e.g. the following cost matrix (rows are true classes, columns are predicted classes, minority class first): [0 9/10; 1/10 0]
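A minimal sketch of passing such a cost matrix to fitctree is shown below. The toy data, the class labels -1 and +1, and the variable names are assumptions made for illustration; 'ClassNames' is set explicitly so that the row and column order of the cost matrix is unambiguous.

```matlab
% Hypothetical toy data: 10 negatives (-1) and 90 positives (+1), two features.
rng(0);                                   % reproducible random numbers
X = [randn(10,2) - 1; randn(90,2) + 1];
Y = [-ones(10,1); ones(90,1)];

% Cost(i,j) is the cost of predicting class j when the true class is i.
% Rows and columns follow the order given in 'ClassNames'.
classNames = [-1 1];                      % minority class first
C = [0 9/10; 1/10 0];

tree = fitctree(X, Y, 'ClassNames', classNames, 'Cost', C);
predicted = predict(tree, X);             % resubstitution predictions
```

With this ordering, labelling a true negative as positive costs 9/10, while the opposite error costs 1/10.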
MHN on 21 Apr 2016
Edited: MHN on 21 Apr 2016
You can also use weights. "The software normalizes Weights to sum up to the value of the prior probability in the respective class" means that Matlab rescales your weights so that, within each class, they sum to that class's prior probability; taken together they form a distribution that sums to 1. For example, if you set all the weights to 1 and use the 'empirical' prior (the default), then Matlab normalizes every weight to 1/M (M: number of samples).
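The normalization rule quoted above can be checked by hand. The following is a minimal sketch that re-implements the quoted sentence in plain Matlab for a hypothetical 10/90 label vector. It is not Matlab's internal code, and the two prior vectors are written out explicitly rather than taken from a fitted model.

```matlab
% Hypothetical toy labels: 10 negatives (-1) and 90 positives (+1).
Y = [-ones(10,1); ones(90,1)];
M = numel(Y);
[classes, ~, idx] = unique(Y);            % idx maps each sample to its class
counts = accumarray(idx, 1);              % samples per class

W = ones(M, 1);                           % all raw weights equal to 1
classSums = accumarray(idx, W);           % raw weight total per class

priorEmpirical = counts / M;              % class frequencies
priorUniform = ones(numel(classes), 1) / numel(classes);   % equal priors

% Rescale so that the weights in each class sum to that class's prior.
Wemp = W ./ classSums(idx) .* priorEmpirical(idx);
Wuni = W ./ classSums(idx) .* priorUniform(idx);

fprintf('empirical: every weight = %.4f (1/M = %.4f)\n', Wemp(1), 1/M);
fprintf('uniform:   negative = %.4f, positive = %.4f\n', Wuni(1), Wuni(end));
```

Under this reading, equal raw weights with class-frequency priors give 1/M for every sample, while equal priors leave each minority sample with nine times the weight of a majority sample.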
Tom Gerard on 21 Apr 2016
Edited: Tom Gerard on 21 Apr 2016
Thank you very much for your answer. Which of the three possibilities (prior, cost, weight) is best, or is there no difference?
So, technically I can use my above formula weight_i = numSamples / (numClasses * numSamplesClass_i) for setting a cost matrix but not for setting weights for each data point. Correct?


More Answers (1)

MHN on 26 Apr 2016


It depends on your evaluation criteria and does not have a straightforward answer. I suggest you try them and see which gives you the best result according to your evaluation criteria.
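One way to try the three options side by side is sketched below. The toy data, the 5-fold cross-validation, and the use of balanced accuracy (mean per-class recall) as the evaluation criterion are all assumptions made for illustration; substitute your own data and criterion.

```matlab
% Hypothetical toy data: 10 negatives (-1) and 90 positives (+1), two features.
rng(0);
X = [randn(10,2) - 1; randn(90,2) + 1];
Y = [-ones(10,1); ones(90,1)];
classNames = [-1 1];

% Per-observation weights from the formula in the question.
W = zeros(size(Y));
W(Y == -1) = numel(Y) / (2 * sum(Y == -1));
W(Y ==  1) = numel(Y) / (2 * sum(Y ==  1));

% One set of name-value arguments per option.
options = { ...
    {'Prior', 'uniform'}, ...
    {'Cost', [0 9/10; 1/10 0]}, ...
    {'Weights', W}};
names = {'prior', 'cost', 'weights'};
balancedAcc = zeros(numel(options), 1);

for k = 1:numel(options)
    % 'KFold' makes fitctree return a cross-validated model.
    cvTree = fitctree(X, Y, 'ClassNames', classNames, 'KFold', 5, options{k}{:});
    predicted = kfoldPredict(cvTree);                 % out-of-fold predictions
    confMat = confusionmat(Y, predicted, 'Order', classNames);
    recall = diag(confMat) ./ sum(confMat, 2);        % per-class recall
    balancedAcc(k) = mean(recall);
    fprintf('%-8s balanced accuracy: %.3f\n', names{k}, balancedAcc(k));
end
```

Plain accuracy is deliberately not used here, because on a 10/90 split a model that always predicts the majority class already reaches 90%.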

Asked: 21 Apr 2016

Answered: MHN on 26 Apr 2016
