Classification problem parsed as regression problem when Split Criterion is supplied to fitcensemble
Hi
I ran a hyperparameter optimization to find the best parameters for a two-class classification problem using fitcensemble. But when I try to use these I get a strange warning:
Warning: You must pass 'SplitCriterion' as a character vector 'mse' for regression.
What is wrong with my code? The warning appears when I use a boosting ensemble as 'Method'. When I remove 'SplitCriterion', everything works fine, but I cannot understand why MATLAB, somewhere along the line, thinks this is a regression problem when I am calling fit"c"ensemble. Here is a toy example with arbitrarily chosen settings that you can run to reproduce the warning.
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
t = templateTree('MaxNumSplits', 30, ...
    'MinLeafSize', 10, ...
    'SplitCriterion', 'gdi');
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 12, ...
    'Learners', t, ...
    'KFold', 7, ...
    'LearnRate', 0.1);
Comments: 4
Don Mathis
3 Apr 2017
Edited: Don Mathis
4 Apr 2017
LogitBoost internally fits regression trees, and the 'gdi' split criterion does not work for regression. So the problem is that 'LogitBoost' is incompatible with 'gdi'. The warning message you get does a poor job of explaining this, however.
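To make the incompatibility concrete, here is a minimal sketch of two ways the toy example could be changed so that it trains without the warning. It assumes the same carsmall table as in the question. The variable names (tDefault, tGdi, cvLogit, cvAda) are made up for illustration, and swapping in 'AdaBoostM1' is only one possible choice of a boosting method that fits classification trees; it is not something the optimization suggested.

```matlab
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable

% Option 1: keep LogitBoost and leave out 'SplitCriterion', so the
% internally fitted regression trees use their default criterion.
tDefault = templateTree('MaxNumSplits', 30, 'MinLeafSize', 10);
cvLogit = fitcensemble(X, 'Cylinders', ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 12, ...
    'Learners', tDefault, ...
    'KFold', 7, ...
    'LearnRate', 0.1);

% Option 2 (assumption: any method that fits classification trees would do):
% keep 'gdi' and switch to AdaBoostM1.
tGdi = templateTree('MaxNumSplits', 30, 'MinLeafSize', 10, ...
    'SplitCriterion', 'gdi');
cvAda = fitcensemble(X, 'Cylinders', ...
    'Method', 'AdaBoostM1', ...
    'NumLearningCycles', 12, ...
    'Learners', tGdi, ...
    'KFold', 7, ...
    'LearnRate', 0.1);

% Compare the cross-validated misclassification rates of the two variants.
lossLogit = kfoldLoss(cvLogit);
lossAda = kfoldLoss(cvAda);
fprintf('LogitBoost (default criterion): %.3f\n', lossLogit);
fprintf('AdaBoostM1 (gdi):               %.3f\n', lossAda);
```

Neither variant is claimed to be the better model here; the point is only that each pairs the boosting method with a split criterion its trees can actually use.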
When I run your code, I get a useless model as output. I don't understand how an optimization could have chosen these to be the best hyperparameters when the model is unusable. Did you use the 'bayesopt' function or the 'OptimizeHyperparameters' argument to do the optimization? If so, I would like to see the code for that.
Tobias Pahlberg
6 Apr 2017
Don Mathis
6 Apr 2017
When I run an optimization, I never see successes for LogitBoost+gdi or for GentleBoost+gdi. Those combinations fail and eventually are not tried any more. Could you post a reproducible example of an optimization that shows successes for those combinations? That would be very helpful.
In any case, there is a problem in that the optimization never runs LogitBoost with 'mse', which is the only SplitCriterion it can use. As a workaround, you might try running a separate optimization without optimizing SplitCriterion. Then you could take the best result from the two optimizations. Something like this:
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', {'Method', 'LearnRate', 'MinLeafSize', 'MaxNumSplits', 'NumVariablesToSample'})
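The "take the best result" step could be sketched roughly as follows. This is an illustration, not tested advice. It assumes the default Bayesian optimizer, so that each returned model carries a HyperparameterOptimizationResults object with a MinObjective property. The names paramsNoSplit, paramsWithSplit, opts, mdlA, mdlB and best are made up for the example, and the evaluation budget of 15 is an arbitrary choice to keep the run short.

```matlab
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
rng(1) % For reproducibility of the optimization runs

% Shared optimization options (arbitrary small budget, no plots or output).
opts = struct('MaxObjectiveEvaluations', 15, 'ShowPlots', false, 'Verbose', 0);

% Run A: SplitCriterion is not optimized, so each method keeps its default.
paramsNoSplit = {'Method', 'LearnRate', 'MinLeafSize', 'MaxNumSplits', 'NumVariablesToSample'};
mdlA = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', paramsNoSplit, ...
    'HyperparameterOptimizationOptions', opts);

% Run B: SplitCriterion is optimized as well.
paramsWithSplit = [paramsNoSplit, {'SplitCriterion'}];
mdlB = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', paramsWithSplit, ...
    'HyperparameterOptimizationOptions', opts);

% Keep whichever run reached the lower observed objective.
objA = mdlA.HyperparameterOptimizationResults.MinObjective;
objB = mdlB.HyperparameterOptimizationResults.MinObjective;
if objA <= objB
    best = mdlA;
else
    best = mdlB;
end
fprintf('Run A objective: %.4f, run B objective: %.4f\n', objA, objB);
```

Because both runs use the same data and the same objective, their MinObjective values should be directly comparable, although with so few evaluations the difference may well be noise.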
Tobias Pahlberg
10 Apr 2017