How to include all variables in each decision tree of an ensemble?

Haris K. on 13 Feb 2021
Answered: Aditya Patil on 16 Feb 2021
Hi everyone. I am fitting the following 10-tree ensemble.
X = rand(1000,50);
Y = rand(1000,1);
N = size(X,2);
Ntrees=10;
t = templateTree('NumVariablesToSample','all');
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);
Below I extract the number of variables that are included in each of the 10 trees.
z = false(N,Ntrees);
for i = 1:Ntrees
idx = unique(Mdl.Trained{i}.CutPredictorIndex);
idx(idx==0)=[];
z(idx,i) = 1;
end
sum(z)
ans =
     8    10     8    10     9     9    10     8     9     9
Despite setting 'NumVariablesToSample' to 'all', when I extract the variables included in each tree, only 8-10 of the 50 features appear in each tree. Does anyone have a suggestion on how to force all variables to be included in all trees? Thanks.

Answers (1)

Aditya Patil on 16 Feb 2021
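'NumVariablesToSample' defines the number of variables (predictors) that are considered at each split. At every split, the decision tree algorithm picks a random subset of that many predictors and then selects one of them according to the split criterion (for regression trees, the split that most reduces the mean squared error). With 'all', every predictor is a candidate at every split, but only the best one is actually used.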
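To make "selects one of them" concrete, here is a brute-force sketch of choosing a single regression split when all predictors are candidates. It is an illustration on made-up data, not MATLAB's internal implementation; the variable names (sse, bestVar, bestCut) are hypothetical. Every predictor is scanned, but only the one whose best cut gives the largest reduction in the sum of squared errors wins the split.

```matlab
% Illustration only: exhaustive search for the best single split of Y,
% scoring candidates by reduction in sum of squared errors (SSE).
rng(1);                                         % reproducible example data
X = rand(200,5);
Y = double(X(:,3) > 0.5) + 0.01*randn(200,1);   % response depends only on predictor 3

sse = @(v) sum((v - mean(v)).^2);   % SSE of a node around its mean
parentSSE = sse(Y);
bestGain = -Inf; bestVar = 0; bestCut = NaN;

for j = 1:size(X,2)                 % 'all': every predictor is a candidate
    xs = unique(X(:,j));            % sorted distinct values
    cuts = (xs(1:end-1) + xs(2:end))/2;   % midpoints between neighbours
    for c = cuts'
        left  = Y(X(:,j) <  c);
        right = Y(X(:,j) >= c);
        gain = parentSSE - sse(left) - sse(right);
        if gain > bestGain
            bestGain = gain; bestVar = j; bestCut = c;
        end
    end
end

fprintf('Best split: x%d < %.3f (SSE reduction %.3f)\n', bestVar, bestCut, bestGain);
```

Only predictor 3 ends up in this split even though all five were evaluated, which is the same reason a predictor can be "considered" everywhere yet never appear in a tree.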
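It may not be necessary, or even possible, to use a specific variable in a tree. Each split uses exactly one predictor, so a tree with k splits can use at most k distinct predictors. Also, if a prior split leaves a node containing samples of only one class (or, in regression, identical response values), no split on any variable can improve that node, so the tree stops splitting there.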
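The split-count limit can be checked directly on the trained learners. The sketch below reuses the setup from the question and compares, per tree, the number of branch nodes with the number of distinct predictors. It assumes that fitrensemble grows shallow trees for boosting by default (MaxNumSplits = 10, as described in the templateTree documentation), and for comparison passes a much larger 'MaxNumSplits' explicitly. Deeper trees make it more likely that many predictors appear, but nothing guarantees all 50 will.

```matlab
% Sketch: splits vs. distinct predictors per boosted tree, for two depths.
% Assumption: the default MaxNumSplits for boosting is 10.
rng(0);                                   % reproducible random data
X = rand(1000,50);
Y = rand(1000,1);
Ntrees = 10;

templates = {templateTree('NumVariablesToSample','all'), ...
             templateTree('NumVariablesToSample','all','MaxNumSplits',size(X,1)-1)};
labels = {'default depth', 'MaxNumSplits = n-1'};
results = struct('nSplits', {}, 'nDistinct', {});

for k = 1:numel(templates)
    Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',templates{k}, ...
                       'NumLearningCycles',Ntrees);
    nSplits = zeros(1,Ntrees);
    nDistinct = zeros(1,Ntrees);
    for i = 1:Ntrees
        tree = Mdl.Trained{i};
        isSplit = tree.IsBranchNode;                  % true for internal (split) nodes
        nSplits(i) = sum(isSplit);
        nDistinct(i) = numel(unique(tree.CutPredictorIndex(isSplit)));
    end
    fprintf('%s\n  splits:   %s\n  distinct: %s\n', labels{k}, ...
            mat2str(nSplits), mat2str(nDistinct));
    results(k).nSplits = nSplits;
    results(k).nDistinct = nDistinct;
end
```

With the default depth, the distinct-predictor count can never exceed the split count, which matches the 8-10 predictors per tree seen in the question.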
If you need to use all variables, you can look at some of the other algorithms available in MATLAB, such as SVM (fitrsvm for regression).
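As a minimal sketch of that alternative, assuming a linear kernel is acceptable for the problem: a linear SVM regression model fitted with fitrsvm has one coefficient in its Beta property for every column of X, so every predictor enters the model, although some coefficients may be close to zero. The data setup mirrors the question and is not tuned in any way.

```matlab
% Sketch: linear SVM regression keeps one coefficient per predictor.
rng(0);
X = rand(1000,50);
Y = rand(1000,1);

svmMdl = fitrsvm(X,Y,'KernelFunction','linear','Standardize',true);
beta = svmMdl.Beta;                     % linear coefficients, one per column of X

fprintf('Number of coefficients: %d\n', numel(beta));
fprintf('Largest |coefficient|:  %.4f\n', max(abs(beta)));
```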

Release

R2020b
