Feature Selection in TreeBagger
Hello MathWorks community
I'm currently working with the TreeBagger class to generate some classification tree ensembles. I would like to know how it decides which features are used for splitting the data. For example, if I create an ensemble of 5000 tree stumps, use it to classify a dataset with two features (e.g. VRQL-Value and maximum frequency), and then check which feature was selected for the split in every single tree like this:
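% Collect the predictor used for the root split of each tree
numTrees = numel(Random_Forest_Model.Trees);
cellArray = cell(1, numTrees);
for y = 1:numTrees
    cellArray{y} = Random_Forest_Model.Trees{y}.CutPredictor{1};
end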
In some cases only one feature is selected for all 5000 trees and the other feature is never selected (i.e. cellArray looks like this: {'x2', 'x2', 'x2', ..., 'x2'}). This can also happen with more than two features: only one feature is selected and the others are ignored.
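The per-tree check above can be condensed into a tally of how often each predictor is chosen for the root split. This is a minimal sketch, not the poster's actual setup: the data are synthetic stand-ins, the ensemble is kept small (50 trees) so it runs quickly, and 'MaxNumSplits', 1 is assumed as the way the stumps are built.

```matlab
% Synthetic stand-in data (assumption): two features on different scales.
rng(1);                                        % reproducible random numbers
X = [100*rand(200,1), 200 + 1000*rand(200,1)];
Y = double(X(:,2) > 700) + 1;                  % label depends on feature 2 only

% Small ensemble of stumps (one split per tree).
Random_Forest_Model = TreeBagger(50, X, Y, ...
    'Method', 'classification', 'MaxNumSplits', 1);

% Name of the predictor used at the root node of every tree.
rootPredictor = cellfun(@(t) t.CutPredictor{1}, ...
    Random_Forest_Model.Trees, 'UniformOutput', false);

% Count how often each predictor name occurs.
[names, ~, idx] = unique(rootPredictor);
counts = accumarray(idx(:), 1);
for k = 1:numel(names)
    fprintf('%s: %d of %d trees\n', names{k}, counts(k), numel(rootPredictor));
end
```

A tree that could not be split at all reports an empty predictor name, so an empty entry in the tally would indicate such trees.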
Possibly important details about the dataset:
- One feature takes values from 1 to 100, the other from about 200 to 1200
- The classes are imbalanced (class 1: 52 entries, class 2: over 300 entries)
- Only the larger class contains NaNs
- Both features contain NaNs
My question is: how can I get TreeBagger to use all features for classification rather than only one, or, more generally, how can I achieve a more balanced selection of features?
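One setting that may be worth experimenting with here is the 'NumPredictorsToSample' name-value argument of TreeBagger, which controls how many predictors are drawn at random as candidates for each split. The sketch below compares stumps grown with all predictors as candidates against stumps grown with a single randomly drawn candidate. It uses synthetic stand-in data in which both features are informative, so it only illustrates the mechanism and is not a confirmed fix for the dataset described above.

```matlab
% Synthetic stand-in data (assumption): both features carry class information.
rng(1);                                        % reproducible random numbers
X = [100*rand(300,1), 200 + 1000*rand(300,1)];
Y = double(X(:,1)/100 + (X(:,2) - 200)/1000 > 1) + 1;

% Helper: predictor name used at the root split of every tree.
rootNames = @(mdl) cellfun(@(t) t.CutPredictor{1}, ...
    mdl.Trees, 'UniformOutput', false);

% Stumps where every split may choose among all predictors.
mdlAll = TreeBagger(200, X, Y, 'Method', 'classification', ...
    'MaxNumSplits', 1, 'NumPredictorsToSample', 'all');

% Stumps where each split sees one randomly drawn predictor.
mdlOne = TreeBagger(200, X, Y, 'Method', 'classification', ...
    'MaxNumSplits', 1, 'NumPredictorsToSample', 1);

namesAll = rootNames(mdlAll);
namesOne = rootNames(mdlOne);
fprintf('all candidates: x1 = %d, x2 = %d\n', ...
    sum(strcmp(namesAll, 'x1')), sum(strcmp(namesAll, 'x2')));
fprintf('one candidate:  x1 = %d, x2 = %d\n', ...
    sum(strcmp(namesOne, 'x1')), sum(strcmp(namesOne, 'x2')));
```

With a single candidate per split, the tree can only split on whichever predictor was drawn, so both features should appear across the ensemble. Whether that improves classification on imbalanced data with NaNs would need to be checked separately, for example with the out-of-bag error.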