Hi. I am fitting a regression tree ensemble. Although I specify a categorical variable with the name-value pair 'CategoricalPredictors', the resulting 'Mdl' object looks as if it has not used the categorical variable.
Suspecting that the default value of 'NumVariablesToSample' (1/3 of the predictors) might be the reason the categorical feature is excluded, I set 'NumVariablesToSample' to 'all' to make sure that all variables are contained in all trees.
T=1000; N=20;
X = randn(T,N);
b = zeros(N+1,1); b(2:6) = 2; b(7:11) = -3;
D = repmat([1 2 3 4 5], [1 200])';
dum = dummyvar(D);
theta = 1:5; theta = theta'/5^2;
Y = [ones(T,1) X]*b + dum*theta + randn(T,1)*0.1;
N = size([X D],2);
clear dum theta b
Ntrees = 500;
tr = templateTree('MinParentSize',250,'CategoricalPredictors',21,'NumVariablesToSample','all');
Mdl = fitrensemble([X D],Y,'Method','Bag','Learners',tr,'NumLearningCycles',Ntrees);
rsvm = false(N,Ntrees);
for i = 1:Ntrees
    idx = unique(Mdl.Trained{i}.CutPredictorIndex);
    idx(idx==0)=[];
    rsvm(idx,i) = 1;
end
mean(sum(rsvm)) 
>> ans =
    4.6040
Question 1: Despite setting 'NumVariablesToSample' to 'all', when I extract the variables used in each tree (using Mdl.Trained{i}.CutPredictorIndex), on average only about 5 of the 21 features are included in each tree. I was expecting all 21 to be included in all trees. Why is this not the case?
In the Bagged Ensemble above, I checked further and none of the individual trees picks the categorical variable (i.e. variable 21). When I fit a Boosted Ensemble instead, the algorithm still does not pick all the variables (it picks only about 6 variables on average). However, the categorical variable (#21) is now included in a few of the individual trees.
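For anyone reproducing this, the same information can be viewed per predictor rather than per tree. The lines below are only a minimal sketch: the names catIdx, useFreq, neverUsed and catUsed are made up for illustration, the categorical predictor is assumed to be the last column, and a small stand-in matrix is used when rsvm is not already in the workspace.

```matlab
% Stand-in data so the sketch runs on its own; skipped when rsvm already exists.
if ~exist('rsvm','var')
    rsvm = logical([1 1 0; 0 0 0; 1 0 1]);   % hypothetical: 3 predictors x 3 trees
end
catIdx    = size(rsvm,1);          % assumption: categorical predictor is the last column
useFreq   = mean(rsvm,2);          % fraction of trees that split on each predictor
neverUsed = find(useFreq == 0)';   % predictors that no tree ever splits on
catUsed   = any(rsvm(catIdx,:));   % true if at least one tree splits on the categorical predictor
```

With the rsvm matrix from the loop above, useFreq has one entry per predictor, so it is easy to see whether the same few predictors dominate every tree or whether usage is spread out.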
tr = templateTree('MinParentSize',250,'CategoricalPredictors',21,'NumVariablesToSample','all');
Mdl = fitrensemble([X D],Y,'Method','LSBoost','Learners',tr,'NumLearningCycles',Ntrees);
rsvm = false(N,Ntrees);
for i = 1:Ntrees
    idx = unique(Mdl.Trained{i}.CutPredictorIndex);
    idx(idx==0)=[];
    rsvm(idx,i) = 1;
end
mean(sum(rsvm))
>> ans =
    6.3160
find(rsvm(21,:)) 
>> ans =
  Columns 1 through 18
    18    31    39    50    60    67    81   151   179   181   195   204   269   298   317   319   337   394
Question 2: Although variable 21 is included in a number of trees, the property 'CategoricalPredictors' is always empty, both for the ensemble and for the individual trees. Can anyone explain why this is the case?
Mdl.CategoricalPredictors
>> ans = []
Mdl.Trained{18}.CategoricalPredictors
>> ans = []
Any insights are appreciated.