oobQuantileError
Out-of-bag quantile loss of bag of regression trees
Description
err = oobQuantileError(Mdl) returns half of the out-of-bag mean absolute deviation (MAD) from comparing the true responses in Mdl.Y to the predicted, out-of-bag medians at Mdl.X, the predictor data, and using the bag of regression trees Mdl. Mdl must be a TreeBagger model object.
err = oobQuantileError(Mdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, specify quantile probabilities, the error type, or which trees to include in the quantile-regression-error estimation.
Input Arguments
Mdl — Bag of regression trees
TreeBagger model object (default)
Bag of regression trees, specified as a TreeBagger model object created by the TreeBagger function.
- The value of Mdl.Method must be regression.
- When you train Mdl using the TreeBagger function, you must specify the name-value pair 'OOBPrediction','on'. Consequently, TreeBagger saves the required out-of-bag observation index matrix in Mdl.OOBIndices.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Mode — Ensemble error type
'ensemble' (default) | 'cumulative' | 'individual'
Ensemble error type, specified as the comma-separated pair consisting of 'Mode' and a value in this table. Suppose tau is the value of Quantile.
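Value | Description |
---|---|
'cumulative' | err is a numel(trees)-by-numel(tau) numeric matrix, where trees is the value of Trees. err(j,k) is the tau(k) cumulative, out-of-bag quantile regression error using the learners in Mdl.Trees(trees(1:j)). |
'ensemble' | err is a 1-by-numel(tau) numeric vector. err(k) is the tau(k) cumulative, out-of-bag quantile regression error using the learners in Mdl.Trees(trees). |
'individual' | err is a numel(trees)-by-numel(tau) numeric matrix. err(j,k) is the tau(k) out-of-bag quantile regression error using the learner in Mdl.Trees(trees(j)). |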
For 'cumulative' and 'individual', if you choose to include fewer trees in quantile estimation using Trees, then this action affects the number of rows in err and corresponding row indices.
Example: 'Mode','cumulative'
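The effect of Mode and Trees on the shape of err can be checked with a short sketch. It assumes the carsmall data set and a 100-tree ensemble like the one in the examples on this page; the 20-tree subset and the variable names errEns, errCum, and errInd are illustrative choices, not part of the function's interface.

```matlab
% Sketch: how 'Mode' and 'Trees' shape the output of oobQuantileError.
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);
rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression','OOBPrediction','on');

tau = [0.25 0.5 0.75];   % quantile probabilities
trees = 1:20;            % illustrative subset of tree indices

errEns = oobQuantileError(Mdl,'Quantile',tau,'Trees',trees,'Mode','ensemble');
errCum = oobQuantileError(Mdl,'Quantile',tau,'Trees',trees,'Mode','cumulative');
errInd = oobQuantileError(Mdl,'Quantile',tau,'Trees',trees,'Mode','individual');

size(errEns)   % 1-by-numel(tau)
size(errCum)   % numel(trees)-by-numel(tau)
size(errInd)   % numel(trees)-by-numel(tau)
```

Because row j of the cumulative result uses trees(1:j), its last row is expected to match the 'ensemble' result for the same trees.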
Quantile — Quantile probability
0.5 (default) | numeric vector containing values in [0,1]
Quantile probability, specified as the comma-separated pair consisting of 'Quantile' and a numeric vector containing values in the interval [0,1]. For each observation (row) in Mdl.X, oobQuantileError estimates corresponding quantiles for all probabilities in Quantile.
Example: 'Quantile',[0 0.25 0.5 0.75 1]
Data Types: single | double
Trees — Indices of trees to use in response estimation
'all' (default) | numeric vector of positive integers
Indices of trees to use in response estimation, specified as the comma-separated pair consisting of 'Trees' and 'all' or a numeric vector of positive integers. Indices correspond to the cells of Mdl.Trees; each cell therein contains a tree in the ensemble. The maximum value of Trees must be less than or equal to the number of trees in the ensemble (Mdl.NumTrees).
For 'all', oobQuantileError uses all trees in the ensemble (that is, the indices 1:Mdl.NumTrees).
Values other than the default can affect the number of rows in err.
Example: 'Trees',[1 10 Mdl.NumTrees]
Data Types: char | string | single | double
TreeWeights — Weights to attribute to responses from individual trees
ones(Mdl.NumTrees,1) (default) | numeric vector of nonnegative values
Weights to attribute to responses from individual trees, specified as the comma-separated pair consisting of 'TreeWeights' and a numeric vector of numel(trees) nonnegative values. trees is the value of Trees.
If you specify 'Mode','individual', then oobQuantileError ignores TreeWeights.
Data Types: single | double
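As a minimal sketch of combining Trees and TreeWeights, the following assumes the carsmall data set and a 100-tree ensemble. The weight values are hypothetical and chosen only to show that the weight vector has one entry per selected tree.

```matlab
% Sketch: weight the first 25 selected trees twice as heavily as the rest.
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);
rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression','OOBPrediction','on');

trees = 1:50;                          % illustrative subset of tree indices
w = [ones(25,1); 0.5*ones(25,1)];      % hypothetical weights, numel(trees) entries

% Default Mode is 'ensemble', so err is a scalar for the default Quantile (0.5).
err = oobQuantileError(Mdl,'Trees',trees,'TreeWeights',w);
```

The weights apply in 'ensemble' and 'cumulative' modes only, since 'individual' mode ignores TreeWeights.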
Output Arguments
err — Half of out-of-bag quantile regression error
numeric scalar | numeric matrix
Half of the out-of-bag quantile regression error, returned as a numeric scalar or T-by-numel(tau) matrix. tau is the value of Quantile.
T depends on the values of Mode, Trees, and Quantile. Suppose that you specify 'Quantile',tau and 'Trees',trees.
- For 'Mode','cumulative', err is a numel(trees)-by-numel(tau) numeric matrix. err(j,k) is the tau(k) cumulative, out-of-bag quantile regression error using the learners in Mdl.Trees(trees(1:j)).
- For 'Mode','ensemble', err is a 1-by-numel(tau) numeric vector. err(k) is the tau(k) cumulative, out-of-bag quantile regression error using the learners in Mdl.Trees(trees).
- For 'Mode','individual', err is a numel(trees)-by-numel(tau) numeric matrix. err(j,k) is the tau(k) out-of-bag quantile regression error using the learner in Mdl.Trees(trees(j)).
Examples
Estimate Out-of-Bag Quantile Regression Error
Load the carsmall data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders. Treat Cylinders as a categorical variable.
load carsmall
Cylinders = categorical(Cylinders);
X = table(Displacement,Weight,Cylinders,MPG);
Train an ensemble of bagged regression trees using the entire data set. Specify 100 weak learners and save the out-of-bag indices.
rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression','OOBPrediction','on');
Mdl is a TreeBagger ensemble.
Perform quantile regression, and estimate the out-of-bag quantile regression error (half of the MAD) of the entire ensemble using the predicted conditional medians.
oobErr = oobQuantileError(Mdl)
oobErr = 1.5349
oobErr is an unbiased estimate of the quantile regression error for the entire ensemble.
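One way to see what the returned value measures is to recompute it from out-of-bag median predictions. The sketch below assumes default (equal) observation weights and that every observation is out-of-bag for at least one tree; under those assumptions the two values should agree closely, though this comparison is an illustration rather than a documented guarantee. The variable names oobMedian and halfMAD are illustrative.

```matlab
% Sketch: compare oobQuantileError with half of the MAD computed by hand.
load carsmall
Cylinders = categorical(Cylinders);
X = table(Displacement,Weight,Cylinders,MPG);
rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression','OOBPrediction','on');

oobErr = oobQuantileError(Mdl);           % default Quantile is 0.5

oobMedian = oobQuantilePredict(Mdl);      % out-of-bag conditional medians
halfMAD = 0.5*mean(abs(Mdl.Y - oobMedian),'omitnan');

disp([oobErr halfMAD])                    % the two values should be close
```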
Find Appropriate Ensemble Size Using Out-of-Bag Quantile Regression Error
Load the carsmall data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders.
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);
Train an ensemble of bagged regression trees using the entire data set. Specify 250 weak learners and save the out-of-bag indices.
rng('default'); % For reproducibility
Mdl = TreeBagger(250,X,'MPG','Method','regression',...
    'OOBPrediction','on');
Estimate the cumulative, out-of-bag quantile regression errors for the 0.25, 0.5, and 0.75 quantile probabilities.
err = oobQuantileError(Mdl,'Quantile',[0.25 0.5 0.75],'Mode','cumulative');
err is a 250-by-3 matrix of cumulative, out-of-bag, quantile regression errors. Columns correspond to quantile probabilities and rows correspond to trees in the ensemble. The errors are cumulative, so they incorporate aggregated predictions from previous trees.
Plot the cumulative, out-of-bag, quantile errors on the same plot.
figure;
plot(err);
legend('0.25 quantile error','0.5 quantile error','0.75 quantile error');
ylabel('Out-of-bag quantile error');
xlabel('Tree index');
title('Cumulative, Out-of-Bag, Quantile Regression Error')
All quantile error curves appear to level off after training about 50 trees. So, training 50 trees appears to be sufficient to achieve minimal quantile error for the three quantile probabilities.
More About
Out-of-Bag
In a bagged ensemble, observations are out-of-bag when they are left out of the training sample for a particular learner. Observations are in-bag when they are used to train a particular learner.
When bagging learners, a practitioner takes a bootstrap sample (that is, a random sample with replacement) of size n for each learner, and then trains the learners using their respective bootstrap samples. Drawing n out of n observations with replacement omits on average about 37% of observations for each learner.
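The 37% figure follows from the probability that a given observation is never drawn in n draws with replacement, (1 - 1/n)^n, which approaches exp(-1) as n grows. The sketch below evaluates that expression and checks it against one simulated bootstrap sample; the sample size N and the variable names are illustrative.

```matlab
% Probability that a given observation is left out of a bootstrap sample.
n = [10 100 1000 10000];
pOut = (1 - 1./n).^n;        % exact probability for each sample size
limitValue = exp(-1);        % limit as n grows, about 0.3679

% Empirical check with a single bootstrap sample.
rng(1); % For reproducibility
N = 10000;
idx = randi(N,N,1);                     % draw N indices with replacement
fracOut = 1 - numel(unique(idx))/N;     % fraction of observations never drawn

disp(pOut)
disp([limitValue fracOut])
```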
The out-of-bag ensemble error, the ensemble error estimated using out-of-bag observations only, is an unbiased estimator of the true ensemble error.
Quantile Regression Error
The quantile regression error of a model given observed predictor data and responses is the weighted mean absolute deviation (MAD). If the model under-predicts the response, then deviation weights are τ, the quantile probability. If the model over-predicts, then deviation weights are 1 – τ.
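That is, the τ quantile regression error is
$$L_\tau = \frac{1}{\sum_{j} w_j}\left[\tau \sum_{j:\, y_j \ge \hat{y}_{\tau,j}} w_j \left|y_j - \hat{y}_{\tau,j}\right| + (1-\tau) \sum_{j:\, y_j < \hat{y}_{\tau,j}} w_j \left|y_j - \hat{y}_{\tau,j}\right|\right]$$
where $y_j$ is true response j, $\hat{y}_{\tau,j}$ is the τ quantile that the model predicts for observation j, and $w_j$ is observation weight j.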
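The weighting rule described above can be sketched as a small anonymous function. This is an illustration of the definition, not the toolbox implementation; the name quantileLoss and the toy data are hypothetical, and the weights are normalized by their sum so that the result is a weighted mean.

```matlab
% Weighted quantile regression error (pinball loss), following the rule above:
% under-predictions (y >= yhat) are weighted by tau, over-predictions by 1 - tau.
% y: true responses, yhat: predicted tau quantiles, w: observation weights.
quantileLoss = @(y,yhat,w,tau) ...
    sum(w .* abs(y - yhat) .* (tau*(y >= yhat) + (1 - tau)*(y < yhat))) / sum(w);

y    = [10; 20; 30; 40];    % toy responses
yhat = [12; 18; 30; 35];    % toy predicted quantiles
w    = ones(4,1);           % equal observation weights

L50 = quantileLoss(y,yhat,w,0.5);   % 1.125, half of the MAD (2.25)
L90 = quantileLoss(y,yhat,w,0.9);   % about 1.625; under-predictions cost more
```

For τ = 0.5 both kinds of deviation receive weight 0.5, which is why the default output of oobQuantileError is half of the MAD.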
Tips
The out-of-bag ensemble error estimator is unbiased for the true ensemble error. So, to tune parameters of a random forest, estimate the out-of-bag ensemble error instead of implementing cross-validation.
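A minimal sketch of this tip, assuming the carsmall data set: train one ensemble per candidate value of the TreeBagger option MinLeafSize and keep the value with the lowest out-of-bag quantile regression error. The candidate grid and the variable names are hypothetical.

```matlab
% Sketch: tune MinLeafSize using the out-of-bag quantile regression error.
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);

leafSizes = [1 5 10 20];             % hypothetical candidate values
oobErr = zeros(size(leafSizes));
for i = 1:numel(leafSizes)
    rng(1); % Same random stream for each candidate
    Mdl = TreeBagger(100,X,'MPG','Method','regression', ...
        'OOBPrediction','on','MinLeafSize',leafSizes(i));
    oobErr(i) = oobQuantileError(Mdl);   % error at the default quantile, 0.5
end

[~,best] = min(oobErr);
bestLeafSize = leafSizes(best)
```

No separate validation set or cross-validation loop is needed, because each error is computed from observations the corresponding trees did not see during training.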
References
[1] Breiman, L. "Random Forests." Machine Learning, Vol. 45, 2001, pp. 5–32.
[2] Meinshausen, N. "Quantile Regression Forests." Journal of Machine Learning Research, Vol. 7, 2006, pp. 983–999.
Version History
Introduced in R2016b