regressionTreeComponent

Pipeline component for regression using binary decision trees

Since R2026a

    Description

    regressionTreeComponent is a pipeline component that creates a regression model using a binary decision tree. The pipeline component uses the functionality of the fitrtree function during the learn phase to train the tree regression model. The component uses the functionality of the predict and loss functions during the run phase to perform regression.

    Creation

    Description

    component = regressionTreeComponent creates a pipeline component for a tree regression model.

    example

    component = regressionTreeComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, you can specify the maximum number of decision splits, pruning criterion, and minimum leaf size.

    Properties

    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.

    This property is read-only after the component is created.

    Observation weights flag, specified as 0 (false) or 1 (true). If UseWeights is true, the component adds a third input, "Weights", to the Inputs component property and appends a third input tag, 3, to the InputTags component property.

    Example: c = regressionTreeComponent(UseWeights=1)

    Data Types: logical
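
    For example, a component created with observation weights enabled exposes the additional input port and tag described above. This sketch relies on the default port names and tags shown in the Examples section:

    ```matlab
    % UseWeights is structural, so set it when you create the component.
    c = regressionTreeComponent(UseWeights=true);
    c.Inputs     % ["Predictors" "Response" "Weights"]
    c.InputTags  % [1 2 3]
    ```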

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Maximum number of decision splits (or branch nodes), specified as a nonnegative scalar. The software splits MaxNumSplits or fewer branch nodes.

    The default value is size(X,1) - 1, where size(X,1) is the number of observations in the first data argument of learn.

    Example: c = regressionTreeComponent(MaxNumSplits=5)

    Example: c.MaxNumSplits = 10

    Data Types: single | double

    Leaf merge flag, specified as "on" or "off".

    When MergeLeaves is "on", then the component:

    • Merges leaves originating from the same parent node if that yields a sum of risk values greater than or equal to the risk associated with the parent node.

    • Estimates the optimal sequence of pruned subtrees, but does not prune the regression tree.

    Example: c = regressionTreeComponent(MergeLeaves="off")

    Example: c.MergeLeaves = "on"

    Data Types: char | string

    Minimum number of leaf node observations, specified as a positive integer scalar. Each leaf has at least MinLeafSize observations. If you supply both MinParentSize and MinLeafSize, then the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

    Example: c = regressionTreeComponent(MinLeafSize=3)

    Example: c.MinLeafSize = 1

    Data Types: single | double

    Minimum number of branch node observations, specified as a positive integer scalar. Each branch node has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, then the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

    Example: c = regressionTreeComponent(MinParentSize=8)

    Example: c.MinParentSize = 12

    Data Types: single | double
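
    The interplay between the two settings reduces to the formula above. This sketch only evaluates that arithmetic; it does not train anything:

    ```matlab
    % If you supply both settings, the component grows the tree using
    % whichever one yields larger leaves.
    c = regressionTreeComponent(MinLeafSize=10,MinParentSize=12);
    effectiveMinParent = max(c.MinParentSize,2*c.MinLeafSize)
    % effectiveMinParent = 20, so MinLeafSize dominates here
    ```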

    Number of bins for numeric predictors, specified as a positive integer scalar or [] (empty).

    • If NumBins is empty ([]), then the component does not bin any predictors.

    • If you specify NumBins as a positive integer scalar, then the component bins every numeric predictor into at most NumBins equiprobable bins, and then grows trees on the bin indices instead of the original data.

    Example: c = regressionTreeComponent(NumBins=50)

    Example: c.NumBins = []

    Data Types: single | double
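
    As an illustration of what equiprobable binning does (this is not the component's internal code), you can bin a numeric predictor at its quantiles:

    ```matlab
    % Bin a numeric predictor into numBins roughly equal-count bins.
    x = randn(1000,1);
    numBins = 10;
    edges = [-Inf; quantile(x,(1:numBins-1)'/numBins); Inf];
    binIdx = discretize(x,edges);   % bin indices 1..numBins replace x
    histcounts(binIdx,1:numBins+1)  % roughly 100 observations per bin
    ```

    Trees are then grown on binIdx rather than on the original values, which can speed up training on large data sets at some cost in split resolution.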

    Number of predictors to select at random for each split, specified as "all" or a positive integer scalar.

    Example: c = regressionTreeComponent(NumVariablesToSample=3)

    Example: c.NumVariablesToSample = "all"

    Data Types: single | double | char | string

    Algorithm used to select the best split predictor at each node, specified as one of these values.

    • "allsplits" — Standard CART. Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].

    • "curvature" — Curvature test. Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [2]. Training speed is similar to standard CART.

    • "interaction-curvature" — Interaction test. Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and the response [2]. Training speed can be slower than standard CART.

    For "curvature" and "interaction-curvature", if all tests yield p-values greater than 0.05, then the component stops splitting nodes.

    Example: c = regressionTreeComponent(PredictorSelection="curvature")

    Example: c.PredictorSelection = "interaction-curvature"

    Data Types: char | string

    Flag to estimate the optimal sequence of pruned subtrees, specified as "on" or "off". If Prune is "on", then the component grows the regression tree without pruning it, but estimates the optimal sequence of pruned subtrees. If Prune is "off" and MergeLeaves is also "off", then the component grows the regression tree without estimating the optimal sequence of pruned subtrees.

    Example: c = regressionTreeComponent(Prune="off")

    Example: c.Prune = "on"

    Data Types: char | string

    Pruning criterion, specified as "mse".

    Data Types: char | string

    Quadratic error tolerance per node, specified as a positive scalar. The component stops splitting nodes when the weighted mean squared error per node drops below QuadraticErrorTolerance*ε, where ε is the weighted mean squared error of all n responses computed before growing the decision tree.

    ε = Σ_{i=1}^{n} w_i (y_i − ȳ)²

    Here, w_i is the weight of observation i, given that the weights of all the observations sum to one (Σ_{i=1}^{n} w_i = 1), and

    ȳ = Σ_{i=1}^{n} w_i y_i

    is the weighted average of all the responses.

    Example: c = regressionTreeComponent(QuadraticErrorTolerance=1e-4)

    Example: c.QuadraticErrorTolerance = 1e-5

    Data Types: single | double
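
    For concreteness, ε can be computed directly from the responses and normalized weights:

    ```matlab
    % Baseline weighted MSE (epsilon) for the stopping rule, computed
    % before the tree is grown. Splitting stops when a node's weighted
    % MSE drops below QuadraticErrorTolerance*epsilon.
    y = [3; 5; 7; 9];
    w = [0.25; 0.25; 0.25; 0.25];    % weights sum to one
    ybar = sum(w.*y);                % weighted mean response (6)
    epsilon = sum(w.*(y - ybar).^2)  % epsilon = 5
    ```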

    Flag to enforce reproducibility over repeated runs of training a model, specified as 0 (false) or 1 (true).

    If NumVariablesToSample is not "all", then the component selects predictors at random for each split. To reproduce the random selections, you must specify Reproducible as true and set the seed of the random number generator using rng.

    Example: c = regressionTreeComponent(Reproducible=true)

    Example: c.Reproducible = 0

    Data Types: logical
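
    For example, seeding the generator before training makes repeated runs with random predictor selection repeatable:

    ```matlab
    % Fix the random seed and request reproducible predictor sampling.
    rng(1)
    c = regressionTreeComponent(NumVariablesToSample=3,Reproducible=true);
    % Calling learn on c now selects the same random predictors at each
    % split on every run, provided you reset the seed with rng(1) first.
    ```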

    Split criterion, specified as "MSE".

    Data Types: char | string

    Surrogate decision splits flag, specified as "off", "on", "all", or a positive integer scalar.

    • If Surrogate is "off", the component does not find surrogate splits.

    • If Surrogate is "on", the component finds at most 10 surrogate splits at each branch node.

    • If Surrogate is "all", the component finds all surrogate splits at each branch node, which can use considerable time and memory.

    • If Surrogate is a positive integer scalar, the component finds at most the specified number of surrogate splits at each branch node.

    Example: c = regressionTreeComponent(Surrogate="on")

    Example: c.Surrogate = "all"

    Data Types: single | double | char | string

    Run Parameters

    The software sets run parameters when you create the component. You can modify the run parameters using dot notation at any time. Any unset run parameters use the corresponding default values.

    Loss function, specified as "mse" (mean squared error) or a function handle.

    To specify a custom loss function, use function handle notation. For more information on custom loss functions, see LossFun.

    Example: c = regressionTreeComponent(LossFun=@lossfun)

    Example: c.LossFun = "mse"

    Data Types: char | string | function_handle
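
    As a sketch, a custom regression loss conventionally takes the observed responses, the predicted responses, and the observation weights. The signature lossval = lossfun(Y,Yfit,W) is an assumption based on the underlying loss function; check the LossFun documentation for the exact form:

    ```matlab
    % Weighted mean absolute error as a custom loss
    % (assumed signature: lossval = lossfun(Y,Yfit,W)).
    myLoss = @(Y,Yfit,W) sum(W.*abs(Y - Yfit))/sum(W);
    c = regressionTreeComponent(LossFun=myLoss);
    ```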

    Function for transforming raw response values, specified as a function handle or function name. The default is "none", which means @(y)y, or no transformation. The function must accept a vector (the original response values) and return a vector of the same size (the transformed response values).

    Example: c = regressionTreeComponent(ResponseTransform=@(y)exp(y))

    Example: c.ResponseTransform = "exp"

    Data Types: char | string | function_handle

    Tree size, specified as one of the following values.

    • "se" — The component returns the best pruning level, which corresponds to the smallest tree whose mean squared error (MSE) is within one standard error of the minimum MSE.

    • "min" — The component returns the best pruning level, which corresponds to the minimal MSE tree.

    Example: c = regressionTreeComponent(TreeSize="min")

    Example: c.TreeSize = "se"

    Data Types: char | string

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) using dot notation at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Component identifier, specified as a character vector or string scalar.

    Example: c = regressionTreeComponent(Name="Tree")

    Example: c.Name = "TreeRegression"

    Data Types: char | string

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors. If UseWeights is true, the software adds the input port "Weights" to Inputs.

    Example: c = regressionTreeComponent(Inputs=["X","Y"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = regressionTreeComponent(Outputs=["Responses","LossVal"])

    Example: c.Outputs = ["X","Y"]

    Data Types: char | string | cell

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, then the number of tags must match the number of inputs in Inputs. If UseWeights is true, the software adds a third input tag to InputTags.

    Example: c = regressionTreeComponent(InputTags=[0 1])

    Example: c.InputTags = [1 0]

    Data Types: single | double

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, then the number of tags must match the number of outputs in Outputs.

    Example: c = regressionTreeComponent(OutputTags=[0 1])

    Example: c.OutputTags=[1 2]

    Data Types: single | double

    This property is read-only.

    Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    This property is read-only.

    Trained model, returned as a CompactRegressionTree model object.

    Object Functions

    learn — Initialize and evaluate pipeline or component
    run — Execute pipeline or component for inference after learning
    reset — Reset pipeline or component
    series — Connect components in series to create pipeline
    parallel — Connect components or pipelines in parallel to create pipeline
    view — View diagram of pipeline inputs, outputs, components, and connections

    Examples


    Create a regressionTreeComponent component.

    component = regressionTreeComponent
    component = 
    
      regressionTreeComponent with properties:
    
                Name: "RegressionTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Loss"]
          OutputTags: [1 0]
    
       
    Learnables (HasLearned = false)
        TrainedModel: []
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
    

    component is a regressionTreeComponent object that contains one learnable, TrainedModel. This property remains empty until you pass data to the component during the learn phase.

    To limit the number of splits in the tree model, set the MaxNumSplits property of the component to 7.

    component.MaxNumSplits = 7;

    Load the carsmall data set and remove missing entries from the data. Separate the predictor and response variables into two tables.

    load carsmall
    carData = table(Cylinders,Displacement,Horsepower,Weight,MPG);
    R = rmmissing(carData);
    X = R(:,["Cylinders","Displacement","Horsepower","Weight"]);
    Y = R(:,"MPG");

    Train the regressionTreeComponent.

    component = learn(component,X,Y)
    component = 
      regressionTreeComponent with properties:
    
                Name: "RegressionTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Loss"]
          OutputTags: [1 0]
    
       
    Learnables (HasLearned = true)
        TrainedModel: [1×1 classreg.learning.regr.CompactRegressionTree]
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
       
    Learn Parameters (locked)
        MaxNumSplits: 7
    
    
    

    Note that the HasLearned property is set to true, which indicates that the software trained the tree model TrainedModel. You can use component to predict response values for new data using the run function.
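
    Because TrainedModel is a CompactRegressionTree object, you can also work with the trained tree directly, for example to generate predictions with its standard object functions:

    ```matlab
    % Inspect the trained tree and predict on the training predictors.
    mdl = component.TrainedModel;   % CompactRegressionTree object
    predictions = predict(mdl,X);   % predicted MPG values for the table X
    ```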

    References

    [1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

    [2] Loh, W.Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.

    [3] Loh, W.Y. and Y.S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.

    Version History

    Introduced in R2026a
