regressionTreeComponent

Pipeline component for regression using binary decision trees

Since R2026a

    Description

    regressionTreeComponent is a pipeline component that creates a regression model using a binary decision tree. The pipeline component uses the functionality of the fitrtree function during the learn phase to train the tree regression model. The component uses the functionality of the predict and loss functions during the run phase to perform regression.

    Creation

    Description

    component = regressionTreeComponent creates a pipeline component for a tree regression model.

    example

    component = regressionTreeComponent(Name=Value) sets writable Properties using one or more name-value arguments. For example, you can specify the maximum number of decision splits, pruning criterion, and minimum leaf size.

    Properties

    Structural Parameters

    The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.

    This property is read-only after the component is created.

    Observation weights flag, specified as 0 (false) or 1 (true). If UseWeights is true, the component adds a third input, "Weights", to the Inputs component property and appends a third input tag, 3, to the InputTags component property.

    Example: c = regressionTreeComponent(UseWeights=1)

    Data Types: logical
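
    For example, a component created with observation weights enabled exposes the additional input port and tag described above. This sketch relies on the default port names and tags shown in the Examples section:

    ```matlab
    % UseWeights is structural, so set it when you create the component.
    c = regressionTreeComponent(UseWeights=true);
    c.Inputs     % ["Predictors" "Response" "Weights"]
    c.InputTags  % [1 2 3]
    ```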

    Learn Parameters

    The software sets learn parameters when you create the component. You can modify learn parameters using dot notation any time before you use the learn object function. Any unset learn parameters use the corresponding default values.

    Maximum number of decision splits (or branch nodes), specified as a nonnegative scalar. The software splits MaxNumSplits or fewer branch nodes.

    The default value is size(X,1) - 1, where size(X,1) is the number of observations in the first data argument of learn.

    Example: c = regressionTreeComponent(MaxNumSplits=5)

    Example: c.MaxNumSplits = 10

    Data Types: single | double

    Leaf merge flag, specified as "on" or "off".

    When MergeLeaves is "on", then the component:

    • Merges leaves originating from the same parent node if that yields a sum of risk values greater than or equal to the risk associated with the parent node.

    • Estimates the optimal sequence of pruned subtrees, but does not prune the regression tree.

    Example: c = regressionTreeComponent(MergeLeaves="off")

    Example: c.MergeLeaves = "on"

    Data Types: char | string

    Minimum number of leaf node observations, specified as a positive integer scalar. Each leaf has at least MinLeafSize observations. If you supply both MinParentSize and MinLeafSize, then the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

    Example: c = regressionTreeComponent(MinLeafSize=3)

    Example: c.MinLeafSize = 1

    Data Types: single | double

    Minimum number of branch node observations, specified as a positive integer scalar. Each branch node has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, then the component uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

    Example: c = regressionTreeComponent(MinParentSize=8)

    Example: c.MinParentSize = 12

    Data Types: single | double
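
    The interplay between the two settings reduces to the formula above. This sketch only evaluates that arithmetic; it does not train anything:

    ```matlab
    % If you supply both settings, the component grows the tree using
    % whichever one yields larger leaves.
    c = regressionTreeComponent(MinLeafSize=10,MinParentSize=12);
    effectiveMinParent = max(c.MinParentSize,2*c.MinLeafSize)
    % effectiveMinParent = 20, so MinLeafSize dominates here
    ```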

    Number of bins for numeric predictors, specified as a positive integer scalar or [] (empty).

    • If NumBins is empty ([]), then the component does not bin any predictors.

    • If you specify NumBins as a positive integer scalar, then the component bins every numeric predictor into at most NumBins equiprobable bins, and then grows trees on the bin indices instead of the original data.

    Example: c = regressionTreeComponent(NumBins=50)

    Example: c.NumBins = []

    Data Types: single | double
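
    As an illustration of what equiprobable binning does (this is not the component's internal code), you can bin a numeric predictor at its quantiles:

    ```matlab
    % Bin a numeric predictor into numBins roughly equal-count bins.
    x = randn(1000,1);
    numBins = 10;
    edges = [-Inf; quantile(x,(1:numBins-1)'/numBins); Inf];
    binIdx = discretize(x,edges);   % bin indices 1..numBins replace x
    histcounts(binIdx,1:numBins+1)  % roughly 100 observations per bin
    ```

    Trees are then grown on binIdx rather than on the original values, which can speed up training on large data sets at some cost in split resolution.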

    Number of predictors to select at random for each split, specified as "all" or a positive integer scalar.

    Example: c = regressionTreeComponent(NumVariablesToSample=3)

    Example: c.NumVariablesToSample = "all"

    Data Types: single | double | char | string

    Algorithm used to select the best split predictor at each node, specified as one of these values.

    • "allsplits" — Standard CART. Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].

    • "curvature" — Curvature test. Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [2]. Training speed is similar to standard CART.

    • "interaction-curvature" — Interaction test. Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and the response [2]. Training speed can be slower than standard CART.

    For "curvature" and "interaction-curvature", if all tests yield p-values greater than 0.05, then the component stops splitting nodes.

    Example: c = regressionTreeComponent(PredictorSelection="curvature")

    Example: c.PredictorSelection = "interaction-curvature"

    Data Types: char | string

    Flag to estimate the optimal sequence of pruned subtrees, specified as "on" or "off". If Prune is "on", then the component grows the regression tree without pruning it, but estimates the optimal sequence of pruned subtrees. If Prune is "off" and MergeLeaves is also "off", then the component grows the regression tree without estimating the optimal sequence of pruned subtrees.

    Example: c = regressionTreeComponent(Prune="off")

    Example: c.Prune = "on"

    Data Types: char | string

    Pruning criterion, specified as "mse".

    Data Types: char | string

    Quadratic error tolerance per node, specified as a positive scalar. The component stops splitting nodes when the weighted mean squared error per node drops below QuadraticErrorTolerance*ε, where ε is the weighted mean squared error of all n responses computed before growing the decision tree.

    ε = Σ_{i=1}^{n} w_i (y_i − ȳ)²

    Here, w_i is the weight of observation i, given that the weights of all the observations sum to one (Σ_{i=1}^{n} w_i = 1), and

    ȳ = Σ_{i=1}^{n} w_i y_i

    is the weighted average of all the responses.

    Example: c = regressionTreeComponent(QuadraticErrorTolerance=1e-4)

    Example: c.QuadraticErrorTolerance = 1e-5

    Data Types: single | double
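
    For concreteness, ε can be computed directly from the responses and normalized weights:

    ```matlab
    % Baseline weighted MSE (epsilon) for the stopping rule, computed
    % before the tree is grown. Splitting stops when a node's weighted
    % MSE drops below QuadraticErrorTolerance*epsilon.
    y = [3; 5; 7; 9];
    w = [0.25; 0.25; 0.25; 0.25];    % weights sum to one
    ybar = sum(w.*y);                % weighted mean response (6)
    epsilon = sum(w.*(y - ybar).^2)  % epsilon = 5
    ```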

    Flag to enforce reproducibility over repeated runs of training a model, specified as 0 (false) or 1 (true).

    If NumVariablesToSample is not "all", then the component selects predictors at random for each split. To reproduce the random selections, you must specify Reproducible as true and set the seed of the random number generator using rng.

    Example: c = regressionTreeComponent(Reproducible=true)

    Example: c.Reproducible = 0

    Data Types: logical
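
    For example, seeding the generator before training makes repeated runs with random predictor selection repeatable:

    ```matlab
    % Fix the random seed and request reproducible predictor sampling.
    rng(1)
    c = regressionTreeComponent(NumVariablesToSample=3,Reproducible=true);
    % Calling learn on c now selects the same random predictors at each
    % split on every run, provided you reset the seed with rng(1) first.
    ```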

    Split criterion, specified as "MSE".

    Data Types: char | string

    Surrogate decision splits flag, specified as "off", "on", "all", or a positive integer scalar.

    • If Surrogate is "off", the component does not find surrogate splits.

    • If Surrogate is "on", the component finds at most 10 surrogate splits at each branch node.

    • If Surrogate is "all", the component finds all surrogate splits at each branch node, which can use considerable time and memory.

    • If Surrogate is a positive integer scalar, the component finds at most the specified number of surrogate splits at each branch node.

    Example: c = regressionTreeComponent(Surrogate="on")

    Example: c.Surrogate = "all"

    Data Types: single | double | char | string

    Run Parameters

    The software sets run parameters when you create the component. You can modify the run parameters using dot notation at any time. Any unset run parameters use the corresponding default values.

    Loss function, specified as "mse" (mean squared error) or a function handle.

    To specify a custom loss function, use function handle notation. For more information on custom loss functions, see LossFun.

    Example: c = regressionTreeComponent(LossFun=@lossfun)

    Example: c.LossFun = "mse"

    Data Types: char | string | function_handle
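
    As a sketch, a custom regression loss conventionally takes the observed responses, the predicted responses, and the observation weights. The signature lossval = lossfun(Y,Yfit,W) is an assumption based on the underlying loss function; check the LossFun documentation for the exact form:

    ```matlab
    % Weighted mean absolute error as a custom loss
    % (assumed signature: lossval = lossfun(Y,Yfit,W)).
    myLoss = @(Y,Yfit,W) sum(W.*abs(Y - Yfit))/sum(W);
    c = regressionTreeComponent(LossFun=myLoss);
    ```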

    Function for transforming raw response values, specified as a function handle or function name. The default is "none", which means @(y)y, or no transformation. The function must accept a vector (the original response values) and return a vector of the same size (the transformed response values).

    Example: c = regressionTreeComponent(ResponseTransform=@(y)exp(y))

    Example: c.ResponseTransform = "exp"

    Data Types: char | string | function_handle

    Tree size, specified as one of the following values.

    • "se" — The component returns the best pruning level, which corresponds to the smallest tree whose mean squared error (MSE) is within one standard error of the minimum MSE.

    • "min" — The component returns the best pruning level, which corresponds to the minimal MSE tree.

    Example: c = regressionTreeComponent(TreeSize="min")

    Example: c.TreeSize = "se"

    Data Types: char | string

    Component Properties

    The software sets component properties when you create the component. You can modify the component properties (excluding HasLearnables and HasLearned) using dot notation at any time. You cannot modify the HasLearnables and HasLearned properties directly.

    Component identifier, specified as a character vector or string scalar.

    Example: c = regressionTreeComponent(Name="Tree")

    Example: c.Name = "TreeRegression"

    Data Types: char | string

    Names of the input ports, specified as a character vector, string array, or cell array of character vectors. If UseWeights is true, the software adds the input port "Weights" to Inputs.

    Example: c = regressionTreeComponent(Inputs=["X","Y"])

    Example: c.Inputs = ["X1","Y1"]

    Data Types: char | string | cell

    Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

    Example: c = regressionTreeComponent(Outputs=["Responses","LossVal"])

    Example: c.Outputs = ["X","Y"]

    Data Types: char | string | cell

    Tags that enable the automatic connection of the component inputs with other components or pipelines, specified as a nonnegative integer vector. If you specify InputTags, then the number of tags must match the number of inputs in Inputs. If UseWeights is true, the software adds a third input tag to InputTags.

    Example: c = regressionTreeComponent(InputTags=[0 1])

    Example: c.InputTags = [1 0]

    Data Types: single | double

    Tags that enable the automatic connection of the component outputs with other components or pipelines, specified as a nonnegative integer vector. If you specify OutputTags, then the number of tags must match the number of outputs in Outputs.

    Example: c = regressionTreeComponent(OutputTags=[0 1])

    Example: c.OutputTags=[1 2]

    Data Types: single | double

    This property is read-only.

    Indicator for the learnables, returned as 1 (true). A value of 1 indicates that the component contains Learnables.

    Data Types: logical

    This property is read-only.

    Indicator showing the learning status of the component, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the component and the Learnables are nonempty.

    Data Types: logical

    Learnables

    The software sets learnables when you use the learn object function. You cannot modify learnables directly.

    This property is read-only.

    Trained model, returned as a CompactRegressionTree model object.

    Object Functions

    learn — Initialize and evaluate pipeline or component
    run — Execute pipeline or component for inference after learning
    reset — Reset pipeline or component
    series — Connect components in series to create pipeline
    parallel — Connect components or pipelines in parallel to create pipeline
    view — View diagram of pipeline inputs, outputs, components, and connections

    Examples


    Create a regressionTreeComponent component.

    component = regressionTreeComponent
    component = 
    
      regressionTreeComponent with properties:
    
                Name: "RegressionTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Loss"]
          OutputTags: [1 0]
    
       
    Learnables (HasLearned = false)
        TrainedModel: []
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
    

    component is a regressionTreeComponent object that contains one learnable, TrainedModel. This property remains empty until you pass data to the component during the learn phase.

    To limit the number of splits in the tree model, set the MaxNumSplits property of the component to 7.

    component.MaxNumSplits = 7;

    Load the carsmall data set and remove missing entries from the data. Separate the predictor and response variables into two tables.

    load carsmall
    carData = table(Cylinders,Displacement,Horsepower,Weight,MPG);
    R = rmmissing(carData);
    X = R(:,["Cylinders","Displacement","Horsepower","Weight"]);
    Y = R(:,"MPG");

    Train the regressionTreeComponent.

    component = learn(component,X,Y)
    component = 
      regressionTreeComponent with properties:
    
                Name: "RegressionTree"
              Inputs: ["Predictors"    "Response"]
           InputTags: [1 2]
             Outputs: ["Predictions"    "Loss"]
          OutputTags: [1 0]
    
       
    Learnables (HasLearned = true)
        TrainedModel: [1×1 classreg.learning.regr.CompactRegressionTree]
    
       
    Structural Parameters (locked)
          UseWeights: 0
    
       
    Learn Parameters (locked)
        MaxNumSplits: 7
    
    
    

    Note that the HasLearned property is set to true, which indicates that the software trained the tree model TrainedModel. You can use component to predict response values for new data using the run function.
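
    Because TrainedModel is a CompactRegressionTree object, you can also work with the trained tree directly, for example to generate predictions with its standard object functions:

    ```matlab
    % Inspect the trained tree and predict on the training predictors.
    mdl = component.TrainedModel;   % CompactRegressionTree object
    predictions = predict(mdl,X);   % predicted MPG values for the table X
    ```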

    References

    [1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

    [2] Loh, W.Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.

    [3] Loh, W.Y. and Y.S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.

    Version History

    Introduced in R2026a
