classificationTreeComponent
Pipeline component for multiclass classification using binary decision trees
Since R2026a
Description
classificationTreeComponent is a pipeline component that performs
multiclass classification using a binary decision tree. The pipeline component uses the
functionality of the fitctree function during the learn phase to train
the tree classification model. The component uses the functionality of the loss and predict functions during the run phase to perform
classification.
Creation
Description
component = classificationTreeComponent creates a pipeline
component for multiclass classification using a binary decision
tree.
component = classificationTreeComponent(Name=Value) sets writable
Properties using one or more
name-value arguments. For example, you can specify the maximum number of decision splits,
pruning criterion, and misclassification cost.
Properties
Structural Parameters
The software sets structural parameters when you create the component. You cannot modify structural parameters after creating the component.
This property is read-only after the component is created.
Observation weights flag, specified as 0 (false)
or 1 (true). If UseWeights is
true, the component adds a third input "Weights" to the
Inputs component property, and a third input tag
3 to the InputTags component
property.
Example: c = classificationTreeComponent(UseWeights=1)
Data Types: logical
Learn Parameters
The software sets learn parameters when you create the component. You can modify learn
parameters using dot notation any time before you use the learn object
function. Any unset learn parameters use the corresponding default values.
Algorithm for the best split on a categorical predictor with C categories for data and K ≥ 3 classes, specified as one of the following values.
| Value | Description |
|---|---|
"Exact" | Consider all 2C–1 – 1 combinations and choose the split that has the lowest impurity. |
"PullLeft" | Start with all C categories on the right branch. Consider moving each category to the left branch to achieve the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity. |
"PCA" | Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits. Choose the split that has the lowest impurity. |
"OVAbyClass" | Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the lowest impurity. |
By default, the component chooses the optimal subset of algorithms for each split
using the known number of classes and levels of a categorical predictor. For binary
classification, the component uses "Exact".
For more information, see Splitting Categorical Predictors in Classification Trees.
Example: c =
classificationTreeComponent(AlgorithmForCategorical="PCA")
Example: c.AlgorithmForCategorical = "Exact"
Data Types: char | string
Misclassification cost, specified as a square matrix or a structure.
If Cost is a square matrix, Cost(i,j) is the cost of classifying a point into class j if its true class is i.
If Cost is a structure S, it has two fields: S.ClassificationCosts, which contains the cost matrix, and S.ClassNames, which contains the group names and defines the class order of the rows and columns of the cost matrix.
The default is Cost(i,j)=1 if i~=j, and
Cost(i,j)=0 if i=j.
Example: c = classificationTreeComponent(Cost=[0 1; 2
0])
Example: c.Cost = [0 1; 1 0]
Data Types: single | double | struct
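As an illustration of the structure form described above (a sketch only; the class names here are taken from the ionosphere example later on this page):

```matlab
% Misclassification cost specified as a structure with the two
% fields documented above
S.ClassNames = ["b" "g"];            % class order for the cost matrix rows/columns
S.ClassificationCosts = [0 1; 2 0];  % cost of predicting class j when the truth is class i
c = classificationTreeComponent(Cost=S);
```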
Maximum number of category levels, specified as a nonnegative scalar. The
component splits a categorical predictor using the exact search algorithm if the
predictor has at most MaxNumCategories levels in the split node.
Otherwise, the component finds the best categorical split using one of the inexact
algorithms.
Example: c =
classificationTreeComponent(MaxNumCategories=8)
Example: c.MaxNumCategories = 15
Data Types: single | double
Maximum number of decision splits (or branch nodes), specified as a nonnegative
scalar. The component splits MaxNumSplits or fewer branch
nodes.
The default value is size(X,1) – 1, where
X is the first data argument of
learn (that is, one less than the number of observations).
Example: c =
classificationTreeComponent(MaxNumSplits=5)
Example: c.MaxNumSplits = 10
Data Types: single | double
Flag to merge leaves, specified as "on" or
"off".
When MergeLeaves is "on", the component:
Merges leaves originating from the same parent node if doing so yields a sum of risk values greater than or equal to the risk associated with the parent node
Estimates the optimal sequence of pruned subtrees, but does not prune the classification tree
Example: c =
classificationTreeComponent(MergeLeaves="off")
Example: c.MergeLeaves = "on"
Data Types: char | string
Minimum number of leaf node observations, specified as a positive integer scalar.
Each tree leaf has at least MinLeafSize observations. If
you specify both MinParentSize and MinLeafSize, the component uses
the setting that gives larger leaves: MinParentSize =
max(MinParentSize,2*MinLeafSize).
Example: c =
classificationTreeComponent(MinLeafSize=3)
Example: c.MinLeafSize = 1
Data Types: single | double
Minimum number of branch node observations, specified as a positive integer
scalar. Each branch node has at least MinParentSize observations.
If you supply both MinParentSize and MinLeafSize, the component uses the setting that gives larger leaves:
MinParentSize =
max(MinParentSize,2*MinLeafSize).
Example: c =
classificationTreeComponent(MinParentSize=8)
Example: c.MinParentSize = 12
Data Types: single | double
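A quick illustration of the rule above (a sketch, not output from the software):

```matlab
% With both parameters set, the effective minimum parent size is
% max(MinParentSize, 2*MinLeafSize) = max(8, 2*5) = 10
c = classificationTreeComponent(MinParentSize=8, MinLeafSize=5);
```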
Number of bins for numeric predictors, specified as [] (empty)
or a positive integer scalar.
If NumBins is empty ([]), the component does not bin any predictors.
If NumBins is a positive integer scalar, the component bins every numeric predictor into at most NumBins equiprobable bins, and then grows trees on the bin indices instead of the original data.
Example: c =
classificationTreeComponent(NumBins=50)
Example: c.NumBins = []
Data Types: single | double
Number of predictors to select at random for each split, specified as
"all" or a positive integer scalar.
Example: c =
classificationTreeComponent(NumVariablesToSample=3)
Example: c.NumVariablesToSample = "all"
Data Types: single | double | char | string
Algorithm used to select the best split predictor at each node, specified as one of the following values.
| Value | Description |
|---|---|
"allsplits" | Standard CART (Classification and Regression Tree) algorithm — Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1] |
"curvature" | Curvature test — Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [3]. The training speed is similar to the speed of standard CART. |
"interaction-curvature" | Interaction test — Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and minimizes the p-value of a chi-square test of independence between each pair of predictors and the response [2]. The training speed can be slower than the speed of standard CART. |
For "curvature" and
"interaction-curvature", if all tests yield
p-values greater than 0.05, the component stops splitting
nodes.
Example: c =
classificationTreeComponent(PredictorSelection="curvature")
Example: c.PredictorSelection =
"interaction-curvature"
Data Types: char | string
Prior probabilities for each class, specified as a value in this table.
| Value | Description |
|---|---|
"empirical" | The class prior probabilities are the class relative frequencies. The class relative
frequencies are determined by the second data argument of
learn. |
"uniform" | All class prior probabilities are equal to 1/K, where K is the number of classes. |
| numeric vector | A numeric vector with one value for each class. Each element is a class prior probability.
The component normalizes the elements such that they sum to
1. |
| structure | A structure
S with two fields: S.ClassNames, which contains the class names, and S.ClassProbs, which contains a vector of corresponding prior probabilities.
|
If you set UseWeights to true, the component
renormalizes the weights to add up to the value of the prior probability in
the respective class.
Example: c = classificationTreeComponent(Prior="uniform")
Example: c.Prior = "empirical"
Data Types: single | double | char | string | struct
Flag to estimate the optimal sequence of pruned subtrees, specified as
"on" or "off". If Prune
is "on", the component estimates the optimal sequence of pruned
subtrees, but grows the classification tree without pruning it. If
Prune is "off" and MergeLeaves
is also "off", the component grows the classification tree without
estimating the optimal sequence of pruned subtrees.
Example: c =
classificationTreeComponent(Prune="off")
Example: c.Prune = "on"
Data Types: char | string
Pruning criterion, specified as "error" or
"impurity".
If PruneCriterion is "error", the component
splits nodes of the decision tree based on node error, or the fraction of
misclassified classes at a node. If PruneCriterion is
"impurity", the component splits nodes of the decision tree based
on the impurity measure specified by the SplitCriterion value.
Example: c =
classificationTreeComponent(PruneCriterion="impurity")
Example: c.PruneCriterion = "error"
Data Types: char | string
Split criterion, specified as "gdi" for Gini's diversity index,
"twoing" for the twoing rule, or "deviance"
for maximum deviance reduction (also known as cross-entropy).
Gini's diversity index and maximum deviance reduction are both measures of
impurity. A value of 0 represents a pure node with just one class.
Otherwise, these values are positive.
The twoing rule is not a purity measure of a node, but is a measure for deciding how to split a node into two. If the expression is large, the split makes each child node purer. If the expression is small, the split does not increase node purity.
For more information, see Impurity and Node Error.
Example: c =
classificationTreeComponent(SplitCriterion="deviance")
Example: c.SplitCriterion = "twoing"
Data Types: char | string
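To build intuition for the impurity measures described above, this standalone sketch (not part of the component API) evaluates Gini's diversity index and the deviance for example class fractions at a node:

```matlab
% Class fractions p(i) at a node (must sum to 1)
p = [0.7 0.2 0.1];

gini = 1 - sum(p.^2);          % Gini's diversity index: 1 - sum(p.^2)
deviance = -sum(p .* log(p));  % deviance (cross-entropy): -sum(p.*log(p))
```

A pure node (p = [1 0 0]) gives 0 for both measures; mixed nodes give positive values.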
Surrogate decision splits, specified as "off",
"on", "all", or a positive integer scalar.
If Surrogate is "off", the component does not use surrogate splits.
If Surrogate is "on", the component finds at most 10 surrogate splits at each branch node.
If Surrogate is "all", the component finds all surrogate splits at each branch node, a process that can use considerable time and memory.
If Surrogate is a positive integer scalar, the component finds at most the specified number of surrogate splits at each branch node.
Example: c =
classificationTreeComponent(Surrogate="on")
Example: c.Surrogate = "all"
Data Types: single | double | char | string
Run Parameters
The software sets run parameters when you create the component. You can modify the run parameters at any time. Any unset run parameters use the corresponding default values.
Loss function, specified as a built-in loss function name or a function handle.
| Value | Description |
|---|---|
"binodeviance" | Binomial deviance |
"classifcost" | Observed misclassification cost |
"classiferror" | Misclassified rate in decimal |
"exponential" | Exponential loss |
"hinge" | Hinge loss |
"logit" | Logistic loss |
"mincost" | Minimal expected misclassification cost (for classification scores that are posterior probabilities) |
"quadratic" | Quadratic loss |
To specify a custom loss function, use function handle notation. For more
information on custom loss functions, see LossFun.
Example: c =
classificationTreeComponent(LossFun="classiferror")
Example: c.LossFun = "binodeviance"
Data Types: char | string | function_handle
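As a hedged sketch of function handle notation, the following assumes the (C,S,W,Cost) custom loss signature used elsewhere in Statistics and Machine Learning Toolbox, where C is an n-by-K logical matrix of true classes, S is an n-by-K score matrix, W is a weight vector, and Cost is a K-by-K cost matrix:

```matlab
% Hypothetical custom loss: weighted misclassification rate.
% An observation counts as correct when its true class has the maximum score.
myLoss = @(C,S,W,Cost) sum(W .* ~any(C & (S == max(S,[],2)), 2)) / sum(W);
c = classificationTreeComponent(LossFun=myLoss);
```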
Tree size, specified as "se" or "min".
If TreeSize is "se", the component returns the best pruning level, which corresponds to the highest pruning level with the loss within one standard deviation of the minimum.
If TreeSize is "min", the component returns the best pruning level, which corresponds to the pruning level with the smallest loss.
Example: c =
classificationTreeComponent(TreeSize="min")
Example: c.TreeSize = "se"
Data Types: char | string
Score transformation, specified as a built-in function name or a function handle.
This table summarizes the available built-in score transform functions.
| Value | Description |
|---|---|
"doublelogit" | 1/(1 + e–2x) |
"invlogit" | log(x / (1 – x)) |
"ismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
"logit" | 1/(1 + e–x) |
"none" or "identity" | x (no transformation) |
"sign" | –1 for x < 0 0 for x = 0 1 for x > 0 |
"symmetric" | 2x – 1 |
"symmetricismax" | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1 |
"symmetriclogit" | 2/(1 + e–x) – 1 |
To specify a custom score transform function, use function handle notation. The function must accept a matrix containing the original scores and return a matrix of the same size containing the transformed scores.
Example: c = classificationTreeComponent(ScoreTransform="logit")
Example: c.ScoreTransform = "symmetric"
Data Types: char | string | function_handle
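For example, a custom transform that follows the contract above (accept a score matrix, return a matrix of the same size) might clip scores to the unit interval; this is an illustrative sketch, not a built-in transform:

```matlab
% Hypothetical custom score transform: clip every score to [0,1]
clip01 = @(s) min(max(s,0),1);
c = classificationTreeComponent(ScoreTransform=clip01);
```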
Component Properties
The software sets component properties when you create the component. You can modify the
component properties (excluding HasLearnables and
HasLearned) at any time. You cannot modify the
HasLearnables and HasLearned properties
directly.
Component identifier, specified as a character vector or string scalar.
Example: c =
classificationTreeComponent(Name="Tree")
Example: c.Name="TreeClassifier"
Data Types: char | string
Names of the input ports, specified as a character vector, string array, or cell
array of character vectors. If UseWeights is true, the component adds the input port
"Weights" to Inputs.
Example: c =
classificationTreeComponent(Inputs=["X","Y"])
Example: c.Inputs = ["X1","Y1"]
Data Types: char | string | cell
Names of the output ports, specified as a character vector, string array, or cell array of character vectors.
Example: c =
classificationTreeComponent(Outputs=["Class","ClassScore","LossVal"])
Example: c.Outputs = ["X","Y","Z"]
Data Types: char | string | cell
Tags that enable the automatic connection of the component inputs with other
components or pipelines, specified as a nonnegative integer vector. If you specify
InputTags, the number of tags must match the number of inputs
in Inputs. If
UseWeights is true, the component adds a third input
tag to InputTags.
Example: c = classificationTreeComponent(InputTags=[0
1])
Example: c.InputTags = [1 0]
Data Types: single | double
Tags that enable the automatic connection of the component outputs with other
components or pipelines, specified as a nonnegative integer vector. If you specify
OutputTags, the number of tags must match the number of outputs
in Outputs.
Example: c = classificationTreeComponent(OutputTags=[1 0
4])
Example: c.OutputTags = [1 2 0]
Data Types: single | double
This property is read-only.
Indicator for learnables, returned as 1
(true). A value of 1 indicates that the
component contains Learnables.
Data Types: logical
This property is read-only.
Indicator showing the learning status of the component, returned as
0 (false) or 1
(true). A value of 1 indicates that the
learn object function has been applied to the component and the
Learnables are nonempty.
Data Types: logical
Learnables
The software sets learnables when you use the learn object
function. You cannot modify learnables directly.
This property is read-only.
Trained model, returned as a CompactClassificationTree model object.
Object Functions
learn | Initialize and evaluate pipeline or component |
run | Execute pipeline or component for inference after learning |
reset | Reset pipeline or component |
series | Connect components in series to create pipeline |
parallel | Connect components or pipelines in parallel to create pipeline |
view | View diagram of pipeline inputs, outputs, components, and connections |
Examples
Create a classificationTreeComponent pipeline component.
component = classificationTreeComponent
component =
classificationTreeComponent with properties:
Name: "ClassificationTree"
Inputs: ["Predictors" "Response"]
InputTags: [1 2]
Outputs: ["Predictions" "Scores" "Loss"]
OutputTags: [1 0 0]
Learnables (HasLearned = false)
TrainedModel: []
Structural Parameters (locked)
UseWeights: 0
Show all parameters
component is a classificationTreeComponent
object that contains one learnable, TrainedModel. This property
remains empty until you pass data to the component during the learn phase.
To limit the number of splits in the tree model, set the
MaxNumSplits property of the component to
7.
component.MaxNumSplits = 7;
Load the ionosphere data set and save the data in two
tables.
load ionosphere
X = array2table(X);
Y = array2table(Y);
Use the learn object function to train the
classificationTreeComponent object using the entire data set.
component = learn(component,X,Y)
component =
classificationTreeComponent with properties:
Name: "ClassificationTree"
Inputs: ["Predictors" "Response"]
InputTags: [1 2]
Outputs: ["Predictions" "Scores" "Loss"]
OutputTags: [1 0 0]
Learnables (HasLearned = true)
TrainedModel: [1×1 classreg.learning.classif.CompactClassificationTree]
Structural Parameters (locked)
UseWeights: 0
Learn Parameters (locked)
MaxNumSplits: 7
Show all parameters
Note that the HasLearned property is set to
true, which indicates that the software trained the classification
tree model TrainedModel. You can use component
to classify new data using the run object function.
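As a sketch of that run phase (the call pattern below is assumed from the component's input and output port names, not confirmed by this page), you could pass new data to the trained component:

```matlab
% Hypothetical run-phase call: new predictor data Xnew, with the
% response Ynew supplied so the component can compute the loss output
[labels,scores,lossVal] = run(component,Xnew,Ynew);
```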
References
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[2] Loh, W. Y. “Regression Trees with Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, Vol. 12, 2002, pp. 361–386.
[3] Loh, W. Y., and Y. S. Shih. “Split Selection Methods for Classification Trees.” Statistica Sinica, Vol. 7, 1997, pp. 815–840.
Version History
Introduced in R2026a
See Also