kfoldEdge
Classification edge for cross-validated classification model
Description
E = kfoldEdge(CVMdl) returns the classification edge obtained by the cross-validated classification model CVMdl. For every fold, kfoldEdge computes the classification edge for validation-fold observations using a classifier trained on training-fold observations. CVMdl.X and CVMdl.Y contain both sets of observations.
E = kfoldEdge(CVMdl,Name,Value) returns the classification edge with additional options specified by one or more name-value arguments. For example, specify the folds to use or specify to compute the classification edge for each individual fold.
Examples
Estimate k-fold Edge of Classifier
Compute the k-fold edge for a model trained on Fisher's iris data.
Load Fisher's iris data set.
load fisheriris
Train a classification tree classifier.
tree = fitctree(meas,species);
Cross-validate the classifier using 10-fold cross-validation.
cvtree = crossval(tree);
Compute the k-fold edge.
edge = kfoldEdge(cvtree)
edge = 0.8578
Compute K-Fold Edge of Held-Out Observations
Compute the k-fold edge for an ensemble trained on the Fisher iris data.
Load the sample data set.
load fisheriris
Train an ensemble of 100 boosted classification trees.
t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Learners',t);
Create a cross-validated ensemble from ens and find the classification edge.
rng(10,'twister') % For reproducibility
cvens = crossval(ens);
E = kfoldEdge(cvens)
E = 3.2033
Input Arguments
CVMdl — Cross-validated partitioned classifier
ClassificationPartitionedModel object | ClassificationPartitionedEnsemble object | ClassificationPartitionedGAM object
Cross-validated partitioned classifier, specified as a ClassificationPartitionedModel, ClassificationPartitionedEnsemble, or ClassificationPartitionedGAM object. You can create the object in two ways:
- Pass a trained classification model listed in the following table to its crossval object function.
- Train a classification model using a function listed in the following table and specify one of the cross-validation name-value arguments for the function.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: kfoldEdge(CVMdl,'Folds',[1 2 3 5]) specifies to use the first, second, third, and fifth folds to compute the classification edge, but to exclude the fourth fold.
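The two name-value syntaxes described above can be compared directly. The following is a minimal sketch, assuming the Fisher iris data and a cross-validated classification tree like the one in the Examples section; the variable names E1 and E2 are illustrative only.

```matlab
load fisheriris
rng(1)                                       % fix the random fold assignment
cvtree = crossval(fitctree(meas,species));   % 10-fold cross-validation by default

% Name=Value syntax (R2021a or later)
E1 = kfoldEdge(cvtree,Folds=[1 2 3 5]);

% Comma-separated syntax with the name in quotes (also valid before R2021a)
E2 = kfoldEdge(cvtree,'Folds',[1 2 3 5]);

% Both calls use the same folds of the same model
sameResult = isequal(E1,E2)
```

Both calls pass the same option, so the choice between them depends only on the release you need to support.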
Folds — Fold indices to use
1:CVMdl.KFold (default) | positive integer vector
Fold indices to use, specified as a positive integer vector. The elements of Folds must be within the range from 1 to CVMdl.KFold.
The software uses only the folds specified in Folds.
Example: 'Folds',[1 4 10]
Data Types: single | double
IncludeInteractions — Flag to include interaction terms
true | false
Flag to include interaction terms of the model, specified as true or false. This argument is valid only for a generalized additive model (GAM). That is, you can specify this argument only when CVMdl is ClassificationPartitionedGAM.
The default value is true if the models in CVMdl (CVMdl.Trained) contain interaction terms. The value must be false if the models do not contain interaction terms.
Example: 'IncludeInteractions',false
Data Types: logical
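As an illustration of this flag, the following sketch cross-validates a GAM and computes the edge with and without interaction terms. It assumes the ionosphere sample data set and that fitcgam is called with its Interactions and CrossVal name-value arguments; the choice of 5 interaction terms is arbitrary, and the variable names are illustrative.

```matlab
load ionosphere                  % X: numeric predictors, Y: binary labels 'b'/'g'
rng(1)                           % fix the random fold assignment

% Cross-validated GAM that may include up to 5 interaction terms per fold
% (assumption: 5 is an arbitrary illustrative value)
CVMdl = fitcgam(X,Y,'Interactions',5,'CrossVal','on');

% Edge using linear and interaction terms (the default when the trained
% models contain interaction terms)
Efull = kfoldEdge(CVMdl);

% Edge using only the intercept and linear terms
Elinear = kfoldEdge(CVMdl,'IncludeInteractions',false);
```

Comparing Efull with Elinear gives a rough indication of how much the interaction terms contribute to the validation-fold margins.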
Mode — Aggregation level for output
'average' (default) | 'individual' | 'cumulative'
Aggregation level for the output, specified as 'average', 'individual', or 'cumulative'.
Value | Description |
---|---|
'average' | The output is a scalar average over all folds. |
'individual' | The output is a vector of length k containing one value per fold, where k is the number of folds. |
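'cumulative' | The output is a numeric column vector of average edges over all folds, computed from cumulatively larger subsets of the trained learners (see E). Note: If you want to specify this value, CVMdl must be a ClassificationPartitionedEnsemble or ClassificationPartitionedGAM object. |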
Example: 'Mode','individual'
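To see the difference between the 'average' and 'individual' aggregation levels, consider the following sketch. It assumes the Fisher iris data and a cross-validated classification tree; Efold, Eavg, and spread are illustrative names.

```matlab
load fisheriris
rng(1)                                       % fix the random fold assignment
cvtree = crossval(fitctree(meas,species));   % 10-fold cross-validation by default

% One edge value per fold (k-by-1 column vector, k = cvtree.KFold)
Efold = kfoldEdge(cvtree,'Mode','individual');

% Single scalar averaged over all folds (default mode)
Eavg = kfoldEdge(cvtree);

% Range of the per-fold edges, a rough indicator of fold-to-fold variability
spread = max(Efold) - min(Efold)
```

A large spread relative to Eavg suggests that the edge estimate depends strongly on how the data was partitioned.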
Output Arguments
E — Classification edge
numeric scalar | numeric column vector
Classification edge, returned as a numeric scalar or numeric column vector.
- If Mode is 'average', then E is the average classification edge over all folds.
- If Mode is 'individual', then E is a k-by-1 numeric column vector containing the classification edge for each fold, where k is the number of folds.
- If Mode is 'cumulative' and CVMdl is ClassificationPartitionedEnsemble, then E is a min(CVMdl.NumTrainedPerFold)-by-1 numeric column vector. Each element j is the average classification edge over all folds that the function obtains by using ensembles trained with weak learners 1:j.
- If Mode is 'cumulative' and CVMdl is ClassificationPartitionedGAM, then the output value depends on the IncludeInteractions value.
  - If IncludeInteractions is false, then E is a (1 + min(NumTrainedPerFold.PredictorTrees))-by-1 numeric column vector. The first element of E is the average classification edge over all folds that is obtained using only the intercept (constant) term. The (j + 1)th element of E is the average edge obtained using the intercept term and the first j predictor trees per linear term.
  - If IncludeInteractions is true, then E is a (1 + min(NumTrainedPerFold.InteractionTrees))-by-1 numeric column vector. The first element of E is the average classification edge over all folds that is obtained using the intercept (constant) term and all predictor trees per linear term. The (j + 1)th element of E is the average edge obtained using the intercept term, all predictor trees per linear term, and the first j interaction trees per interaction term.
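The 'cumulative' case for an ensemble can be sketched as follows. This example reuses the boosted ensemble from the Examples section and checks the output length against the size stated above; Ecum and expectedLength are illustrative names.

```matlab
load fisheriris
t = templateTree('MaxNumSplits',1);          % decision stump as weak learner
ens = fitcensemble(meas,species,'Learners',t);
rng(10,'twister')                            % fix the random fold assignment
cvens = crossval(ens);

% Element j is the average edge over all folds using weak learners 1:j
Ecum = kfoldEdge(cvens,'Mode','cumulative');

% Length stated in the description of E
expectedLength = min(cvens.NumTrainedPerFold);
lengthMatches = (numel(Ecum) == expectedLength)
```

Inspecting Ecum as a function of j is one way to judge whether the ensemble would reach a similar edge with fewer weak learners.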
More About
Classification Edge
The classification edge is the weighted mean of the classification margins.
One way to choose among multiple classifiers, for example to perform feature selection, is to choose the classifier that yields the greatest edge.
Classification Margin
The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class. The classification margin for multiclass classification is the difference between the classification score for the true class and the maximal score for the false classes.
If the margins are on the same scale (that is, the score values are based on the same score transformation), then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
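The two definitions above can be reproduced by hand from cross-validated scores. The following sketch assumes the Fisher iris data and a cross-validated classification tree; it takes the scores from kfoldPredict and the observation weights from the W property, and all variable names are illustrative. It is not necessarily how kfoldEdge is implemented internally.

```matlab
load fisheriris
rng(1)                                       % fix the random fold assignment
cvtree = crossval(fitctree(meas,species));

% Validation-fold scores: one row per observation, one column per class
[~,score] = kfoldPredict(cvtree);

% Column index of the true class for each observation
[~,trueIdx] = ismember(species,cvtree.ClassNames);
n = numel(trueIdx);
linIdx = sub2ind(size(score),(1:n)',trueIdx);

% Margin = score of the true class minus the maximal score of the false classes
trueScore = score(linIdx);
falseScore = score;
falseScore(linIdx) = -Inf;                   % exclude the true class from the max
manualMargin = trueScore - max(falseScore,[],2);

% Edge = weighted mean of the margins, using the stored observation weights
w = cvtree.W;
manualEdge = sum(w.*manualMargin)/sum(w)

% Built-in values for comparison
builtinMargin = kfoldMargin(cvtree);
builtinEdge = kfoldEdge(cvtree)
```

Comparing manualMargin with builtinMargin and manualEdge with builtinEdge is a quick sanity check on the definitions for a given model type.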
Algorithms
kfoldEdge computes the classification edge as described in the corresponding edge object function. For a model-specific description, see the appropriate edge function reference page in the following table.
Model Type | edge Function |
---|---|
Discriminant analysis classifier | edge |
Ensemble classifier | edge |
Generalized additive model classifier | edge |
k-nearest neighbor classifier | edge |
Naive Bayes classifier | edge |
Neural network classifier | edge |
Support vector machine classifier | edge |
Binary decision tree for multiclass classification | edge |
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Usage notes and limitations:
This function fully supports GPU arrays for the following cross-validated model objects:
- Ensemble classifier trained with fitcensemble
- k-nearest neighbor classifier trained with fitcknn
- Support vector machine classifier trained with fitcsvm
- Binary decision tree for multiclass classification trained with fitctree
- Neural network for classification trained with fitcnet
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2011a
R2024b: Specify GPU arrays for neural network models (requires Parallel Computing Toolbox)
kfoldEdge fully supports GPU arrays for ClassificationPartitionedModel models trained using fitcnet.
R2023b: Observations with missing predictor values are used in resubstitution and cross-validation computations
Starting in R2023b, the following classification model object functions use observations with missing predictor values as part of resubstitution ("resub") and cross-validation ("kfold") computations for classification edges, losses, margins, and predictions.
In previous releases, the software omitted observations with missing predictor values from the resubstitution and cross-validation computations.
R2022a: kfoldEdge returns a different value for cross-validated SVM and ensemble classifiers with a nondefault cost matrix
If you specify a nondefault cost matrix when you cross-validate the input model object for an SVM or ensemble classification model, the kfoldEdge function returns a different value compared to previous releases.
The kfoldEdge function uses the observation weights stored in the W property. The way the function uses the W property value has not changed. However, the property value stored in the input model object has changed for cross-validated SVM and ensemble model objects with a nondefault cost matrix, so the function can return a different value.
For details about the property value change, see Cost property stores the user-specified cost matrix (cross-validated SVM classifier) or Cost property stores the user-specified cost matrix (cross-validated ensemble classifier).
If you want the software to handle the cost matrix, prior probabilities, and observation weights in the same way as in previous releases, adjust the prior probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a classification model, specify the adjusted prior probabilities and observation weights by using the Prior and Weights name-value arguments, respectively, and use the default cost matrix.
See Also
kfoldPredict | kfoldMargin | kfoldLoss | kfoldfun | ClassificationPartitionedModel