crossval

Cross-validate multiclass error-correcting output codes (ECOC) model

Description

CVMdl = crossval(Mdl) returns a cross-validated (partitioned) multiclass error-correcting output codes (ECOC) model (CVMdl) from a trained ECOC model (Mdl). By default, crossval uses 10-fold cross-validation on the training data to create CVMdl, a ClassificationPartitionedECOC model.

CVMdl = crossval(Mdl,Name,Value) returns a partitioned ECOC model with additional options specified by one or more name-value pair arguments. For example, you can specify the number of folds or a holdout sample proportion.
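
For instance, assuming Mdl is a trained ClassificationECOC model, these illustrative calls request five folds or a 20% holdout sample (the variable names are placeholders, not part of the syntax):

CVMdl5 = crossval(Mdl,'KFold',5);          % 5-fold cross-validation
CVMdlHoldout = crossval(Mdl,'Holdout',0.2); % hold out 20% of the data for validation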

Examples

Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error.

Load Fisher's iris data set. Specify the predictor data X and the response data Y.

load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility

Create an SVM template, and standardize the predictors.

t = templateSVM('Standardize',true)
t = 
Fit template for SVM.
    Standardize: 1

t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.

Train the ECOC classifier, and specify the class order.

Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.
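
For example, as an optional check (not part of the original example), you can display the class order and the coding design matrix that fitcecoc chose:

Mdl.ClassNames     % class order used by the binary learners
Mdl.CodingMatrix   % coding design matrix, one column per binary learner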

Cross-validate Mdl using 10-fold cross-validation.

CVMdl = crossval(Mdl);

CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier.

Estimate the generalized classification error.

genError = kfoldLoss(CVMdl)
genError = 0.0400

The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.

Consider the arrhythmia data set. This data set contains 16 classes, 13 of which are represented in the data. The first class indicates that the subject does not have arrhythmia, and the last class indicates that the arrhythmia state of the subject is not recorded. The other classes are ordinal levels indicating the severity of arrhythmia.

Train an ECOC classifier with a custom coding design specified by the description of the classes.

Load the arrhythmia data set. Convert Y to a categorical variable, and determine the number of classes.

load arrhythmia
Y = categorical(Y);
K = numel(unique(Y)); % Number of distinct classes

Construct a coding matrix that describes the nature of the classes.

OrdMat = designecoc(11,'ordinal');
nOrdMat = size(OrdMat);
class1VSOrd = [1; -ones(11,1); 0];
class1VSClass16 = [1; zeros(11,1); -1];
OrdVSClass16 = [0; ones(11,1); -1];
Coding = [class1VSOrd class1VSClass16 OrdVSClass16,...
    [zeros(1,nOrdMat(2)); OrdMat; zeros(1,nOrdMat(2))]];

Train an ECOC classifier using the custom coding design (Coding) and parallel computing. Specify an ensemble of 50 classification trees boosted using GentleBoost.

t = templateEnsemble('GentleBoost',50,'Tree');
options = statset('UseParallel',true);
Mdl = fitcecoc(X,Y,'Coding',Coding,'Learners',t,'Options',options);
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

Mdl is a ClassificationECOC model. You can access its properties using dot notation.

Cross-validate Mdl using 8-fold cross-validation and parallel computing.

rng(1); % For reproducibility
CVMdl = crossval(Mdl,'Options',options,'KFold',8);
Warning: One or more folds do not contain points from all the groups.

Because some classes have low relative frequency, some folds do not train using observations from those classes. CVMdl is a ClassificationPartitionedECOC cross-validated ECOC model.
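
As an optional check (not part of the original example), you can tabulate the class frequencies to see which classes are rare:

tabulate(Y)   % relative frequency of each arrhythmia class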

Estimate the generalization error using parallel computing.

error = kfoldLoss(CVMdl,'Options',options)
error = 0.3208

The cross-validated classification error is 32%, which indicates that this model does not generalize well. To improve the model, try training using a different boosting method, such as RobustBoost, or a different algorithm, such as SVM.
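
As a sketch of the suggested alternative, you could swap the boosted-tree template for an SVM template and repeat the cross-validation. This is illustrative only; the template settings may need tuning for this data, and results vary.

tSVM = templateSVM('Standardize',true);
MdlSVM = fitcecoc(X,Y,'Coding',Coding,'Learners',tSVM,'Options',options);
CVMdlSVM = crossval(MdlSVM,'Options',options,'KFold',8);
errorSVM = kfoldLoss(CVMdlSVM,'Options',options)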

Input Arguments

Mdl - Full, trained multiclass ECOC model, specified as a ClassificationECOC model trained with fitcecoc.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model.

CVPartition - Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.
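
As a minimal sketch, assuming Mdl was trained on predictor data X, you can supply your own partition like this:

cvp = cvpartition(size(X,1),'KFold',5);   % custom 5-fold partition matching the training data
CVMdl = crossval(Mdl,'CVPartition',cvp);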

Holdout - Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Holdout=0.1

Data Types: double | single
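
A minimal sketch, assuming Mdl is the trained ECOC classifier from the first example:

CVMdl = crossval(Mdl,'Holdout',0.1);   % train on 90% of the data, validate on the held-out 10%
holdoutLoss = kfoldLoss(CVMdl)         % loss on the holdout set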

KFold - Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: KFold=5

Data Types: single | double
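
For example, this sketch (using the iris model Mdl from the first example) creates a 5-fold partitioned model; the Trained property then holds five compact models, one per fold:

CVMdl = crossval(Mdl,'KFold',5);
numel(CVMdl.Trained)   % 5 compact models, one per fold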

Leaveout - Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.

  2. Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Leaveout="on"

Data Types: char | string
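
A minimal sketch; note that leave-one-out cross-validation trains one model per observation, so it can be slow for large data sets:

CVMdl = crossval(Mdl,'Leaveout','on');
looError = kfoldLoss(CVMdl)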

Options - Estimation options, specified as a structure array returned by statset.

To invoke parallel computing:

  • You need a Parallel Computing Toolbox™ license.

  • Specify 'Options',statset('UseParallel',true).
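
Putting the two requirements together, a minimal sketch (requires a Parallel Computing Toolbox license) looks like this:

options = statset('UseParallel',true);
CVMdl = crossval(Mdl,'Options',options);   % folds are trained in parallel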

Tips

  • Assess the predictive performance of Mdl on cross-validated data using the "kfold" methods and properties of CVMdl, such as kfoldLoss.
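
For instance, in addition to kfoldLoss, you can obtain the out-of-fold predictions; this sketch assumes CVMdl is a ClassificationPartitionedECOC model such as the one from the first example:

predictedLabels = kfoldPredict(CVMdl);   % out-of-fold predicted labels
confusionmat(Y,predictedLabels)          % compare against the true labels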

Alternative Functionality

Instead of training an ECOC model and then cross-validating it, you can create a cross-validated ECOC model directly by using fitcecoc and specifying one of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
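
For example, the following one-step sketch (using the iris data X, Y and the SVM template t from the first example) produces a 10-fold partitioned model directly:

CVMdl = fitcecoc(X,Y,'Learners',t,'CrossVal','on');   % cross-validate while fitting
genError = kfoldLoss(CVMdl)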

Extended Capabilities

Version History

Introduced in R2014b