crossval
Cross-validate multiclass error-correcting output codes (ECOC) model
Description
CVMdl = crossval(Mdl) returns a cross-validated (partitioned) multiclass error-correcting output codes (ECOC) model (CVMdl) from a trained ECOC model (Mdl). By default, crossval uses 10-fold cross-validation on the training data to create CVMdl, a ClassificationPartitionedECOC model.
CVMdl = crossval(Mdl,Name,Value) returns a partitioned ECOC model with additional options specified by one or more name-value pair arguments. For example, you can specify the number of folds or a holdout sample proportion.
Examples
Cross-Validate ECOC Classifier
Cross-validate an ECOC classifier with SVM binary learners, and estimate the generalized classification error.
Load Fisher's iris data set. Specify the predictor data X and the response data Y.
load fisheriris
X = meas;
Y = species;
rng(1); % For reproducibility
Create an SVM template, and standardize the predictors.
t = templateSVM('Standardize',true)
t =
Fit template for SVM.
    Standardize: 1
t is an SVM template. Most of the template object properties are empty. When training the ECOC classifier, the software sets the applicable properties to their default values.
Train the ECOC classifier, and specify the class order.
Mdl = fitcecoc(X,Y,'Learners',t,...
    'ClassNames',{'setosa','versicolor','virginica'});
Mdl is a ClassificationECOC classifier. You can access its properties using dot notation.
Cross-validate Mdl using 10-fold cross-validation.
CVMdl = crossval(Mdl);
CVMdl is a ClassificationPartitionedECOC cross-validated ECOC classifier.
Estimate the generalized classification error.
genError = kfoldLoss(CVMdl)
genError = 0.0400
The generalized classification error is 4%, which indicates that the ECOC classifier generalizes fairly well.
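To see how much the estimate varies between folds, one option is to request the per-fold losses alongside the average. The following is a minimal sketch, assuming the 'Mode','individual' option of kfoldLoss is available in your release; the variable names foldLoss and avgLoss are illustrative and not part of the original example.

```matlab
load fisheriris
rng(1); % For reproducibility

% Train and cross-validate as in the example above (10 folds by default)
t = templateSVM('Standardize',true);
Mdl = fitcecoc(meas,species,'Learners',t);
CVMdl = crossval(Mdl);

% One classification error per fold, then the overall average
foldLoss = kfoldLoss(CVMdl,'Mode','individual');
avgLoss = kfoldLoss(CVMdl);
```

A wide spread in foldLoss suggests that the averaged error depends strongly on how the data were partitioned.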
Cross-Validate ECOC Classifier Using Parallel Computing
Consider the arrhythmia data set. This data set contains 16 classes, 13 of which are represented in the data. The first class indicates that the subject does not have arrhythmia, and the last class indicates that the arrhythmia state of the subject is not recorded. The other classes are ordinal levels indicating the severity of arrhythmia.
Train an ECOC classifier with a custom coding design specified by the description of the classes.
Load the arrhythmia data set. Convert Y to a categorical variable, and determine the number of classes.
load arrhythmia
Y = categorical(Y);
K = numel(unique(Y)); % Number of distinct classes
Construct a coding matrix that describes the nature of the classes.
OrdMat = designecoc(11,'ordinal');
nOrdMat = size(OrdMat);
class1VSOrd = [1; -ones(11,1); 0];
class1VSClass16 = [1; zeros(11,1); -1];
OrdVSClass16 = [0; ones(11,1); -1];
Coding = [class1VSOrd class1VSClass16 OrdVSClass16,...
    [zeros(1,nOrdMat(2)); OrdMat; zeros(1,nOrdMat(2))]];
Train an ECOC classifier using the custom coding design (Coding) and parallel computing. Specify an ensemble of 50 classification trees boosted using GentleBoost.
t = templateEnsemble('GentleBoost',50,'Tree');
options = statset('UseParallel',true);
Mdl = fitcecoc(X,Y,'Coding',Coding,'Learners',t,'Options',options);
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
Mdl is a ClassificationECOC model. You can access its properties using dot notation.
Cross-validate Mdl using 8-fold cross-validation and parallel computing.
rng(1); % For reproducibility
CVMdl = crossval(Mdl,'Options',options,'KFold',8);
Warning: One or more folds do not contain points from all the groups.
Because some classes have low relative frequency, some folds do not train using observations from those classes. CVMdl is a ClassificationPartitionedECOC cross-validated ECOC model.
Estimate the generalization error using parallel computing.
error = kfoldLoss(CVMdl,'Options',options)
error = 0.3208
The cross-validated classification error is 32%, which indicates that this model does not generalize well. To improve the model, try training using a different boosting method, such as RobustBoost, or a different algorithm, such as SVM.
Input Arguments
Mdl
— Full, trained multiclass ECOC model
ClassificationECOC model
Full, trained multiclass ECOC model, specified as a ClassificationECOC model trained with fitcecoc.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model.
CVPartition
— Cross-validation partition
[] (default) | cvpartition object
Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.
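As an illustration of passing a prebuilt partition, the following minimal sketch uses Fisher's iris data rather than the 500-observation case above. It assumes you want a partition stratified by class label, which is why cvpartition is called with the labels instead of an observation count. The variable names cvp and numFolds are illustrative.

```matlab
load fisheriris
rng(1); % For reproducibility

% Train a full ECOC model with default binary learners
Mdl = fitcecoc(meas,species);

% Build a 5-fold partition stratified by the class labels
cvp = cvpartition(species,'KFold',5);

% Cross-validate using that partition
CVMdl = crossval(Mdl,'CVPartition',cvp);
numFolds = CVMdl.KFold;
```

The partition must describe the same number of observations as the data used to train Mdl.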
Holdout
— Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify Holdout=p, then the software completes these steps:
Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
Store the compact trained model in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Holdout=0.1
Data Types: double | single
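The holdout steps above can be sketched as follows. This is a minimal example on Fisher's iris data, not taken from the original page; nTrained, nTest, and holdoutLoss are illustrative names.

```matlab
load fisheriris
rng(1); % For reproducibility

Mdl = fitcecoc(meas,species);

% Reserve 10% of the data for validation and train on the rest
CVMdl = crossval(Mdl,'Holdout',0.1);

nTrained = numel(CVMdl.Trained);      % a single compact model is stored
nTest = sum(test(CVMdl.Partition));   % number of reserved observations
holdoutLoss = kfoldLoss(CVMdl);       % error on the reserved observations
```

Because only one model is trained, holdout validation is cheaper than k-fold cross-validation, but the error estimate rests on a single split.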
KFold
— Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:
Randomly partition the data into k sets.
For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: KFold=5
Data Types: single | double
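To check the k-by-1 layout of the Trained property described above, a minimal sketch on Fisher's iris data might look like this. The names trainedSize and firstModel are illustrative.

```matlab
load fisheriris
rng(1); % For reproducibility

Mdl = fitcecoc(meas,species);

% Partition into 5 folds; each fold is held out once
CVMdl = crossval(Mdl,'KFold',5);

trainedSize = size(CVMdl.Trained);   % k-by-1 cell vector of compact models
firstModel = CVMdl.Trained{1};       % model trained without the first fold
```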
Leaveout
— Leave-one-out cross-validation flag
"off"
(default) | "on"
Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
Reserve the one observation as validation data, and train the model using the other n – 1 observations.
Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.
Example: Leaveout="on"
Data Types: char | string
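Leave-one-out cross-validation trains one model per observation, so it can be slow on large data sets. The following minimal sketch subsamples Fisher's iris data to 15 observations (5 per class) purely to keep the run short. The subsampling index idx and the variable nModels are assumptions made for illustration.

```matlab
load fisheriris
rng(1); % For reproducibility

% Take every 10th observation: 15 rows, 5 from each of the 3 classes
idx = 1:10:150;
Mdl = fitcecoc(meas(idx,:),species(idx));

% Train one model per observation, each leaving that observation out
CVMdl = crossval(Mdl,'Leaveout','on');

nModels = numel(CVMdl.Trained);  % equals the number of observations
looLoss = kfoldLoss(CVMdl);
```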
Options
— Estimation options
[] (default) | structure array returned by statset
Estimation options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset.
To invoke parallel computing:
You need a Parallel Computing Toolbox™ license.
Specify 'Options',statset('UseParallel',true).
Tips
Alternative Functionality
Instead of training an ECOC model and then cross-validating it, you can create a cross-validated ECOC model directly by using fitcecoc and specifying one of these name-value pair arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
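As a rough illustration of the direct route, the following sketch passes 'KFold' to fitcecoc so that no separate call to crossval is needed. It uses Fisher's iris data and an SVM template as in the first example. The variable names are illustrative.

```matlab
load fisheriris
rng(1); % For reproducibility

t = templateSVM('Standardize',true);

% Train and cross-validate in one call; the result is a partitioned model
CVMdl = fitcecoc(meas,species,'Learners',t,'KFold',5);

genError = kfoldLoss(CVMdl);
```

With this route, the full model trained on all of the data is not returned; use the two-step approach if you also need Mdl.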
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset:
"Options",statset("UseParallel",true)
For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2014b