crossvalind
Generate indices for training and test sets
Syntax
Description
___ = crossvalind(___,Name,Value) specifies additional options using one or more name-value pair arguments in addition to the arguments in previous syntaxes. For example, cvIndices = crossvalind('HoldOut',Groups,0.2,'Classes',{'Cancer','Control'}) uses only the observations from the 'Cancer' and 'Control' groups to generate indices that assign 20% of those observations to the holdout set and 80% to the training set.
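As a rough illustration of this syntax, the following sketch builds a hypothetical label vector (the 'Unknown' group and all group sizes are made up for this example) and requests both outputs so that held-out observations can be told apart from excluded ones. It assumes that Bioinformatics Toolbox is installed.

```matlab
% Hypothetical labels: 50 'Cancer', 40 'Control', 10 'Unknown' (made-up sizes).
Groups = [repmat({'Cancer'},50,1); repmat({'Control'},40,1); repmat({'Unknown'},10,1)];

% Hold out about 20% of the 'Cancer' and 'Control' observations.
[train,test] = crossvalind('HoldOut',Groups,0.2,'Classes',{'Cancer','Control'});

% Observations from the excluded group are expected to be 0 in both outputs.
isUsed = ismember(Groups,{'Cancer','Control'});
fprintf('Training: %d, test: %d, excluded: %d\n', ...
    sum(train),sum(test),sum(~isUsed));
```

Requesting train and test separately is a deliberate choice here: with a single output, a 0 could mean either a test observation or an excluded one.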
Examples
Perform 10-Fold Cross-Validation
Create indices for the 10-fold cross-validation and classify measurement data for the Fisher iris data set. The Fisher iris data set contains width and length measurements of petals and sepals from three species of irises.
Load the data set.
load fisheriris
Create indices for the 10-fold cross-validation.
indices = crossvalind('Kfold',species,10);
Initialize an object to measure the performance of the classifier.
cp = classperf(species);
Perform the classification using the measurement data and report the error rate, which is the number of incorrectly classified samples divided by the total number of classified samples.
for i = 1:10
    test = (indices == i);   % Observations in fold i form the test set.
    train = ~test;           % All remaining observations form the training set.
    class = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,class,test);   % Update the performance object.
end
cp.ErrorRate
ans = 0.0200
Suppose you want to use the observation data from the setosa and virginica species only and exclude the versicolor species from cross-validation.
labels = {'setosa','virginica'};
indices = crossvalind('Kfold',species,10,'Classes',labels);
indices now contains zeros for the rows that belong to the versicolor species.
Perform the classification again.
for i = 1:10
    test = (indices == i);
    train = ~test;
    class = classify(meas(test,:),meas(train,:),species(train,:));
    classperf(cp,class,test);
end
cp.ErrorRate
ans = 0.0160
Perform Leave-One-Out Cross-Validation
Load the carbig data set.
load carbig;
x = Displacement;
y = Acceleration;
N = length(x);
Train a second-degree polynomial model with leave-one-out cross-validation, and evaluate the averaged cross-validation error. On each call, the function randomly selects one observation to hold out for the evaluation set. Using this method within a loop does not guarantee disjoint evaluation sets, so you may see a different CVerr for each run.
sse = 0; % Initialize the sum of squared error.
for i = 1:100
    [train,test] = crossvalind('LeaveMOut',N,1);
    yhat = polyval(polyfit(x(train),y(train),2),x(test));
    sse = sse + sum((yhat - y(test)).^2);
end
CVerr = sse / 100;
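If every observation should be held out exactly once, one possible alternative (a sketch, not part of the example above) is to generate 'Kfold' indices with the number of folds equal to the number of observations, so that each fold contains a single observation. The synthetic data below are made up for illustration and stand in for the carbig variables.

```matlab
rng(0);                                          % Fix the random seed for repeatability.
x = linspace(0,10,30)';                          % Synthetic predictor (assumption).
y = 2 + 0.5*x - 0.1*x.^2 + 0.2*randn(size(x));   % Synthetic response (assumption).
N = length(x);

indices = crossvalind('Kfold',N,N);   % One observation per fold.
sse = 0;                              % Sum of squared errors.
for i = 1:N
    test = (indices == i);            % The single observation in fold i.
    train = ~test;                    % All other observations.
    yhat = polyval(polyfit(x(train),y(train),2),x(test));
    sse = sse + sum((yhat - y(test)).^2);
end
CVerr = sse / N;                      % Average squared error per held-out observation.
```

With this approach the evaluation sets are disjoint by construction, so the only run-to-run variation comes from the data, not from which observations happen to be drawn.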
Input Arguments
cvMethod
— Cross-validation method
character vector | string
Cross-validation method, specified as a character vector or string.
This table describes the valid cross-validation methods. Depending on the method, the third input argument (M) has different meanings and requirements.
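cvMethod | M | Description |
---|---|---|
'Kfold' | Positive integer: the number of folds (K) | The method uses K-fold cross-validation to generate indices. This method uses M-1 folds for training and the remaining fold for evaluation, and repeats the process M times so that each fold is used for evaluation once. |
'HoldOut' | Scalar between 0 and 1: the proportion of observations to hold out | The method randomly selects approximately N*M observations to hold out for the evaluation set. |
'LeaveMOut' | Positive integer: the number of observations to leave out | The method randomly selects M observations to hold out for the evaluation set. |
'Resubstitution' | Two-element vector [P,Q], each element between 0 and 1 | The method randomly selects N*P observations for the evaluation set and N*Q observations for the training set. |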
Example: 'Kfold'
Data Types: char | string
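The role of M under the different methods can be checked with a short sketch such as the following, which passes a plain observation count for N. The values of N and M are arbitrary choices for illustration.

```matlab
N = 100;   % Arbitrary number of observations.

% 'Kfold': M is the number of folds; the result holds fold numbers 1 through M.
foldIdx = crossvalind('Kfold',N,5);
foldSizes = accumarray(foldIdx(:),1)';   % Observations per fold.

% 'HoldOut': M is the proportion of observations held out for testing.
[trainH,testH] = crossvalind('HoldOut',N,0.3);

% 'LeaveMOut': M is the number of observations held out for testing.
[trainL,testL] = crossvalind('LeaveMOut',N,4);

disp(foldSizes)
fprintf('HoldOut test size: %d, LeaveMOut test size: %d\n',sum(testH),sum(testL));
```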
N
— Total number of observations or grouping information
positive integer | vector of positive integers | logical vector | cell array of character vectors
Total number of observations or grouping information, specified as a positive integer, vector of positive integers, logical vector, or cell array of character vectors.
N can be a positive integer specifying the total number of samples in your data set.
N can also be a vector of positive integers or logical values, or a cell array of character vectors, containing grouping information or labels for your samples. The partition of the groups depends on the type of cross-validation. For 'Kfold', each group is divided into M subsets, approximately equal in size. For all other methods, approximately equal numbers of observations from each group are selected for the evaluation (test) set. The training set contains at least one observation from each group regardless of the cross-validation method you use.
Example: 100
Data Types: double | cell
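To see how grouping information affects the partition, the sketch below passes a hypothetical numeric label vector as N and counts how many observations of each group land in each fold. The group sizes (30 and 20) are made up for this example.

```matlab
% Hypothetical group labels: 30 observations in group 1, 20 in group 2.
labels = [ones(30,1); 2*ones(20,1)];

% Divide each group into 5 subsets of approximately equal size.
indices = crossvalind('Kfold',labels,5);

% Tabulate the result: rows are groups, columns are folds.
counts = accumarray([labels indices(:)],1);
disp(counts)
```

Each row of counts should be close to uniform, which reflects the per-group division described above.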
M
— Cross-validation parameter
positive scalar | positive integer | two-element vector
Cross-validation parameter, specified as a positive scalar between 0 and 1, positive integer, or two-element vector. The requirements for M depend on the cross-validation method. For details, see cvMethod.
Example: 5
Data Types: double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: [train,test] = crossvalind('LeaveMOut',groups,1,'Min',3) specifies to have at least three observations in each group in the training set when performing the leave-one-out cross-validation.
Classes
— Class or group information
vector of positive integers | character vector | string | string vector | cell array of character vectors
Class or group information, specified as the comma-separated pair consisting of 'Classes' and a vector of positive integers, character vector, string, string vector, or cell array of character vectors. This option lets you restrict the observations to only the specified groups.
This name-value pair argument is applicable only when you specify N as a grouping variable. The data type of 'Classes' must match that of N. For example, if you specify N as a cell array of character vectors containing class labels, you must use a cell array of character vectors to specify 'Classes'. The output arguments you specify contain the value 0 for observations belonging to excluded classes.
Example: 'Classes',{'Cancer','Control'}
Data Types: double | cell
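Because excluded observations are marked with 0, a complement such as ~test also selects them. If the excluded classes should stay out of the training set as well, one option is to mask them explicitly, as in this sketch with hypothetical numeric group labels.

```matlab
% Hypothetical labels: three groups of 20 observations each.
groups = [ones(20,1); 2*ones(20,1); 3*ones(20,1)];

% Use groups 1 and 3 only; the data type of 'Classes' matches that of groups.
indices = crossvalind('Kfold',groups,5,'Classes',[1 3]);

included = (indices > 0);          % Rows that belong to groups 1 and 3.
for i = 1:5
    test = (indices == i);         % Fold i of the included rows.
    train = included & ~test;      % The other included rows only.
    fprintf('Fold %d: %d training, %d test\n',i,sum(train),sum(test));
end
```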
Min
— Minimum number of observations
1 (default) | positive integer
Minimum number of observations for each group in the training set, specified as the comma-separated pair consisting of 'Min' and a positive integer. Setting a large value can help to balance the training groups, but causes partial resubstitution when there are not enough observations.
This name-value pair argument is not applicable for the 'Kfold' method.
Example: 'Min',3
Data Types: double
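A minimal sketch of the 'Min' option follows, using a hypothetical two-group label vector. The group sizes are arbitrary, and the check simply counts the training observations in each group.

```matlab
% Hypothetical labels: 10 observations in group 1, 5 in group 2.
groups = [ones(10,1); 2*ones(5,1)];

% Leave one observation out, keeping at least 3 per group for training.
[train,test] = crossvalind('LeaveMOut',groups,1,'Min',3);

% Count the training observations in each group.
trainPerGroup = accumarray(groups(logical(train)),1);
disp(trainPerGroup')
```

The groups here are large enough that no resubstitution is needed. With much smaller groups, some observations may appear in both sets, as noted above.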
Output Arguments
cvIndices
— Cross-validation indices
vector
Cross-validation indices, returned as a vector.
If you are using 'Kfold' as the cross-validation method, cvIndices contains equal (or approximately equal) proportions of the integers 1 through M, which define a partition of the N observations into M disjoint subsets.
For other cross-validation methods, cvIndices is a logical vector containing 1s for observations that belong to the training set and 0s for observations that belong to the test (evaluation) set.
train
— Training set
logical vector
Training set, returned as a logical vector. This argument specifies which observations belong to the training set.
test
— Test set
logical vector
Test set, returned as a logical vector. This argument specifies which observations belong to the test set.
Version History
Introduced before R2006a