crossval

Cross-validate regression ensemble model

Description

cvens = crossval(ens) returns a cross-validated (partitioned) regression ensemble model (cvens) from a trained regression ensemble model (ens). By default, crossval uses 10-fold cross-validation on the training data to create cvens, a RegressionPartitionedEnsemble model.

cvens = crossval(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the cross-validation partition, the fraction of data for holdout validation, and the number of folds to use.
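
For illustration, here is a minimal sketch of both syntaxes (the carsmall data and the variable names are only an example):

load carsmall
X = [Acceleration Displacement Horsepower Weight];
ens = fitrensemble(X,MPG);        % trained regression ensemble

cvens = crossval(ens);            % default 10-fold cross-validation
cvens5 = crossval(ens,KFold=5);   % same call, using a name-value argument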

Input Arguments

ens – Regression ensemble model

Regression ensemble model, specified as a RegressionEnsemble model object trained with fitrensemble.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: crossval(ens,KFold=10,NPrint=5) specifies to use 10 folds in a cross-validated model, and to display a message to the command line every time crossval finishes training 5 folds.

CVPartition – Cross-validation partition

Cross-validation partition, specified as a cvpartition object that specifies the type of cross-validation and the indexing for the training and validation sets.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,KFold=5). Then, you can specify the cross-validation partition by setting CVPartition=cvp.
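
As a rough sketch of this workflow (assuming ens is a RegressionEnsemble trained with fitrensemble, as in the sketch above):

cvp = cvpartition(ens.NumObservations,KFold=5);  % 5-fold partition over the training observations
cvens = crossval(ens,CVPartition=cvp);           % cross-validate using that partition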

Holdout – Fraction of data for holdout validation

Fraction of the data used for holdout validation, specified as a scalar value in the range [0,1]. If you specify Holdout=p, then the software completes these steps:

  1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

  2. Store the compact trained model in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Holdout=0.1

Data Types: double | single
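
A minimal sketch of holdout validation (again assuming the trained ensemble ens from the earlier sketch):

cvens = crossval(ens,Holdout=0.2);   % reserve 20% of the data for validation
mdl = cvens.Trained{1};              % compact model trained on the remaining 80%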

KFold – Number of folds

Number of folds to use in the cross-validated model, specified as a positive integer value greater than 1. If you specify KFold=k, then the software completes these steps:

  1. Randomly partition the data into k sets.

  2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

  3. Store the k compact trained models in a k-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: KFold=5

Data Types: single | double
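
For example, a sketch with five folds (assuming the trained ensemble ens from the earlier sketch):

cvens = crossval(ens,KFold=5);   % 5 folds instead of the default 10
numel(cvens.Trained)             % ans = 5, one compact model per fold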

Leaveout – Leave-one-out cross-validation flag

Leave-one-out cross-validation flag, specified as "on" or "off". If you specify Leaveout="on", then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

  1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.

  2. Store the n compact trained models in an n-by-1 cell vector in the Trained property of the cross-validated model.

To create a cross-validated model, you can specify only one of these four name-value arguments: CVPartition, Holdout, KFold, or Leaveout.

Example: Leaveout="on"

Data Types: char | string
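
A sketch of leave-one-out cross-validation (assuming the trained ensemble ens from the earlier sketch; this trains one model per observation, so it can be slow):

cvens = crossval(ens,Leaveout="on");  % one fold per observation
L = kfoldLoss(cvens);                 % leave-one-out estimate of the mean squared error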

NPrint – Printout frequency

Printout frequency, specified as a positive integer or "off".

To track the number of folds trained by the software so far, specify a positive integer m. The software displays a message to the command line every time it finishes training m folds.

If you specify "off", the software does not display a message when it completes training folds.

Example: NPrint=5

Data Types: single | double | char | string
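
For example, a sketch that reports progress during cross-validation (assuming the trained ensemble ens from the earlier sketch):

cvens = crossval(ens,KFold=10,NPrint=2);  % display a message after every 2 folds complete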

Examples

Create a cross-validated regression model for the carsmall data, and evaluate its quality using the kfoldLoss method.

Load the carsmall data set and select acceleration, displacement, horsepower, and vehicle weight as predictors.

load carsmall;
X = [Acceleration Displacement Horsepower Weight];

Train a regression ensemble.

rens = fitrensemble(X,MPG);

Create a cross-validated ensemble from rens and find the cross-validation loss.

rng(10,"twister") % For reproducibility
cvens = crossval(rens);
L = kfoldLoss(cvens)
L = 30.3471
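
As an optional continuation, you could also inspect the out-of-fold predictions with kfoldPredict and recompute an overall mean squared error by hand; with the default unit observation weights, this should be close to the kfoldLoss value above:

yhat = kfoldPredict(cvens);       % out-of-fold prediction for each training observation
mse = mean((yhat - cvens.Y).^2)   % roughly reproduces the kfoldLoss value above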

Alternatives

You can create a cross-validated ensemble directly from the data, instead of training an ensemble and then cross-validating it. To do so, include one of these five name-value arguments in the call to fitrensemble: CrossVal, CVPartition, Holdout, KFold, or Leaveout.
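
For instance, a minimal sketch of the one-step approach with fitrensemble (using the same carsmall predictors as the example above):

load carsmall
X = [Acceleration Displacement Horsepower Weight];
cvens = fitrensemble(X,MPG,KFold=5);   % cross-validated ensemble in one step
L = kfoldLoss(cvens);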

Version History

Introduced in R2011a