kfoldPredict

Predict responses for observations in cross-validated regression model

    Description

    yFit = kfoldPredict(CVMdl) returns responses predicted by the cross-validated regression model CVMdl. For every fold, kfoldPredict predicts the responses for validation-fold observations using a model trained on training-fold observations. CVMdl.X and CVMdl.Y contain both sets of observations.

    yFit = kfoldPredict(CVMdl,Name,Value) specifies options using one or more name-value arguments. For example, 'IncludeInteractions',true includes interaction terms in computations for generalized additive models.

    [yFit,ySD,yInt] = kfoldPredict(___) also returns the standard deviations and prediction intervals of the response variable, evaluated at each observation in the predictor data CVMdl.X, using any of the input argument combinations in the previous syntaxes. This syntax applies only to generalized additive models (GAM) for which the IsStandardDeviationFit property of CVMdl is true.

    Examples

    When you create a cross-validated regression model, you can compute the mean squared error (MSE) by using the kfoldLoss object function. Alternatively, you can predict responses for validation-fold observations using kfoldPredict and compute the MSE manually.

    Load the carsmall data set. Specify the predictor data X and the response data Y.

    load carsmall
    X = [Cylinders Displacement Horsepower Weight];
    Y = MPG;

    Train a cross-validated regression tree model. By default, the software implements 10-fold cross-validation.

    rng('default') % For reproducibility
    CVMdl = fitrtree(X,Y,'CrossVal','on');

    Compute the 10-fold cross-validation MSE by using kfoldLoss.

    L = kfoldLoss(CVMdl)
    L = 
    29.4963
    

    Predict the responses yfit by using the cross-validated regression model. Compute the mean squared error between yfit and the true responses CVMdl.Y. The computed MSE matches the loss value returned by kfoldLoss.

    yfit = kfoldPredict(CVMdl);
    mse = mean((yfit - CVMdl.Y).^2)
    mse = 
    29.4963
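
    The fold-by-fold behavior of kfoldPredict can be sketched by hand. The following is a minimal sketch, not the library's internal implementation: it assumes that CVMdl.Trained{k} holds the model trained on the training-fold observations of fold k and that test(CVMdl.Partition,k) returns the logical index of the validation-fold observations. The variable names yManual and maxDiff are illustrative only.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrtree(X,Y,'CrossVal','on');

% Fill in each validation fold using the model that did not see that fold.
yManual = NaN(size(CVMdl.Y));
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);   % validation-fold observations of fold k
    yManual(idx) = predict(CVMdl.Trained{k},CVMdl.X(idx,:));
end

% Compare against the built-in result.
yFit = kfoldPredict(CVMdl);
maxDiff = max(abs(yManual - yFit))
```

    Because every observation belongs to exactly one validation fold in k-fold cross-validation, each entry of yManual is filled exactly once.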
    

    Input Arguments

    CVMdl: Cross-validated partitioned regression model, specified as a RegressionPartitionedModel, RegressionPartitionedEnsemble, RegressionPartitionedGAM, RegressionPartitionedGP, RegressionPartitionedNeuralNetwork, or RegressionPartitionedSVM object. You can create the object in two ways:

    • Pass a trained regression model to its crossval object function.

    • Train a regression model using a fitting function (for example, fitrtree) and specify one of the cross-validation name-value arguments for the function (for example, 'CrossVal','on').

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

    Example: 'Alpha',0.01,'IncludeInteractions',false specifies the confidence level as 99% and excludes interaction terms from computations for a generalized additive model.

    Alpha: Significance level for the prediction intervals yInt, specified as a numeric scalar in the range [0,1]. The confidence level of yInt is equal to 100(1 – Alpha)%.

    This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, you can specify this argument only when CVMdl is RegressionPartitionedGAM and the IsStandardDeviationFit property of CVMdl is true.

    Example: 'Alpha',0.01

    Data Types: single | double

    IncludeInteractions: Flag to include interaction terms of the model, specified as true or false. This argument is valid only for a generalized additive model (GAM). That is, you can specify this argument only when CVMdl is RegressionPartitionedGAM.

    The default value is true if the models in CVMdl (CVMdl.Trained) contain interaction terms. The value must be false if the models do not contain interaction terms.

    Data Types: logical
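
    As an illustration of this flag, the following sketch cross-validates a GAM with interaction terms and predicts with and without them. The predictor choice and the value 'Interactions',2 are assumptions made for this example, not settings taken from this page. If none of the trained models ends up containing interaction terms, both calls return the same predictions.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
% Request up to two interaction terms (assumed setting for illustration).
CVMdl = fitrgam(X,Y,'Interactions',2,'CrossVal','on');

% Default: interaction terms are included if the trained models contain them.
yWith = kfoldPredict(CVMdl);

% Use only the linear (univariate) terms of each trained model.
yWithout = kfoldPredict(CVMdl,'IncludeInteractions',false);

mseWith = mean((yWith - CVMdl.Y).^2,'omitnan')
mseWithout = mean((yWithout - CVMdl.Y).^2,'omitnan')
```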

    PredictionForMissingValue (since R2023b): Predicted response value to use for observations with missing predictor values, specified as "median", "mean", or a numeric scalar. This argument is valid only for a Gaussian process regression, neural network, or support vector machine model. That is, you can specify this argument only when CVMdl is a RegressionPartitionedGP, RegressionPartitionedNeuralNetwork, or RegressionPartitionedSVM object.

    Value and description:

    • "median": kfoldPredict uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. This value is the default when CVMdl is a RegressionPartitionedGP, RegressionPartitionedNeuralNetwork, or RegressionPartitionedSVM object.

    • "mean": kfoldPredict uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values.

    • Numeric scalar: kfoldPredict uses this value as the predicted response value for observations with missing predictor values.

    Example: "PredictionForMissingValue","mean"

    Example: "PredictionForMissingValue",NaN

    Data Types: single | double | char | string
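
    The three kinds of values can be compared on a data set that has missing predictor values. This is a hedged sketch that assumes R2023b or later and uses a cross-validated SVM on carsmall as an arbitrary example; how many rows of CVMdl.X actually contain missing values depends on the data and the release, so the sketch only counts them rather than stating a result.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrsvm(X,Y,'Standardize',true,'CrossVal','on');

% Default behavior ("median" for SVM, GP, and neural network models).
yMedian = kfoldPredict(CVMdl);

% Use the training-fold mean response instead.
yMean = kfoldPredict(CVMdl,'PredictionForMissingValue','mean');

% Return NaN for observations with missing predictor values.
yNaN = kfoldPredict(CVMdl,'PredictionForMissingValue',NaN);

% Count rows with missing predictors and the NaN predictions that result.
numMissingRows = nnz(any(isnan(CVMdl.X),2))
numNaNPredictions = nnz(isnan(yNaN))
```

    Passing NaN is convenient when downstream code should skip such observations explicitly, for example with mean(...,'omitnan').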

    Output Arguments

    yFit: Predicted responses, returned as an n-by-1 numeric vector, where n is the number of observations. (n is size(CVMdl.X,1) when observations are in rows.) Each entry of yFit corresponds to the predicted response for the corresponding row of CVMdl.X.

    If you use a holdout validation technique to create CVMdl (that is, if CVMdl.KFold is 1), then yFit has NaN values for training-fold observations.
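
    The holdout case can be checked directly. In this sketch the 20% holdout fraction is an arbitrary choice for illustration, and training(CVMdl.Partition) is assumed to return the logical index of the training observations for a holdout partition.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
% Hold out 20% of the observations for validation (assumed fraction).
CVMdl = fitrtree(X,Y,'Holdout',0.2);

yFit = kfoldPredict(CVMdl);

% Training observations receive NaN; only validation observations are predicted.
isTrain = training(CVMdl.Partition);
allTrainNaN = all(isnan(yFit(isTrain)))

% Compute the validation MSE while ignoring the NaN entries.
mseHoldout = mean((yFit - CVMdl.Y).^2,'omitnan')
```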

    ySD: Standard deviations of the response variable, evaluated at each observation in the predictor data CVMdl.X, returned as a column vector of length n, where n is the number of observations in CVMdl.X. The ith element ySD(i) contains the standard deviation of the response for the ith observation CVMdl.X(i,:), estimated using the trained standard deviation model in CVMdl.

    This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, kfoldPredict can return this argument only when CVMdl is RegressionPartitionedGAM and the IsStandardDeviationFit property of CVMdl is true.

    yInt: Prediction intervals of the response variable, evaluated at each observation in the predictor data CVMdl.X, returned as an n-by-2 matrix, where n is the number of observations in CVMdl.X. The ith row yInt(i,:) contains the estimated 100(1 – Alpha)% prediction interval of the response for the ith observation CVMdl.X(i,:), computed using ySD(i). The Alpha value is the probability that the prediction interval does not contain the true response value CVMdl.Y(i). The first column of yInt contains the lower limits of the prediction intervals, and the second column contains the upper limits.

    This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, kfoldPredict can return this argument only when CVMdl is RegressionPartitionedGAM and the IsStandardDeviationFit property of CVMdl is true.
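
    To see the three outputs together, the following sketch trains a cross-validated GAM with a standard deviation fit and requests 99% prediction intervals. The predictor choice and the variable names inside and coverage are assumptions for illustration. The empirical coverage it prints is a rough sanity check on the intervals, not a guaranteed value.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
% Fit the standard deviation model so that ySD and yInt are available.
CVMdl = fitrgam(X,Y,'FitStandardDeviation',true,'CrossVal','on');

% Alpha = 0.01 requests 100(1 - 0.01)% = 99% prediction intervals.
[yFit,ySD,yInt] = kfoldPredict(CVMdl,'Alpha',0.01);

% Fraction of true responses that fall inside their prediction interval.
inside = CVMdl.Y >= yInt(:,1) & CVMdl.Y <= yInt(:,2);
coverage = mean(inside)
```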

    Version History

    Introduced in R2011a