Regression loss for generalized additive model (GAM)
L = loss(Mdl,Tbl,ResponseVarName) returns the regression loss (L), a scalar representing how well the generalized additive model Mdl predicts the predictor data in Tbl compared to the true response values in Tbl.ResponseVarName.
The interpretation of L depends on the loss function ('LossFun') and weighting scheme ('Weights'). In general, better models yield smaller loss values. The default 'LossFun' value is 'mse' (mean squared error).
L = loss(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. For example, you can specify the loss function and the observation weights.
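A minimal sketch of replacing the default 'mse' with a custom loss through 'LossFun'. It assumes the custom-loss signature lossvalue = lossfun(Y,Yfit,W) used by regression loss functions in Statistics and Machine Learning Toolbox, where Y holds the true responses, Yfit the predictions, and W the observation weights. The handle name maeFun and the choice of weighted mean absolute error are illustrative, not part of the loss API.

```matlab
% Sketch: default MSE versus a custom weighted mean absolute error.
% maeFun is a hypothetical name; its (Y,Yfit,W) signature is assumed.
load patients
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Mdl = fitrgam(tbl,"Systolic");

% Weighted mean absolute error; dividing by sum(W) makes the result
% independent of whether the weights are already normalized.
maeFun = @(Y,Yfit,W) sum(W.*abs(Y - Yfit))/sum(W);

L_mse = loss(Mdl,tbl);                    % default 'LossFun','mse'
L_mae = loss(Mdl,tbl,'LossFun',maeFun);   % custom loss function
```

Because both losses use the same weights, the square root of the MSE is never smaller than the mean absolute error, which gives a quick sanity check on a custom function.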
Determine Test Sample Regression Loss
Determine the test sample regression loss (mean squared error) of a generalized additive model. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.
Load the patients data set.
load patients
Create a table that contains the predictor variables (Age, Diastolic, Smoker, Weight, Gender, and SelfAssessedHealthStatus) and the response variable (Systolic).
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Randomly partition observations into a training set and a test set. Specify a 10% holdout sample for testing.
rng('default') % For reproducibility
cv = cvpartition(size(tbl,1),'HoldOut',0.10);
Extract the training and test indices.
trainInds = training(cv);
testInds = test(cv);
Train a univariate GAM that contains the linear terms for the predictors in tbl.
Mdl = fitrgam(tbl(trainInds,:),"Systolic");
Determine how well the algorithm generalizes by estimating the test sample regression loss. By default, the loss function of RegressionGAM estimates the mean squared error.
L = loss(Mdl,tbl(testInds,:))
L = 35.7540
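To see what the default loss computes, the sketch below rebuilds the test-sample MSE from predict and compares it with loss. It assumes the default unit observation weights, under which the weighted MSE reduces to the plain mean of the squared residuals; the names testTbl, yFit, and mseManual are illustrative.

```matlab
% Sketch: recompute the default test-sample loss from predicted responses.
load patients
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);

rng('default') % For reproducibility
cv = cvpartition(size(tbl,1),'HoldOut',0.10);
trainInds = training(cv);
testInds = test(cv);

Mdl = fitrgam(tbl(trainInds,:),"Systolic");

testTbl = tbl(testInds,:);
yFit = predict(Mdl,testTbl);                      % predicted responses
mseManual = mean((testTbl.Systolic - yFit).^2);   % unweighted MSE
L = loss(Mdl,testTbl);                            % default 'mse'
```

The two values should agree up to floating-point rounding.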
Compare GAMs by Examining Regression Loss
Train a generalized additive model (GAM) that contains both linear and interaction terms for predictors, and estimate the regression loss (mean squared error, MSE) with and without interaction terms for the training data and test data. Specify whether to include interaction terms when estimating the regression loss.
Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Specify Acceleration, Displacement, Horsepower, and Weight as the predictor variables (X) and MPG as the response variable (Y).
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;
Partition the data set into two sets: one containing training data, and the other containing new, unobserved test data. Reserve 10 observations for the new test data set.
rng('default') % For reproducibility
n = size(X,1);
newInds = randsample(n,10);
inds = ~ismember(1:n,newInds);
XNew = X(newInds,:);
YNew = Y(newInds);
Train a generalized additive model that contains all the available linear and interaction terms in X.
Mdl = fitrgam(X(inds,:),Y(inds),'Interactions','all');
Mdl is a RegressionGAM model object.
Compute the resubstitution MSEs (that is, the in-sample MSEs) both with and without interaction terms in Mdl. To exclude interaction terms, specify 'IncludeInteractions',false.
resubl = resubLoss(Mdl)
resubl = 0.0292
resubl_nointeraction = resubLoss(Mdl,'IncludeInteractions',false)
resubl_nointeraction = 4.7330
Compute the regression MSEs both with and without interaction terms for the test data set. Use a memory-efficient model object for the computation.
CMdl = compact(Mdl);
CMdl is a CompactRegressionGAM model object.
l = loss(CMdl,XNew,YNew)
l = 12.8604
l_nointeraction = loss(CMdl,XNew,YNew,'IncludeInteractions',false)
l_nointeraction = 15.6741
Including interaction terms achieves a smaller error for both the training data set and the test data set.
Mdl — Generalized additive model
RegressionGAM model object | CompactRegressionGAM model object
Generalized additive model, specified as a RegressionGAM or CompactRegressionGAM model object.
Tbl — Sample data
Sample data, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.
Tbl must contain all of the predictors used to train Mdl. Optionally, Tbl can contain a column for the response variable and a column for the observation weights.
The response variable must be a numeric vector. If the response variable in Tbl has the same name as the response variable used to train Mdl, then you do not need to specify ResponseVarName.
The weight values must be a numeric vector. To use a column of Tbl as the observation weights, specify its name by using the 'Weights' name-value argument.
If you trained Mdl using sample data contained in a table, then the input data for loss must also be in a table.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'IncludeInteractions',false,'Weights',w specifies to exclude interaction terms from the model and to use the observation weights w.
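A sketch of combining 'IncludeInteractions' and 'Weights' with table input, where the weights live in a column of the table and are passed by name. The weight column W (smokers weighted 2, nonsmokers 1) is a hypothetical choice for illustration. The sketch also assumes that predict for a RegressionGAM accepts 'IncludeInteractions', and uses it to rebuild the weighted MSE by hand.

```matlab
% Sketch: weighted loss with and without interaction terms.
load patients
tbl = table(Age,Diastolic,Weight,Systolic);
Mdl = fitrgam(tbl,"Systolic",'Interactions','all');

% Hypothetical weight column, added after training; Tbl may carry
% extra columns for the response and the weights.
tbl.W = 1 + double(Smoker);

L_all  = loss(Mdl,tbl,'Weights','W');                           % with interactions
L_main = loss(Mdl,tbl,'IncludeInteractions',false,'Weights','W'); % without

% Rebuild the weighted MSE without interaction terms from predict.
yFit = predict(Mdl,tbl,'IncludeInteractions',false);
w = tbl.W;
L_manual = sum(w.*(tbl.Systolic - yFit).^2)/sum(w);
```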
Weighted Mean Squared Error
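A standard way to write the weighted mean squared error, where n is the number of observations, x_j is the jth observation, y_j is its true response, f(x_j) is the predicted response for x_j, and w_j is its observation weight. Dividing by the sum of the weights means that rescaling all weights by a constant leaves the loss unchanged, and with the default unit weights the expression reduces to the ordinary mean squared error.

$$
\mathrm{mse} = \frac{\sum_{j=1}^{n} w_j \left( f(x_j) - y_j \right)^2}{\sum_{j=1}^{n} w_j}
$$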
Introduced in R2021a