Simulate 10000 observations from this model
The predictor matrix is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
e is random normal error with mean 0 and standard deviation 0.3.
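A minimal sketch of the simulation step. The matrix size, density, and noise level come from the description above. The response model used here (predictor 100 plus twice predictor 200 plus noise) and the variable names X, Y, n, d, and nz are assumptions chosen for illustration.

```matlab
rng(1)                     % fix the seed for reproducibility
n  = 1e4;                  % number of observations
d  = 1e3;                  % number of predictors
nz = 0.1;                  % target fraction of nonzero elements
X  = sprandn(n,d,nz);      % sparse matrix whose nonzeros are standard normal

% Assumed model for illustration: y = x100 + 2*x200 + e,
% where e is normal with mean 0 and standard deviation 0.3.
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Only two of the 1000 predictors carry signal in this assumed model, which is the kind of setting where a lasso penalty is expected to help.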
Create a set of 15 logarithmically spaced regularization strengths, Lambda.
Hold out 30% of the data for testing. Identify the test-sample indices.
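The two preparation steps above might look like the following sketch. The endpoints of the regularization grid (1e-5 and 1e-1) are assumed values for illustration, and the names cvp and idxTest are hypothetical.

```matlab
rng(1)                                 % make the random partition reproducible
n = 1e4;                               % number of simulated observations

% 15 logarithmically spaced strengths; the range 1e-5 to 1e-1 is an assumption
Lambda = logspace(-5,-1,15);

cvp = cvpartition(n,'Holdout',0.30);   % reserve 30% of the observations for testing
idxTest = test(cvp);                   % logical vector, true for test observations
```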
Train a linear regression model using lasso penalties with the strengths in Lambda. Specify the regularization strengths, the SpaRSA solver for optimizing the objective function, and the data partition. To increase execution speed, transpose the predictor data and specify that the observations are in columns.
Mdl1 is a RegressionLinear model. Because Lambda is a 15-dimensional vector of regularization strengths, you can think of Mdl1 as 15 trained models, one for each regularization strength.
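A sketch of the training step, with the setup repeated so that the snippet runs on its own. The simulated model and the Lambda range are assumptions for illustration. When a partition is supplied, fitrlinear returns a cross-validated model, so the sketch extracts Mdl1 from its Trained property; the names Xt and CVMdl are hypothetical.

```matlab
% Setup (assumed model and Lambda range, for illustration only)
rng(1)
X = sprandn(1e4,1e3,0.1);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(1e4,1);
Lambda = logspace(-5,-1,15);
cvp = cvpartition(numel(Y),'Holdout',0.30);

Xt = X';   % transpose so that observations are in columns

% Lasso-penalized linear regression, SpaRSA solver, holdout partition
CVMdl = fitrlinear(Xt,Y,'ObservationsIn','columns','Lambda',Lambda, ...
    'Solver','sparsa','Regularization','lasso','CVPartition',cvp);

Mdl1 = CVMdl.Trained{1};   % RegressionLinear model fit to the training set
numel(Mdl1.Lambda)         % one regularization strength per model
```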
Estimate the test-sample mean squared error for each regularized model.
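One way to compute the test-sample MSE is to pass the held-out observations to loss, as sketched below. The setup is repeated so that the snippet runs on its own; the simulated model, the Lambda range, and the variable names are assumptions for illustration.

```matlab
% Setup (assumed model and Lambda range, for illustration only)
rng(1)
X = sprandn(1e4,1e3,0.1);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(1e4,1);
Lambda = logspace(-5,-1,15);
cvp = cvpartition(numel(Y),'Holdout',0.30);
idxTest = test(cvp);
Xt = X';
CVMdl = fitrlinear(Xt,Y,'ObservationsIn','columns','Lambda',Lambda, ...
    'Solver','sparsa','Regularization','lasso','CVPartition',cvp);
Mdl1 = CVMdl.Trained{1};

% Mean squared error on the held-out 30%, one value per regularization strength
mse = loss(Mdl1,Xt(:,idxTest),Y(idxTest),'ObservationsIn','columns');
```

The result is a row vector whose elements correspond to the elements of Lambda.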
Higher values of Lambda lead to predictor-variable sparsity, which is a good quality of a regression model. Retrain the model using the entire data set and all options used previously, except the data-partition specification. Determine the number of nonzero coefficients per model.
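A sketch of the retraining step. Without a partition, fitrlinear returns a RegressionLinear model directly, and its Beta property holds one column of coefficients per regularization strength. The simulated model, the Lambda range, and the names Mdl and numNZCoeff are assumptions for illustration.

```matlab
% Setup (assumed model and Lambda range, for illustration only)
rng(1)
X = sprandn(1e4,1e3,0.1);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(1e4,1);
Lambda = logspace(-5,-1,15);
Xt = X';

% Same options as before, but trained on all observations (no partition)
Mdl = fitrlinear(Xt,Y,'ObservationsIn','columns','Lambda',Lambda, ...
    'Solver','sparsa','Regularization','lasso');

% Count nonzero coefficients in each column of Beta (one column per strength)
numNZCoeff = sum(Mdl.Beta ~= 0);
```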
In the same figure, plot the MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.
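The plot could be produced with two y-axes, as in the sketch below. It uses yyaxis, which is one possible choice rather than necessarily the plotting function used originally. The setup is repeated so that the snippet runs on its own; the simulated model, the Lambda range, and the variable names are assumptions for illustration.

```matlab
% Setup (assumed model and Lambda range, for illustration only)
rng(1)
X = sprandn(1e4,1e3,0.1);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(1e4,1);
Lambda = logspace(-5,-1,15);
cvp = cvpartition(numel(Y),'Holdout',0.30);
idxTest = test(cvp);
Xt = X';
opts = {'ObservationsIn','columns','Lambda',Lambda, ...
    'Solver','sparsa','Regularization','lasso'};
CVMdl = fitrlinear(Xt,Y,opts{:},'CVPartition',cvp);
mse = loss(CVMdl.Trained{1},Xt(:,idxTest),Y(idxTest),'ObservationsIn','columns');
Mdl = fitrlinear(Xt,Y,opts{:});
numNZCoeff = sum(Mdl.Beta ~= 0);

% MSE on the left axis, nonzero-coefficient count on the right, all in log10
figure
yyaxis left
plot(log10(Lambda),log10(mse),'-o')
ylabel('log_{10} MSE')
yyaxis right
plot(log10(Lambda),log10(numNZCoeff),'-s')
ylabel('log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Test-Sample Statistics')
```

A strength that drives every coefficient to zero gives log10(0) = -Inf, and plot omits that point.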
Select the index or indices of Lambda that balance minimal test-sample MSE and predictor-variable sparsity (for example, Lambda(11)).
MdlFinal is a trained RegressionLinear model object that uses Lambda(11) as a regularization strength.
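A sketch of the selection step using selectModelSubset, which keeps only the chosen regularization strengths of a RegressionLinear model. The setup is repeated so that the snippet runs on its own; the simulated model, the Lambda range, and the names Mdl and idx are assumptions for illustration.

```matlab
% Setup (assumed model and Lambda range, for illustration only)
rng(1)
X = sprandn(1e4,1e3,0.1);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(1e4,1);
Lambda = logspace(-5,-1,15);
Mdl = fitrlinear(X',Y,'ObservationsIn','columns','Lambda',Lambda, ...
    'Solver','sparsa','Regularization','lasso');

idx = 11;                                % index chosen from the plot
MdlFinal = selectModelSubset(Mdl,idx);   % keep only the model for Lambda(11)
```

To predict responses for new observations, pass MdlFinal and the new predictor data to predict, specifying 'ObservationsIn','columns' if the new data is also stored with observations in columns.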