
lassoblm

Bayesian linear regression model with lasso regularization

Description

The Bayesian linear regression model object lassoblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ2) for implementing Bayesian lasso regression [1]. For j = 1,…,NumPredictors, the conditional prior distribution of βj|σ2 is the Laplace (double exponential) distribution with a mean of 0 and a scale of σ2/λ, where λ is the lasso regularization, or shrinkage, parameter. The prior distribution of σ2 is inverse gamma with shape A and scale B.

The data likelihood is ∏_{t=1}^{T} ϕ(yt; xtβ, σ2), where ϕ(yt; xtβ, σ2) is the Gaussian probability density evaluated at yt with mean xtβ and variance σ2. The resulting posterior distribution is not analytically tractable. For details on the posterior distribution, see Analytically Intractable Posteriors.
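The model hierarchy described above can be summarized, in the notation of this page, as:

```latex
% Bayesian lasso regression hierarchy (Park and Casella, 2008)
y_t \mid \mathbf{x}_t, \beta, \sigma^2 \;\sim\; N(\mathbf{x}_t \beta,\ \sigma^2), \qquad t = 1,\dots,T
\qquad
\beta_j \mid \sigma^2 \;\sim\; \mathrm{Laplace}\!\left(0,\ \frac{\sigma^2}{\lambda}\right), \qquad j = 1,\dots,\mathit{NumPredictors}
\qquad
\sigma^2 \;\sim\; \mathrm{IG}(A,\ B)
```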

In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function.
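For example, a minimal workflow might look like the following sketch. Here, X and y are hypothetical predictor data and response data that you supply; only lassoblm, estimate, and summarize are functions documented on this page.

```matlab
% Create a prior model template for 3 predictors (plus an intercept by default).
PriorMdl = lassoblm(3);

% X is a T-by-3 matrix of predictor data and y is a T-by-1 response vector
% (not defined here). Pass the template and the data to an object function
% to perform posterior analysis.
PosteriorMdl = estimate(PriorMdl,X,y);

% Summarize the marginal posterior distributions.
summarize(PosteriorMdl);
```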

Creation

Syntax

PriorMdl = lassoblm(NumPredictors)
PriorMdl = lassoblm(NumPredictors,Name,Value)

Description

example

PriorMdl = lassoblm(NumPredictors) creates a Bayesian linear regression model object (PriorMdl) composed of NumPredictors predictors and an intercept. The joint prior distribution of (β, σ2) is appropriate for implementing Bayesian lasso regression [1]. PriorMdl is a template that defines the prior distributions and specifies the value of the lasso regularization parameter λ and the dimensionality of β.

example

PriorMdl = lassoblm(NumPredictors,Name,Value) uses additional options specified by one or more Name,Value pair arguments. Name is a property name, except NumPredictors, and Value is the corresponding value. Name must appear inside quotes. You can specify several Name,Value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

For example, lassoblm(3,'Lambda',0.5) specifies a shrinkage of 0.5 for the three coefficients (not the intercept).

Properties


You can set property values when you create the model object using name-value pair argument syntax, or after model creation using dot notation. For example, to set the shrinkage for all coefficients, except the intercept, to 0.5, enter

PriorMdl.Lambda = 0.5;

Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer.

NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation.

When specifying NumPredictors, exclude any intercept term for the value.

After creating a model, if you change the value of NumPredictors using dot notation, then these parameters revert to their default values:

  • Variable names (VarNames)

  • The shrinkage parameter (Lambda)

Data Types: double

Flag for including a regression model intercept, specified as the comma-separated pair consisting of 'Intercept' and a value in this table.

Value    Description
false    Exclude an intercept from the regression model. Hence, β is a p-dimensional vector, where p is the value of the NumPredictors property.
true     Include an intercept in the regression model. Hence, β is a (p + 1)-dimensional vector. During estimation, simulation, and forecasting, MATLAB® prepends the predictor data with an appropriately sized vector of ones.

If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.

Example: 'Intercept',false

Data Types: logical
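As a sketch of excluding the intercept when your design matrix already carries its own column of ones (x1 and T below are hypothetical):

```matlab
% Exclude the model intercept; beta is then a 2-dimensional vector,
% one coefficient per column of the predictor data.
Mdl = lassoblm(2,'Intercept',false);

% During estimation, supply predictor data that includes your own
% column of ones, for example X = [ones(T,1) x1].
```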

Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, and forecasting.

The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors.

Example: 'VarNames',["UnemploymentRate"; "CPI"]

Data Types: string | cell | char

Lasso regularization parameter for all regression coefficients, specified as a positive numeric scalar or (Intercept + NumPredictors)-by-1 positive numeric vector. Larger values of Lambda cause corresponding coefficients to shrink closer to zero.

Suppose X is a T-by-NumPredictors matrix of predictor data, which you specify during estimation, simulation, or forecasting.

  • If Lambda is a vector:

    • If Intercept is true, Lambda(1) is the shrinkage for the intercept, Lambda(2) is the shrinkage for the coefficient of the first predictor X(:,1), Lambda(3) is the shrinkage for the coefficient of the second predictor X(:,2),…, Lambda(NumPredictors + 1) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors).

    • Otherwise, Lambda(1) is the shrinkage for the coefficient of the first predictor X(:,1),…, Lambda(NumPredictors) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors).

  • If you supply the scalar s for Lambda, then all coefficients of the predictors in X have a shrinkage of s.

    • If Intercept is true, then the intercept has a shrinkage of 0.01, and lassoblm stores [0.01; s*ones(NumPredictors,1)] in Lambda.

    • Otherwise, lassoblm stores s*ones(NumPredictors,1) in Lambda.

Example: 'Lambda',6

Data Types: double
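The scalar-expansion behavior described above can be sketched as follows; the stored values follow the stated defaults (0.01 for the intercept).

```matlab
% Scalar shrinkage of 0.5 with an intercept (Intercept is true by default).
Mdl = lassoblm(3,'Lambda',0.5);

% lassoblm stores [0.01; 0.5; 0.5; 0.5] in Mdl.Lambda:
% 0.01 for the intercept and 0.5 for each of the three coefficients.
disp(Mdl.Lambda)
```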

Shape parameter of inverse gamma prior on σ2, specified as a numeric scalar.

A must be at least -(Intercept + NumPredictors)/2.

With B held fixed, as A increases, the inverse gamma distribution becomes taller and more concentrated. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation.

For the functional form of the inverse gamma distribution, see Analytically Tractable Posteriors.

Example: 'A',0.1

Data Types: double

Scale parameter of inverse gamma prior on σ2, specified as a positive scalar or Inf.

With A held fixed, as B increases, the inverse gamma distribution becomes taller and more concentrated. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation.

Example: 'B',5

Data Types: double

Object Functions

estimate     Perform predictor variable selection for Bayesian linear regression models
simulate     Simulate regression coefficients and disturbance variance of Bayesian linear regression model
forecast     Forecast responses of Bayesian linear regression model
plot         Visualize prior and posterior densities of Bayesian linear regression model parameters
summarize    Distribution summary statistics of Bayesian linear regression model for predictor variable selection

Examples


Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR):

GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt.

For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2.

Assume that the prior distributions are:

  • For j = 0,...,3, βj|σ2 has a Laplace distribution with a mean of 0 and a scale of σ2/λ, where λ is the shrinkage parameter. The coefficients are conditionally independent.

  • σ2 ~ IG(A,B). A and B are the shape and scale, respectively, of an inverse gamma distribution.

Create a prior model for Bayesian linear regression. Specify the number of predictors, p.

p = 3;
Mdl = lassoblm(p);

Mdl is a lassoblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, lassoblm displays a summary of the prior distributions.

Alternatively, you can create a prior model for Bayesian lasso regression by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'lasso'.

MdlBayesLM = bayeslm(p,'ModelType','lasso')
MdlBayesLM = 
  lassoblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
           Lambda: [4x1 double]
                A: 3
                B: 1

 
           |  Mean     Std           CI95         Positive   Distribution  
---------------------------------------------------------------------------
 Intercept |  0       100    [-200.000, 200.000]    0.500   Scale mixture  
 Beta(1)   |  0       1        [-2.000,  2.000]     0.500   Scale mixture  
 Beta(2)   |  0       1        [-2.000,  2.000]     0.500   Scale mixture  
 Beta(3)   |  0       1        [-2.000,  2.000]     0.500   Scale mixture  
 Sigma2    | 0.5000  0.5000    [ 0.138,  1.616]     1.000   IG(3.00,    1) 
 

Mdl and MdlBayesLM are equivalent model objects.

You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names.

Mdl.VarNames = ["IPI" "E" "WR"]
Mdl = 
  lassoblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
           Lambda: [4x1 double]
                A: 3
                B: 1

 
           |  Mean     Std           CI95         Positive   Distribution  
---------------------------------------------------------------------------
 Intercept |  0       100    [-200.000, 200.000]    0.500   Scale mixture  
 IPI       |  0       1        [-2.000,  2.000]     0.500   Scale mixture  
 E         |  0       1        [-2.000,  2.000]     0.500   Scale mixture  
 WR        |  0       1        [-2.000,  2.000]     0.500   Scale mixture  
 Sigma2    | 0.5000  0.5000    [ 0.138,  1.616]     1.000   IG(3.00,    1) 
 

MATLAB® associates the variable names with the regression coefficients in displays.

This example is based on Create Prior Model for Bayesian Lasso Regression.

Create a prior model for performing Bayesian lasso regression. Specify the number of predictors, p, and the names of the regression coefficients.

p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);
shrinkage = PriorMdl.Lambda
shrinkage = 4×1

    0.0100
    1.0000
    1.0000
    1.0000

PriorMdl stores the shrinkage values for all coefficients in its Lambda property. shrinkage(1) is the shrinkage for the intercept, and the elements of shrinkage(2:end) correspond to the coefficients of the predictors in Mdl.VarNames. The default shrinkage for the intercept is 0.01, and the default is 1 for all other coefficients.

Load the Nelson-Plosser data set. Create variables for the response and predictor series. Because lasso is sensitive to variable scales, standardize all variables.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};

X = (X - nanmean(X))./nanstd(X);
y = (y - nanmean(y))/nanstd(y);

Although this example standardizes variables, you can specify different shrinkage values for each coefficient instead by setting the Lambda property of PriorMdl to a numeric vector of shrinkage values.

Implement Bayesian lasso regression by estimating the marginal posterior distributions of β and σ2. Because Bayesian lasso regression uses Markov chain Monte Carlo for estimation, set a random number seed to reproduce the results.

rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);
Method: lasso MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4
 
           |   Mean     Std         CI95        Positive  Distribution 
-----------------------------------------------------------------------
 Intercept | -0.4490  0.0527  [-0.548, -0.344]    0.000     Empirical  
 IPI       |  0.6679  0.1063  [ 0.456,  0.878]    1.000     Empirical  
 E         |  0.1114  0.1223  [-0.110,  0.365]    0.827     Empirical  
 WR        |  0.2215  0.1367  [-0.024,  0.494]    0.956     Empirical  
 Sigma2    |  0.0343  0.0062  [ 0.024,  0.048]    1.000     Empirical  
 

PosteriorMdl is an empiricalblm model object storing draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions at the command window. Rows of the summary correspond to the regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include:

  • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [-0.110, 0.365] is 0.95.

  • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.

By default, estimate draws and discards a burn-in sample of size 5000. However, it is good practice to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distributions, stored in the properties BetaDraws and Sigma2Draws, using dot notation.

figure;
for j = 1:(p + 1)
    subplot(2,2,j);
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('%s',PosteriorMdl.VarNames{j}));
end

figure;
plot(PosteriorMdl.Sigma2Draws);
title('Sigma2');

The trace plots indicate that the draws seem to be mixing well, that is, there is no detectable transience or serial correlation, and the draws do not jump between states.

Plot the posterior distributions of the coefficients and disturbance variance.

figure;
plot(PosteriorMdl)

E and WR might not be important predictors because 0 is within the region of high density in their posterior distributions.

It is common to standardize variables when implementing lasso regression. However, if you want to preserve the interpretation of the coefficients but the variables have different scales, then you can perform differential shrinkage by specifying a different shrinkage value for each coefficient instead. This example is based on Perform Variable Selection Using Default Lasso Shrinkage.

Create a prior model for performing Bayesian lasso regression. Specify the number of predictors, p, and the names of the regression coefficients.

p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);

Load the Nelson-Plosser data set. Create variables for the response and predictor series. Determine whether the variables have exponential trends by plotting each in separate figure.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};

figure;
plot(dates,y)
title('GNPR')

for j = 1:3
    figure;
    plot(dates,X(:,j));
    title(PriorMdl.VarNames(j + 1));
end

The variables GNPR, IPI, and WR appear to have an exponential trend.

Remove the exponential trend from the variables GNPR, IPI, and WR.

y = log(y);
X(:,[1 3]) = log(X(:,[1 3]));

All predictor variables have different scales (for more details, enter Description at the command line). Display the mean of each predictor. Because the variables contain leading missing values, use nanmean.

predmeans = nanmean(X)
predmeans = 1×3
10^4 ×

    0.0002    4.7700    0.0004

The values of the second predictor are much greater than those of the other two predictors and the response. Hence, its regression coefficient can appear close to zero.

Using dot notation, attribute a very low shrinkage to the intercept, a shrinkage of 0.1 to the first and third predictors, and a shrinkage of 1e4 to the second predictor.

PriorMdl.Lambda = [1e-5 0.1 1e4 0.1];

Implement Bayesian lasso regression by estimating the marginal posterior distributions of β and σ2. Because Bayesian lasso regression uses Markov chain Monte Carlo for estimation, set a random number seed to reproduce the results.

rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);
Method: lasso MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4
 
           |  Mean     Std         CI95        Positive  Distribution 
----------------------------------------------------------------------
 Intercept | 2.0281  0.6839  [ 0.679,  3.323]    0.999     Empirical  
 IPI       | 0.3534  0.2497  [-0.139,  0.839]    0.923     Empirical  
 E         | 0.0000  0.0000  [-0.000,  0.000]    0.762     Empirical  
 WR        | 0.5250  0.3482  [-0.126,  1.209]    0.937     Empirical  
 Sigma2    | 0.0315  0.0055  [ 0.023,  0.044]    1.000     Empirical  
 

This example is based on Create Prior Model for Bayesian Lasso Regression.

Perform Bayesian lasso regression by:

  1. Creating a Bayesian lasso prior model for the regression coefficients and disturbance variance. Use the default shrinkage.

  2. Holding out the last 10 periods of data from estimation.

  3. Estimating the marginal posterior distributions.

p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);

load Data_NelsonPlosser
fhs = 10; % Forecast horizon size
X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
y = DataTable{1:(end - fhs),'GNPR'};
XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'};                  % True future responses

rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);

Forecast responses using the posterior predictive distribution and using the future predictor data XF. Plot the true values of the response and the forecasted values.

yF = forecast(PosteriorMdl,XF);

figure;
plot(dates,DataTable.GNPR);
hold on
plot(dates((end - fhs + 1):end),yF)
h = gca;
hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
    h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
uistack(hp,'bottom');
legend('True GNPR','Forecasted GNPR','Forecast Horizon','Location','NW')
title('Real Gross National Product: 1909 - 1970');
ylabel('rGNP');
xlabel('Year');
hold off

yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.

Estimate the forecast root mean squared error (RMSE).

frmse = sqrt(mean((yF - yFT).^2))
frmse = 25.4831

Forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best performing model of the ones being compared.

When you perform Bayesian lasso regression, it is best practice to search for appropriate shrinkage values. One way to find appropriate shrinkages is to estimate the forecast RMSE over a grid of shrinkage values, and choose the shrinkage that minimizes the forecast RMSE.
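One sketch of such a grid search, reusing the prior model, holdout split, and forecast RMSE computation from this example (the grid values below are illustrative choices, not recommendations):

```matlab
% Search a coarse grid of shrinkage values for the predictor coefficients.
lambdaGrid = [0.01 0.1 1 10 100];
frmseGrid = zeros(numel(lambdaGrid),1);

for k = 1:numel(lambdaGrid)
    % Keep the intercept shrinkage small; shrink all predictors equally.
    PriorMdl.Lambda = [0.01; lambdaGrid(k)*ones(p,1)];
    rng(1); % Same seed for comparability across grid points
    PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
    yF = forecast(PosteriorMdl,XF);
    frmseGrid(k) = sqrt(mean((yF - yFT).^2));
end

% Choose the shrinkage that minimizes forecast RMSE.
[~,idx] = min(frmseGrid);
bestLambda = lambdaGrid(idx)
```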

More About

expand all

Tips

  • Lambda is a tuning parameter. You should perform Bayesian lasso regression over a grid of shrinkage values, and choose the model that best balances a fit criterion and model complexity.

  • For estimation, simulation, and forecasting, MATLAB does not standardize predictor data. If the variables in the predictor data have different scales, then specify a shrinkage parameter for each predictor by supplying a numeric vector for Lambda.

Alternative Functionality

The bayeslm function can create any supported prior model object for Bayesian linear regression.

References

[1] Park, T. and G. Casella. "The Bayesian Lasso." JASA. Vol. 103, No. 482, 2008, pp. 681–686.

Introduced in R2018b