summarize

Distribution summary statistics of Bayesian linear regression model for predictor variable selection

Description

To obtain a summary of a standard Bayesian linear regression model, see summarize.


summarize(Mdl) displays a tabular summary of the random regression coefficients and disturbance variance of the Bayesian linear regression model Mdl at the command line. For each parameter, the summary includes the:

• Mean

• Standard deviation (square root of the variance)

• 95% equitailed credible intervals

• Probability that the parameter is greater than 0

• Description of the distributions, if known

• Marginal probability that a coefficient should be included in the model, for stochastic search variable selection (SSVS) predictor-variable-selection models
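A short base-MATLAB sketch can illustrate the statistics in this list by computing them from Monte Carlo draws of one parameter. The draws are hypothetical (simulated normal values, not output from summarize). The empirical-quantile rule used for the equitailed interval is an assumption, and summarize may compute these quantities differently, for example from a closed-form or mixture cdf.

```matlab
% Hypothetical draws of one parameter (not output from summarize)
rng(0);                                  % reproducible draws
draws = 1.5 + 2*randn(1e5,1);            % simulated N(1.5, 2^2) values

paramMean = mean(draws);                 % Mean column
paramStd  = std(draws);                  % Std column

% 95% equitailed credible interval: 2.5% of the mass in each tail.
% Sorted draws are used so that only base MATLAB is needed.
sorted = sort(draws);
n = numel(sorted);
ci95 = [sorted(ceil(0.025*n)), sorted(ceil(0.975*n))];

% Probability that the parameter is greater than 0 (Positive column)
probPositive = mean(draws > 0);

fprintf('Mean %.3f  Std %.3f  CI95 [%.3f, %.3f]  Positive %.3f\n', ...
    paramMean, paramStd, ci95(1), ci95(2), probPositive);
```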


SummaryStatistics = summarize(Mdl) returns a structure array with a table summarizing the regression coefficients and disturbance variance, and a description of the joint distribution of the parameters.

Examples


Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

${\text{GNPR}}_{t}={\beta }_{0}+{\beta }_{1}{\text{IPI}}_{t}+{\beta }_{2}{\text{E}}_{t}+{\beta }_{3}{\text{WR}}_{t}+{\epsilon }_{t}.$

The disturbances $\epsilon_t$ are independent Gaussian random variables with a mean of 0 and variance $\sigma^2$ for all $t$.

Assume these prior distributions for $k = 0,\ldots,3$:

• $\beta_k \mid \sigma^2,\gamma_k = \gamma_k\sigma\sqrt{V_{k1}}Z_1 + (1-\gamma_k)\sigma\sqrt{V_{k2}}Z_2$, where $Z_1$ and $Z_2$ are independent standard normal random variables. Therefore, conditional on $\sigma^2$, the coefficients have a Gaussian mixture distribution. Assume that all coefficients are conditionally independent a priori, but that they depend on the disturbance variance.

• $\sigma^2 \sim IG(A,B)$, where $A$ and $B$ are the shape and scale, respectively, of an inverse gamma distribution.

• $\gamma_k \in \{0,1\}$ is the random variable-inclusion regime variable, which has a discrete uniform distribution.
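This sketch shows where the Gaussian mixture comes from. It simulates the prior for one coefficient $\beta_k$ and compares the simulated standard deviation with the closed-form value $\sqrt{E[\sigma^2]\,(pV_{k1}+(1-p)V_{k2})}$. The hyperparameter values are assumptions: A = 3, B = 1, V_k1 = 10, V_k2 = 0.1, inclusion probability 0.5, and zero prior means. They were chosen because they reproduce the Std of 1.5890 in the prior summary displayed below. The inverse gamma convention $1/\sigma^2 \sim \text{Gamma}(A,\ \text{scale } B)$ is also assumed. Inspect the properties of PriorMdl for the actual values.

```matlab
% Assumed prior hyperparameters (hypothetical; check PriorMdl for actual values)
A = 3; B = 1;            % sigma^2 ~ IG(A,B), shape A and scale B
V1 = 10; V2 = 0.1;       % variance factors when gamma_k = 1 and gamma_k = 0
probInclude = 0.5;       % P(gamma_k = 1), discrete uniform on {0,1}

rng(0);
N = 1e6;
% Assumed IG convention: 1/sigma^2 ~ Gamma(shape A, scale B).
% For integer A, a Gamma(A,1) draw is a sum of A Exp(1) draws, and -log(U) ~ Exp(1).
precision = B * (-sum(log(rand(A,N)),1))';
sigma2 = 1 ./ precision;

gammaK = rand(N,1) < probInclude;        % regime indicator gamma_k
Z1 = randn(N,1);
Z2 = randn(N,1);
betaK = gammaK.*sqrt(sigma2*V1).*Z1 + (1-gammaK).*sqrt(sigma2*V2).*Z2;

% Closed form with zero means: Var(beta_k) = E[sigma^2]*(p*V1 + (1-p)*V2)
meanSigma2 = 1/(B*(A-1));
stdClosed = sqrt(meanSigma2*(probInclude*V1 + (1-probInclude)*V2));
stdSim = std(betaK);

fprintf('Closed-form Std %.4f, simulated Std %.4f\n', stdClosed, stdSim);
```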

Create a prior model for SSVS. Specify the number of predictors p.

p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','mixconjugateblm','VarNames',VarNames);

PriorMdl is a mixconjugateblm Bayesian linear regression model object for SSVS predictor selection, representing the prior distribution of the regression coefficients and disturbance variance.

Summarize the prior distribution.

summarize(PriorMdl)

           |  Mean     Std         CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0      1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 IPI       |  0      1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 E         |  0      1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 WR        |  0      1.5890  [-3.547,  3.547]    0.500   Mixture distribution
 Sigma2    | 0.5000  0.5000  [ 0.138,  1.616]    1.000   IG(3.00,    1)

The function displays a table of summary statistics and other information about the prior distribution at the command line.
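The Sigma2 row of this table can be reproduced in closed form. The sketch below assumes that IG(A,B) means $1/\sigma^2 \sim \text{Gamma}(A,\ \text{scale } B)$, so the mean is $1/(B(A-1))$. With B = 1, as displayed here, the scale and rate conventions coincide. It uses only base MATLAB, through gammaincinv.

```matlab
% Sigma2 row of the prior summary, computed in closed form.
% Assumption: IG(A,B) means 1/sigma^2 ~ Gamma(shape A, scale B).
A = 3; B = 1;                                   % values displayed as IG(3.00, 1)

meanSigma2 = 1/(B*(A-1));                       % finite only for A > 1
stdSigma2  = sqrt(1/(B^2*(A-1)^2*(A-2)));       % finite only for A > 2

% Equitailed 95% interval: the 2.5% quantile of sigma^2 is the reciprocal
% of the 97.5% quantile of the precision 1/sigma^2, and vice versa.
% gammaincinv(P,A) inverts the lower regularized incomplete gamma function,
% which is the Gamma(A,1) cdf.
precisionQ = B * gammaincinv([0.975 0.025], A);
ci95 = 1 ./ precisionQ;

fprintf('Mean %.4f  Std %.4f  CI95 [%.3f, %.3f]\n', ...
    meanSigma2, stdSigma2, ci95(1), ci95(2));
```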

Load the Nelson-Plosser data set, and create variables for the predictor and response data.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;

Estimate the posterior distributions. Suppress the estimation display.

PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);

PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of $\beta$ and $\sigma^2$.

Obtain summary statistics from the posterior distribution.

summary = summarize(PosteriorMdl);

summary is a structure array containing two fields: MarginalDistributions and JointDistribution.

Display the marginal distribution summary by using dot notation.

summary.MarginalDistributions
ans=5×5 table
                    Mean          Std                CI95                Positive    Distribution
                 __________    _________    ________________________    ________    _____________

    Intercept        -18.66       10.348       -37.006        0.8406     0.0412     {'Empirical'}
    IPI              4.4555      0.15287        4.1561        4.7561          1     {'Empirical'}
    E            0.00096765    0.0003759    0.00021479     0.0016644     0.9968     {'Empirical'}
    WR               2.4739      0.36337        1.7607        3.1882          1     {'Empirical'}
    Sigma2           47.773       8.6863        33.574        67.585          1     {'Empirical'}

The MarginalDistributions field is a table of summary statistics and other information about the posterior distribution.

Input Arguments


Mdl: Bayesian linear regression model for predictor variable selection, specified as one of the model objects in this list.

• mixconjugateblm: Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by bayeslm

• mixsemiconjugateblm: Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by bayeslm

• lassoblm: Bayesian lasso regression model, returned by bayeslm

Output Arguments


SummaryStatistics: Parameter distribution summary, returned as a structure array with these fields.

MarginalDistributions

Table containing a summary of the parameter distributions, which are prior or posterior distributions depending on Mdl. Rows correspond to parameters. Columns correspond to the:

• Estimated mean (Mean)

• Standard deviation (Std)

• 95% equitailed credible interval (CI95)

• Probability that the parameter is greater than 0 (Positive)

• Description of the marginal or conditional distribution of the parameter (Distribution)

Row names are the names in Mdl.VarNames. The name of the last row is Sigma2.

JointDistribution

A string scalar that describes the distributions of the regression coefficients (Beta) and the disturbance variance (Sigma2) when known.

For distribution descriptions:

• N(Mu,V) denotes the normal distribution with mean Mu and variance matrix V. This distribution can be multivariate.

• IG(A,B) denotes the inverse gamma distribution with shape A and scale B.

• Mixture distribution denotes a Student’s t mixture distribution.
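Integrating $\sigma^2$ out of the conditional Gaussian mixture yields the Student's t mixture. This is a sketch under assumptions not stated on this page: zero prior means, inclusion probability $p$, and $1/\sigma^2 \sim \text{Gamma}(A,\ \text{scale } B)$. Here $t_\nu(0, s^2)$ denotes a Student's t distribution with $\nu$ degrees of freedom, location 0, and squared scale $s^2$.

```latex
\beta_k \mid \sigma^2 \sim p\,N\!\left(0,\ \sigma^2 V_{k1}\right) + (1-p)\,N\!\left(0,\ \sigma^2 V_{k2}\right),
\qquad \frac{1}{\sigma^2} \sim \mathrm{Gamma}(A,\ \text{scale } B)

\Longrightarrow\quad
\beta_k \sim p\,t_{2A}\!\left(0,\ \frac{V_{k1}}{AB}\right) + (1-p)\,t_{2A}\!\left(0,\ \frac{V_{k2}}{AB}\right)

\operatorname{Var}(\beta_k) = \frac{2A}{2A-2}\cdot\frac{p V_{k1} + (1-p) V_{k2}}{AB}
= \frac{p V_{k1} + (1-p) V_{k2}}{B(A-1)}, \qquad A > 1
```

With the assumed values A = 3, B = 1, $V_{k1} = 10$, $V_{k2} = 0.1$, and p = 0.5, the variance is 5.05/2 = 2.525. The standard deviation is therefore about 1.589, which agrees with the Std column of the prior summary in the example.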

Note

If Mdl is a lassoblm model and Mdl.Probability is a function handle representing the regime probability distribution, then summarize cannot estimate prior distribution statistics for the coefficients. Therefore, entries corresponding to coefficient statistics are NaN values.


Bayesian Linear Regression Model

A Bayesian linear regression model treats the parameters $\beta$ and $\sigma^2$ in the multiple linear regression (MLR) model $y_t = x_t\beta + \epsilon_t$ as random variables.

For times $t = 1,\ldots,T$:

• $y_t$ is the observed response.

• $x_t$ is a 1-by-$(p+1)$ row vector of observed values of $p$ predictors. To accommodate a model intercept, $x_{1t} = 1$ for all $t$.

• $\beta$ is a $(p+1)$-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of $x_t$.

• $\epsilon_t$ is the random disturbance with a mean of zero and $\text{Cov}(\epsilon) = \sigma^2 I_{T\times T}$, where $\epsilon$ is the $T$-by-1 vector containing all disturbances. These assumptions, together with Gaussian disturbances, imply that the data likelihood is

$\ell \left(\beta ,{\sigma }^{2}|y,x\right)=\prod _{t=1}^{T}\varphi \left({y}_{t};{x}_{t}\beta ,{\sigma }^{2}\right).$

$\varphi(y_t; x_t\beta, \sigma^2)$ is the Gaussian probability density with mean $x_t\beta$ and variance $\sigma^2$, evaluated at $y_t$.
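Evaluating the likelihood on the log scale avoids numerical underflow when the product has many terms. This sketch uses hypothetical simulated data, not the Nelson-Plosser data, and hypothetical parameter values. It checks the elementwise sum of log densities against the equivalent matrix form implied by $\text{Cov}(\epsilon) = \sigma^2 I_{T\times T}$.

```matlab
% Hypothetical data and parameters (for illustration only)
rng(0);
T = 50; p = 3;
X = [ones(T,1) randn(T,p)];        % rows are x_t; first column is x_{1t} = 1
betaVec = [1; 0.5; -2; 0.25];      % hypothetical (p+1)-by-1 coefficient vector
sigma2 = 0.8;                      % hypothetical disturbance variance
y = X*betaVec + sqrt(sigma2)*randn(T,1);

% log ell(beta,sigma2 | y,x) = sum over t of log phi(y_t; x_t*beta, sigma2)
resid = y - X*betaVec;
logDensities = -0.5*log(2*pi*sigma2) - resid.^2/(2*sigma2);
logLik = sum(logDensities);

% Equivalent matrix form for independent disturbances with common variance
logLikMatrix = -T/2*log(2*pi*sigma2) - (resid'*resid)/(2*sigma2);

fprintf('log-likelihood: %.4f (elementwise), %.4f (matrix form)\n', ...
    logLik, logLikMatrix);
```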

Before considering the data, you impose a joint prior distribution assumption on $(\beta,\sigma^2)$. In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of $(\beta,\sigma^2)$ or the conditional posterior distributions of the parameters.

Algorithms

• If Mdl is a lassoblm model object and Mdl.Probability is a numeric vector, then the 95% credible intervals on the regression coefficients are Mean + [-2 2]*Std, where Mean and Std are variables in the summary table.

• If Mdl is a mixconjugateblm or mixsemiconjugateblm model object, then the 95% credible intervals on the regression coefficients are estimated from the mixture cdf. If the estimation fails, then summarize returns NaN values instead.
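The mixture-cdf approach can be sketched for the prior in the example. Assume zero prior means, A = 3, B = 1, mixture variances 10 and 0.1 with equal weights, and $1/\sigma^2 \sim \text{Gamma}(A,\ \text{scale } B)$. Under these assumptions, each coefficient's prior is an equally weighted mixture of two scaled Student's t distributions with 2A degrees of freedom. The sketch inverts that mixture cdf with fzero and uses betainc for the t cdf, so only base MATLAB is needed. It illustrates the idea and is not the toolbox's internal algorithm. For comparison, it also applies the Mean + [-2 2]*Std rule to the same prior.

```matlab
% Hypothetical prior settings (assumed; they reproduce the example's prior summary)
A = 3; B = 1;
V = [10 0.1];                     % mixture variance factors
w = [0.5 0.5];                    % mixture weights
nu = 2*A;                         % degrees of freedom of each t component
s = sqrt(V/(A*B));                % scale of each t component

% Student's t cdf with nu degrees of freedom via the regularized incomplete beta
tcdfBase = @(t) 0.5 + sign(t).*(0.5 - 0.5*betainc(nu./(nu + t.^2), nu/2, 0.5));
mixcdf = @(x) w(1)*tcdfBase(x/s(1)) + w(2)*tcdfBase(x/s(2));

% The mixture is symmetric about 0, so the equitailed interval is [-q, q]
q = fzero(@(x) mixcdf(x) - 0.975, [0 100]);
ciMixture = [-q q];

% Comparison: Mean + [-2 2]*Std for the same prior
meanBeta = 0;
stdBeta = sqrt(1/(B*(A-1)) * (w*V'));
ciTwoStd = meanBeta + [-2 2]*stdBeta;

fprintf('Mixture cdf: [%.3f, %.3f]   Mean +/- 2 Std: [%.3f, %.3f]\n', ...
    ciMixture(1), ciMixture(2), ciTwoStd(1), ciTwoStd(2));
```

The mixture quantile comes out close to the [-3.547, 3.547] interval in the prior summary of the example. The two-standard-deviation rule gives a narrower interval of about ±3.178. Almost all of the tail mass beyond ±3 comes from the wide, heavy-tailed component, which a normal approximation with the same variance understates.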