# simsmooth

Simulation smoother of Bayesian vector autoregression (VAR) model

## Syntax

``[CoeffDraws,SigmaDraws] = simsmooth(PriorMdl,Y)``
``[CoeffDraws,SigmaDraws] = simsmooth(PriorMdl,Y,Name,Value)``
``````[CoeffDraws,SigmaDraws,NaNDraws] = simsmooth(___)``````
``````[CoeffDraws,SigmaDraws,NaNDraws,YMean,YStd] = simsmooth(___)``````

## Description

`simsmooth` is well suited for advanced applications, such as out-of-sample conditional forecasting from the posterior predictive distribution of a Bayesian VAR(p) model, VARX(p) model forecasting, missing value imputation, and parameter estimation in the presence of missing values. Also, `simsmooth` enables you to adjust the Gibbs sampler for out-of-sample forecasting. To estimate out-of-sample forecasts from a Bayesian VAR(p) model, see `forecast`.

example

````[CoeffDraws,SigmaDraws] = simsmooth(PriorMdl,Y)` returns 1000 random draws of coefficient vectors λ `Coeff` and innovations covariance matrices Σ `Sigma`, drawn from the posterior distribution formed by combining the prior Bayesian VAR(p) model `PriorMdl` and the response data `Y`.The sampling procedure includes a Bayesian data augmentation step that uses the Kalman filter (see Algorithms). During sampling, `simsmooth` replaces non-presample missing values in `Y`, indicated by `NaN` values, with imputed values.```

example

``[CoeffDraws,SigmaDraws] = simsmooth(PriorMdl,Y,Name,Value)` uses additional options specified by one or more name-value pair arguments. For example, you can set the number of random draws from the distribution or specify the presample response data.`

example

``````[CoeffDraws,SigmaDraws,NaNDraws] = simsmooth(___)``` also returns the imputed response values of each draw `NaNDraws` using any of the input argument combinations in the previous syntaxes.```

example

``````[CoeffDraws,SigmaDraws,NaNDraws,YMean,YStd] = simsmooth(___)``` also returns the mean `YMean` and standard deviation `YStd` of the posterior predictive distribution of the augmented data.```

## Examples

collapse all

When `simulate` estimates the posterior distribution from which to draw parameters, it removes all rows in the data that contain at least one missing value (`NaN`). However, `simsmooth` uses Bayesian data augmentation to impute non-presample missing values during posterior estimation.

Consider the 3-D VAR(4) model for the US inflation (`INFL`), unemployment (`UNRATE`), and federal funds (`FEDFUNDS`) rates.

`$\left[\begin{array}{l}{\text{INFL}}_{t}\\ {\text{UNRATE}}_{t}\\ {\text{FEDFUNDS}}_{t}\end{array}\right]=c+\sum _{j=1}^{4}{\Phi }_{j}\left[\begin{array}{l}{\text{INFL}}_{t-j}\\ {\text{UNRATE}}_{t-j}\\ {\text{FEDFUNDS}}_{t-j}\end{array}\right]+\left[\begin{array}{c}{\epsilon }_{1,t}\\ {\epsilon }_{2,t}\\ {\epsilon }_{3,t}\end{array}\right].$`

For all $t$, ${\epsilon }_{t}$ is a series of independent 3-D normal innovations with a mean of 0 and covariance $\Sigma$. Assume a weak conjugate prior distribution for the parameters $\left({\left[{\Phi }_{1},...,{\Phi }_{4},\mathit{c}\right]}^{\prime },\Sigma \right)$.

Load and Preprocess Data

Load the US macroeconomic data set. Compute the inflation rate, and stabilize the unemployment and federal funds rates.

```load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);```

Several series have leading `NaN` values because their measurements were not available at the same time as other measurements. Because leading `NaN` values can interfere with presample specification, remove all rows containing at least one leading missing value.

`rmldDataTimeTable = rmmissing(DataTimeTable(:,seriesnames));`

Create Prior Model

Create a weak conjugate prior model by specifying large coefficient prior variances. Specify the response series names.

```numseries = numel(seriesnames); numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames); numcoeffseqn = size(PriorMdl.V,1); PriorMdl.V = 1e4*eye(numcoeffseqn);```

Randomly Place Missing Values in Data

To illustrate simulation in the presence of missing values, randomly place missing values in the data after the presample period.

```rng(1) % For reproducibility T = size(rmldDataTimeTable,1); idxpre = 1:PriorMdl.P; % Presample period idxest = (PriorMdl.P + 1):T; % Estimation period nmissing = 20; % Simulate at most nmissing missing values sidx = [randsample(idxest,nmissing,true); randsample(1:numseries,nmissing,true)]; lsidx = sub2ind([T,numseries],sidx(1,:),sidx(2,:)); MissData = table2array(rmldDataTimeTable); MissData(lsidx) = NaN; MissDataTimeTable = rmldDataTimeTable; MissDataTimeTable{:,:} = MissData;```

Simulate Parameters from Posterior

Draw 1000 parameter sets from the posterior distribution by calling `simsmooth`, which estimates the posterior distribution of the parameters, and then forms the posterior predictive distribution.

`[Coeff,Sigma] = simsmooth(PriorMdl,MissDataTimeTable.Variables);`

`Coeff` is a 39-by-1000 matrix of randomly drawn coefficient vectors from the posterior. `Sigma` is a 3-by-3-by-1000 array of randomly drawn innovations covariance matrices.

By default, `simsmooth` initializes the VAR model by using the first four observations in the data.

To associate rows of `Coeff` to coefficients, obtain a summary of the prior distribution by using `summarize`. In a table, display the first set of randomly drawn coefficients with corresponding names.

```Summary = summarize(PriorMdl,'off'); table(Coeff(:,1),'RowNames',Summary.CoeffMap)```
```ans=39×1 table Var1 __________ AR{1}(1,1) 0.22109 AR{1}(1,2) -0.24034 AR{1}(1,3) 0.093315 AR{2}(1,1) 0.18329 AR{2}(1,2) -0.23178 AR{2}(1,3) -0.026301 AR{3}(1,1) 0.39991 AR{3}(1,2) 0.41141 AR{3}(1,3) 0.054702 AR{4}(1,1) 0.024944 AR{4}(1,2) -0.37372 AR{4}(1,3) -0.0095642 Constant(1) 0.21499 AR{1}(2,1) -0.073776 AR{1}(2,2) 0.36086 AR{1}(2,3) 0.071088 ⋮ ```

Alternatively, you can create an empirical model from the draws, and use `summarize` to display the model by specifying any display option.

Display a summary of the posterior draws as an equation.

```EmpMdl = empiricalbvarm(numseries,numlags,'SeriesNames',seriesnames,... 'CoeffDraws',Coeff,'SigmaDraws',Sigma); summarize(EmpMdl,'equation')```
``` VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) DUNRATE(-3) DFEDFUNDS(-3) INFL(-4) DUNRATE(-4) DFEDFUNDS(-4) Constant ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ INFL | 0.1447 -0.3685 0.1013 0.2974 -0.0959 0.0360 0.4115 0.2244 0.0474 0.0265 -0.2321 0.0030 0.1095 | (0.0744) (0.1314) (0.0370) (0.0833) (0.1509) (0.0398) (0.0833) (0.1440) (0.0403) (0.0879) (0.1301) (0.0370) (0.0744) DUNRATE | -0.0187 0.4445 0.0314 0.0836 0.2372 0.0489 -0.0407 -0.0548 -0.0064 0.0483 -0.1753 0.0027 -0.0597 | (0.0447) (0.0808) (0.0234) (0.0514) (0.0863) (0.0230) (0.0507) (0.0906) (0.0243) (0.0514) (0.0779) (0.0225) (0.0466) DFEDFUNDS | -0.2046 -1.1927 -0.2524 0.2864 -0.2282 -0.2657 0.2709 -0.6231 0.0289 -0.0404 0.1043 -0.1236 -0.2952 | (0.1530) (0.2931) (0.0816) (0.1832) (0.3123) (0.0857) (0.1736) (0.3105) (0.0900) (0.1866) (0.2880) (0.0758) (0.1684) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------- INFL | 0.2842 -0.0098 0.1346 | (0.0286) (0.0122) (0.0464) DUNRATE | -0.0098 0.1062 -0.1496 | (0.0122) (0.0106) (0.0296) DFEDFUNDS | 0.1346 -0.1496 1.3187 | (0.0464) (0.0296) (0.1422) ```

Consider the 3-D VAR(4) model of Simulate Parameters from Posterior Distribution in Presence of Missing Values.

Load the US macroeconomic data set. Compute the inflation rate, and stabilize the unemployment and federal funds rates.

```load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);```

Remove all rows that contain leading missing values.

`rmldDataTimeTable = rmmissing(DataTimeTable(:,seriesnames));`

Create a weak conjugate prior model. Specify the response series names.

```numseries = numel(seriesnames); numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames); numcoeffseqn = size(PriorMdl.V,1); PriorMdl.V = 1e4*eye(numcoeffseqn);```

Randomly place missing values in the data after the presample period.

```rng(1) % For reproducibility T = size(rmldDataTimeTable,1); idxpre = 1:PriorMdl.P; % Presample period idxest = (PriorMdl.P + 1):T; % Estimation period nmissing = 20; % Simulate at most nmissing missing values sidx = [randsample(idxest,nmissing,true); randsample(1:numseries,nmissing,true)]; lsidx = sub2ind([T,numseries],sidx(1,:),sidx(2,:)); MissData = table2array(rmldDataTimeTable); MissData(lsidx) = NaN; MissDataTimeTable = rmldDataTimeTable; MissDataTimeTable{:,:} = MissData;```

Draw 1000 parameter sets from the posterior distribution by calling `simsmooth`. Return the values that the simulation smoother imputes for the missing observations.

`[~,~,NaNDraws] = simsmooth(PriorMdl,MissDataTimeTable.Variables);`

`NaNDraws` is a 19-by-1000 matrix of randomly drawn response vectors from the posterior predictive distribution. Elements correspond to the missing values in the data ordered by a column-wise search. For example, `NaNDraws(3,1)` is the first randomly drawn imputed response of the third missing value in the data. Find the corresponding value in the data.

```[idxi,idxj] = find(ismissing(MissDataTimeTable),3); responsename = seriesnames(idxj(end))```
```responsename = "INFL" ```
`observationtime = MissDataTimeTable.Time(idxi(end))`
```observationtime = datetime Q3-65 ```

Plot the empirical distribution of the imputed values of the inflation rate during Q3-65.

```histogram(NaNDraws(3,:)) title('Q3-65 Inflation Rate Empirical Distribution')```

Consider the 3-D VAR(4) model of Simulate Parameters from Posterior Distribution in Presence of Missing Values.

Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values (the data includes leading missing values only).

```load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable); ```

Create a weak conjugate prior model. Specify the response series names.

```numseries = numel(seriesnames); numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames); numcoeffseqn = size(PriorMdl.V,1); PriorMdl.V = 1e4*eye(numcoeffseqn);```

Simulate 5000 coefficients and innovations covariance matrices from the posterior distribution. Specify a burn-in period of 1000 and a thinning factor of 5.

```rng(1); % For reproducibility [Coeff,Sigma] = simsmooth(PriorMdl,rmDataTimeTable{:,seriesnames},... 'NumDraws',5000,'BurnIn',1000,'Thin',5);```

`Coeff` is a 39-by-5000 matrix of coefficients, and `Sigma` is a 3-by-3-by-5000 array of innovations covariance matrices. Both `Coeff` and `Sigma` are randomly drawn from the posterior distribution.

Consider the 3-D VAR(4) model of Simulate Parameters from Posterior Distribution in Presence of Missing Values. In this case, assume that the prior model is semiconjugate.

Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values (the data includes leading missing values only).

```load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable); ```

Create a semiconjugate prior model. Specify the response series names.

```numseries = numel(seriesnames); numlags = 4; PriorMdl = bayesvarm(numseries,numlags,'ModelType','semiconjugate',... 'SeriesNames',seriesnames);```

To obtain starting values for the coefficients, consider the coefficients from a VAR model fitted to the first 30 observations.

Define index sets that partition the data into four sets:

• Length p = 4 initialization period for the dynamic component of the model

• ${\mathit{n}}_{0}$ = 30 observations to estimate the VAR for the coefficient initial values

• Length p = 4 initialization period for the dynamic component of the Bayesian VAR model

• Remaining ${\mathit{n}}_{1}$ observations to estimate the posterior

```T = size(rmDataTimeTable); n0 = 30; idxpre0 = 1:PriorMdl.P; idxest0 = (idxpre0(end) + 1):(idxpre0(end) + 1 + n0); idxpre1 = (idxest0(end) + 1 - PriorMdl.P):idxest0(end); idxest1 = (idxest0(end) + 1):T; n1 = numel(idxest1);```

Obtain initial coefficient values by fitting a VAR model to the first 34 observations. Use the first four observations as a presample.

```Mdl0 = varm(PriorMdl.NumSeries,PriorMdl.P); EstMdl0 = estimate(Mdl0,rmDataTimeTable{idxest0,seriesnames},'Y0',rmDataTimeTable{idxpre0,seriesnames});```

`estimate` returns a model object containing the AR coefficient estimates in matrix form and the constants in a vector. The Bayesian VAR functions require initial coefficient values in a vector. Format the initial coefficient values.

```Coeff0 = [EstMdl0.AR{:} EstMdl0.Constant].'; Coeff0 = Coeff0(:);```

Simulate 1000 coefficients and innovations covariance matrices from the posterior distribution. Specify the remaining observations from which to estimate the posterior. Initialize the VAR model by specifying the last four observations in the previous estimation sample as a presample, and specify the initial coefficient vector to initialize the posterior sample.

```rng(1); % For reproducibility [Coeff,Sigma] = simsmooth(PriorMdl,rmDataTimeTable{idxest1,seriesnames},... 'Y0',rmDataTimeTable{idxpre1,seriesnames},'Coeff0',Coeff0);```

`Coeff` is a 39-by-1000 matrix of coefficients and `Sigma` is a 3-by-3-by-1000 array of innovations covariance matrices. Both `Coeff` and `Sigma` are randomly drawn from the posterior distribution.

Consider the 3-D VAR(4) model of Simulate Parameters from Posterior Distribution in Presence of Missing Values.

Load and Preprocess Data

Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values (the data includes leading missing values only).

```load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable); ```

Create Prior Model

Create a weak conjugate prior model. Specify the response series names.

```numseries = numel(seriesnames); numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames); numcoeffseqn = size(PriorMdl.V,1); PriorMdl.V = 1e4*eye(numcoeffseqn);```

Forecast Responses Using `forecast`

Directly forecast two years (eight quarters) of observations from the posterior predictive distribution. `forecast` estimates the posterior distribution of the parameters, and then forms the posterior predictive distribution.

```rng(1); % For reproducibility numperiods = 8; YF = forecast(PriorMdl,numperiods,rmDataTimeTable{:,seriesnames});```

`YF` is an 8-by-3 matrix of forecasted responses.

Plot the forecasted responses.

```fh = rmDataTimeTable.Time(end) + calquarters(1:8); for j = 1:PriorMdl.NumSeries subplot(3,1,j) plot(rmDataTimeTable.Time(end - 20:end),rmDataTimeTable{end - 20:end,seriesnames(j)},'r',... [rmDataTimeTable.Time(end) fh],[rmDataTimeTable{end,seriesnames(j)}; YF(:,j)],'b'); legend("Observed","Forecasted",'Location','NorthWest') title(seriesnames(j)) end```

Forecast Responses Using `simsmooth`

Configure the data set for obtaining forecasts from `simsmooth` by concatenating a `numperiods`-by-`numseries` timetable of missing values to the end of the set.

```fTT = array2timetable(NaN(numperiods,numseries),'RowTimes',fh,... 'VariableNames',seriesnames); frmDataTimeTable = [rmDataTimeTable(:,seriesnames); fTT]; tail(frmDataTimeTable)```
``` Time INFL DUNRATE DFEDFUNDS _____ ____ _______ _________ Q2-09 NaN NaN NaN Q3-09 NaN NaN NaN Q4-09 NaN NaN NaN Q1-10 NaN NaN NaN Q2-10 NaN NaN NaN Q3-10 NaN NaN NaN Q4-10 NaN NaN NaN Q1-11 NaN NaN NaN ```

Forecast the VAR model using `simsmooth`. Like `forecast`, `simsmooth` estimates the posterior distribution, so it requires a prior model and data. Specify the data containing the trailing `NaN`s.

`[~,~,~,YMean] = simsmooth(PriorMdl,frmDataTimeTable.Variables);`

`YMean` is almost equal to `frmDataTimeTable`, with these exceptions:

• `YMean` excludes rows corresponding to the presample period `frmDataTimeTable(1:4,:)`.

• If `frmDataTimeTable(``j``,``k``)` is `NaN`, `YMean(``j``,``k``)` is the mean of the posterior predictive distribution of response `k` at time `j`.

Create a timetable from `YMean`.

```YMeanTT = array2timetable(YMean,'RowTimes',frmDataTimeTable.Time((PriorMdl.P + 1):end),... 'VariableNames',seriesnames);```

Plot the forecasted responses.

```tiledlayout(3,1) for j = 1:PriorMdl.NumSeries nexttile plot(YMeanTT.Time((end - 20 - numperiods):(end - numperiods)),YMeanTT{(end - 20 - numperiods):(end - numperiods),j},'r',... YMeanTT.Time((end - numperiods):end),YMeanTT{(end - numperiods):end,j},'b'); legend("Observed","Forecasted",'Location','NorthWest') title(seriesnames(j)) end```

Consider the 3-D VAR(4) model of Simulate Parameters from Posterior Distribution in Presence of Missing Values.

Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values (the data includes leading missing values only).

```load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable); ```

Create a weak conjugate prior model. Specify the response series names.

```numseries = numel(seriesnames); numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames); numcoeffseqn = size(PriorMdl.V,1); PriorMdl.V = 1e4*eye(numcoeffseqn);```

Conditional forecasting occurs when response values are known or hypothesized in the forecast horizon, and unknown values are forecasted conditioned on the known values. Suppose that the unemployment rate change (`DUNRATE`) remains at one percentage point through a two-year forecast horizon.

Configure the data set for obtaining forecasts from `simsmooth` by concatenating a `numperiods`-by-`numseries` timetable of missing values to the end of the set. For the unemployment rate change, replace the missing values with a vector of ones.

```rng(1); % For reproducibility numperiods = 8; fh = rmDataTimeTable.Time(end) + calquarters(1:8); fTT = array2timetable(NaN(numperiods,numseries),'RowTimes',fh,... 'VariableNames',seriesnames); frmDataTimeTable = [rmDataTimeTable(:,seriesnames); fTT]; frmDataTimeTable.DUNRATE((end - numperiods + 1):end) = ones(numperiods,1); tail(frmDataTimeTable)```
``` Time INFL DUNRATE DFEDFUNDS _____ ____ _______ _________ Q2-09 NaN 1 NaN Q3-09 NaN 1 NaN Q4-09 NaN 1 NaN Q1-10 NaN 1 NaN Q2-10 NaN 1 NaN Q3-10 NaN 1 NaN Q4-10 NaN 1 NaN Q1-11 NaN 1 NaN ```

Obtain forecasts for the inflation rate and federal funds rate change series, given that the unemployment rate change is one for the entire horizon.

`[~,~,~,YMean] = simsmooth(PriorMdl,frmDataTimeTable.Variables);`

`YMean` is almost equal to `frmDataTimeTable`, with these exceptions:

• `YMean` excludes rows corresponding to the presample period `frmDataTimeTable(1:4,:)`.

• If `frmDataTimeTable(``j``,``k``)` is `NaN`, `YMean(``j``,``k``)` is the mean of the posterior predictive distribution of response `k` at time `j`.

Create a timetable from `YMean`.

```YMeanTT = array2timetable(YMean,'RowTimes',frmDataTimeTable.Time((PriorMdl.P + 1):end),... 'VariableNames',seriesnames);```

Plot the forecasted responses.

```for j = 1:PriorMdl.NumSeries subplot(3,1,j) plot(YMeanTT.Time((end - 20 - numperiods):(end - numperiods)),YMeanTT{(end - 20 - numperiods):(end - numperiods),j},'r',... YMeanTT.Time((end - numperiods):end),YMeanTT{(end - numperiods):end,j},'b'); legend("Observed","Forecasted",'Location','NorthWest') title(seriesnames(j)) end```

Consider the 2-D VARX(1) model for the US real GDP (`RGDP`) and investment (`GCE`) rates that treats the personal consumption (`PCEC`) rate as exogenous:

`$\left[\begin{array}{l}{\text{RGDP}}_{t}\\ {\text{GCE}}_{t}\end{array}\right]=c+\Phi \left[\begin{array}{l}{\text{RGDP}}_{t-1}\\ {\text{GCE}}_{t-1}\end{array}\right]+{\text{PCEC}}_{t}\beta +{\epsilon }_{t}.$`

For all $t$, ${\epsilon }_{t}$ is a series of independent 2-D normal innovations with a mean of 0 and covariance $\Sigma$. Assume the following prior distributions:

• ${\left[\begin{array}{ccc}\Phi & \mathit{c}& \beta \end{array}\right]}^{\prime }|\Sigma \sim {Ν}_{4×2}\left(Μ,\mathit{V},\Sigma \right)$, where M is a 4-by-2 matrix of means and $\mathit{V}$ is the 4-by-4 among-coefficient scale matrix. Equivalently, $\mathrm{vec}\left({\left[\begin{array}{ccc}\Phi & \mathit{c}& \beta \end{array}\right]}^{\prime }\right)|\Sigma \sim {Ν}_{8}\left(\mathrm{vec}\left(Μ\right),\Sigma \otimes \text{\hspace{0.17em}}\mathit{V}\right)$.

• $\Sigma \sim Inverse\phantom{\rule{0.16666666666666666em}{0ex}}Wishart\left(\Omega ,\nu \right),$where Ω is the 2-by-2 scale matrix and $\nu$ is the degrees of freedom.

Load the US macroeconomic data set. Compute the real GDP, investment, and personal consumption rate series. Remove all missing values from the resulting series.

```load Data_USEconModel DataTimeTable.RGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF; seriesnames = ["PCEC"; "RGDP"; "GCE"]; rates = varfun(@price2ret,DataTimeTable,'InputVariables',seriesnames); rates = rmmissing(rates); rates.Properties.VariableNames = seriesnames;```

Create a conjugate prior model for the 2-D VARX(1) model parameters.

```numseries = 2; numlags = 1; numpredictors = 1; PriorMdl = conjugatebvarm(numseries,numlags,'NumPredictors',numpredictors,... 'SeriesNames',seriesnames(2:end));```

Create index sets that partition the data into estimation and forecast samples. Specify a forecast horizon of two years.

```T = size(rates,1); numperiods = 8; idxest = 1:(T - numperiods); % Includes presample idxf = (T - numperiods + 1):T; idxtot = [idxest idxf];```

The simulation smoother forecasts by imputing missing values. Therefore, create a data set that contains missing values for the responses in the forecast horizon.

```missingrates = rates; missingrates{idxf,PriorMdl.SeriesNames} = nan(numperiods,PriorMdl.NumSeries);```

Forecast responses in the forecast horizon. Specify presample observations and the exogenous predictor data. Return standard deviations of the posterior predictive distribution.

```rng(1) % For reproducibility [~,~,~,YMean,YStd] = simsmooth(PriorMdl,missingrates{:,PriorMdl.SeriesNames},... 'X',missingrates{:,seriesnames(1)});```

Create a timetable from `YMean`.

```YMeanTT = array2timetable(YMean,'RowTimes',rates.Time((PriorMdl.P + 1):end),... 'VariableNames',PriorMdl.SeriesNames);```

Plot the forecasted responses.

```tiledlayout(2,1) for j = 1:PriorMdl.NumSeries nexttile plot(rates.Time((end - 20):end),rates{(end - 20):end,PriorMdl.SeriesNames(j)},'r',... YMeanTT.Time((end - numperiods):end),YMeanTT{(end - numperiods):end,PriorMdl.SeriesNames(j)},'b'); legend("Observed","Forecasted",'Location','NorthWest') title(PriorMdl.SeriesNames(j)) end```

## Input Arguments

collapse all

Prior Bayesian VAR model, specified as a model object in this table.

Model ObjectDescription
`conjugatebvarm`Dependent, matrix-normal-inverse-Wishart conjugate model returned by `bayesvarm`, `conjugatebvarm`, or `estimate`
`semiconjugatebvarm`Independent, normal-inverse-Wishart semiconjugate prior model returned by `bayesvarm` or `semiconjugatebvarm`
`diffusebvarm`Diffuse prior model returned by `bayesvarm` or `diffusebvarm`
`normalbvarm`Normal conjugate model with a fixed innovations covariance matrix, returned by `bayesvarm`, `normalbvarm`, or `estimate`

Observed multivariate response series to which `simsmooth` fits the model, specified as a `numobs`-by-`numseries` numeric matrix.

`numobs` is the sample size. `numseries` is the number of response variables (`PriorMdl.NumSeries`).

Rows correspond to observations, and the last row contains the latest observation. Columns correspond to individual response variables.

`NaN` values in `Y` indicate missing observations. `simsmooth` replaces non-presample missing values with an imputation (see `NaNDraws`) as part of the sampling process.

`Y` represents the continuation of the presample response series `Y0`.

Tip

• To obtain `numperiods` out-of-sample forecasts, concatenate a `numperiods`-by-`numseries` matrix of `NaN`s to the end of `Y`. `simsmooth` returns the forecasts and standard deviations in the corresponding elements of `YMean` and `YStd`, respectively. For example:

```numperiods = 8; Y = [Y; NaN(numperiods,PriorMdl.NumSeries)];```

• To obtain forecasts conditioned on observed (or hypothesized) values in a forecast horizon, concatenate a `numperiods`-by-`numseries` matrix of `NaN`s and observed or hypothesized values to the end of `Y`. For example, in a 2-D system, obtain the one-step-ahead forecast of response variable 1, given that response variable 2 is 0.5, by specifying

`Y = [Y; [NaN 0.5]];`

Data Types: `double`

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'Y0',Y0,'X',X` specifies the presample response data `Y0` to initialize the VAR model for posterior simulation, and the predictor data `X` for the exogenous regression component.

Presample response data to initialize the VAR model for simulation, specified as the comma-separated pair consisting of `'Y0'` and a `numpreobs`-by-`numseries` numeric matrix. `numpreobs` is the number of presample observations.

Rows correspond to presample observations, and the last row contains the latest observation. `Y0` must have at least `PriorMdl.P` rows. If you supply more rows than necessary, `simsmooth` uses the latest `PriorMdl.P` observations only.

Columns must correspond to the response series in `Y`.

By default, `simsmooth` uses `Y(1:PriorMdl.P,:)` as presample observations, and then estimates the posterior using `Y((PriorMdl.P + 1):end,:)`. This action reduces the effective sample size. The number of rows of the outputs `YMean` and `YStd` is `size(Y,1) – PriorMdl.P`. If `Y0` contains at least one `NaN`, `simsmooth` issues an error.

Data Types: `double`

Predictor data for the exogenous regression component in the model, specified as the comma-separated pair consisting of `'X'` and a `numobs`-by-`PriorMdl.NumPredictors` numeric matrix.

Rows correspond to observations, and the last row contains the latest observation. `simsmooth` does not use the regression component in the presample period. `X` must have at least as many observations as the observations used after the presample period.

• If you specify `Y0`, then `X` must have at least `numobs` rows (see `Y`).

• Otherwise, `X` must have at least `numobs``PriorMdl.P` observations to account for the presample removal.

In either case, if you supply more rows than necessary, `simsmooth` uses the latest observations only.

Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation.

If `X` contains at least one `NaN`, `simsmooth` issues an error.

Data Types: `double`

Number of random draws from the distributions, specified as the comma-separated pair consisting of `'NumDraws'` and a positive integer.

Example: `'NumDraws',1e7`

Data Types: `double`

Number of draws to remove from the beginning of the sample to reduce transient effects, specified as the comma-separated pair consisting of `'BurnIn'` and a nonnegative scalar. For details on how `simsmooth` reduces the full sample, see Algorithms.

Tip

To help you specify the appropriate burn-in period size:

1. Determine the extent of the transient behavior in the sample by specifying `'BurnIn',0`.

2. Simulate a few thousand observations by using `simsmooth`.

3. Draw trace plots.

Example: `'BurnIn',0`

Data Types: `double`

Adjusted sample size multiplier, specified as the comma-separated pair consisting of `'Thin'` and a positive integer.

The actual sample size is `BurnIn` + `NumDraws``*Thin`. After discarding the burn-in, `simsmooth` discards every `Thin``1` draws, and then retains the next draw. For more details on how `simsmooth` reduces the full sample, see Algorithms.

Tip

To reduce potential large serial correlation in the sample, or to reduce the memory consumption of the draws stored in `PriorMdl`, specify a large value for `Thin`.

Example: `'Thin',5`

Data Types: `double`

Starting value of the VAR model coefficients for the Gibbs sampler, specified as the comma-separated pair consisting of `'Coeff0'` and a numeric column vector with (`Mdl.NumSeries*k`)-by-`NumDraws` elements, where `k = Mdl.NumSeries*Mdl.P + Mdl.IncludeIntercept + Mdl.IncludeTrend + Mdl.NumPredictors`, which is the number of coefficients in a response equation. For details on the structure of `Coeff0`, see the output `CoeffDraws`.

By default, `Coeff0` is the multivariate least-squares estimate.

Tip

• To specify `Coeff0`:

1. Set separate variables for the initial values of each coefficient matrix and vector.

2. Horizontally concatenate all coefficient means in this order:

`$Coeff=\left[\begin{array}{ccccccc}{\Phi }_{1}& {\Phi }_{2}& \cdots & {\Phi }_{p}& c& \delta & Β\end{array}\right].$`

3. Vectorize the transpose of the coefficient mean matrix.

```Coeff0 = Coeff.'; Coeff0 = Coeff0(:);```

• A good practice is to run `simsmooth` multiple times with different parameter starting values. Verify that the estimates from each run converge to similar values.

Data Types: `double`

Starting value of the innovations covariance matrix for the Gibbs sampler, specified as the comma-separated pair consisting of `'Sigma0'` and a `PriorMdl.NumSeries`-by-`PriorMdl.NumSeries` positive definite numeric matrix. By default, `Sigma0` is the residual mean squared error from multivariate least-squares. Rows and columns correspond to innovations in the equations of the response variables ordered by `Mdl.SeriesNames`.

Tip

A good practice is to run `simsmooth` multiple times with different parameter starting values. Verify that the estimates from each run converge to similar values.

Data Types: `double`

## Output Arguments

collapse all

Simulated VAR model coefficients, returned as a (`PriorMdl.NumSeries*k`)-by-`NumDraws` numeric matrix, where `k = PriorMdl.NumSeries*PriorMdl.P + PriorMdl.IncludeIntercept + PriorMdl.IncludeTrend + PriorMdl.NumPredictors`, which is the number of coefficients in a response equation.

Each column is a successive draw (vector of coefficients) from the posterior distribution.

For draw `j`, `CoeffDraws(1:k,j)` corresponds to all coefficients in the equation of response variable `PriorMdl.SeriesNames(1)`, `Coeff((k + 1):(2*k),j)` corresponds to all coefficients in the equation of response variable `PriorMdl.SeriesNames(2)`, and so on. For a set of indices corresponding to an equation:

• Elements `1` through `PriorMdl.NumSeries` correspond to the lag 1 AR coefficients of the response variables ordered by `PriorMdl.SeriesNames`.

• Elements `PriorMdl.NumSeries + 1` through `2*PriorMdl.NumSeries` correspond to the lag 2 AR coefficients of the response variables ordered by `PriorMdl.SeriesNames`.

• In general, elements `(q – 1)*PriorMdl.NumSeries + 1` through `q*PriorMdl.NumSeries` correspond to the lag `q` AR coefficients of the response variables ordered by `PriorMdl.SeriesNames`.

• If `PriorMdl.IncludeConstant` is `true`, element `PriorMdl.NumSeries*PriorMdl.P + 1` is the model constant.

• If `Mdl.IncludeTrend` is `true`, element `PriorMdl.NumSeries*PriorMdl.P + 2` is the linear time trend coefficient.

• If `PriorMdl.NumPredictors` > 0, elements `PriorMdl.NumSeries*PriorMdl.P + 3` through `k` compose the vector of regression coefficients of the exogenous variables.

This figure shows the structure of `CoeffDraws(:,j)` for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors.

`$\left[\stackrel{{y}_{1,t}}{\overbrace{\begin{array}{ccccccccccc}{\varphi }_{1,11}& {\varphi }_{1,12}& {\varphi }_{2,11}& {\varphi }_{2,12}& {\varphi }_{3,11}& {\varphi }_{3,12}& {c}_{1}& {\beta }_{11}& {\beta }_{12}& {\beta }_{13}& {\beta }_{14}\end{array}}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\stackrel{{y}_{2,t}}{\overbrace{\begin{array}{ccccccccccc}{\varphi }_{1,21}& {\varphi }_{1,22}& {\varphi }_{2,21}& {\varphi }_{2,22}& {\varphi }_{3,21}& {\varphi }_{3,22}& {c}_{2}& {\beta }_{21}& {\beta }_{22}& {\beta }_{23}& {\beta }_{24}\end{array}}}\right],$`

where

• ϕq,jk is element (j,k) of the lag q AR coefficient matrix.

• cj is the model constant in the equation of response variable j.

• Bju is the regression coefficient of exogenous variable u in the equation of response variable j.

Simulated innovations covariance matrices, returned as a `PriorMdl.NumSeries`-by-`PriorMdl.NumSeries`-by-`NumDraws` array of positive definite numeric matrices.

Each page is a successive draw (covariance) from the posterior distribution. Rows and columns correspond to innovations in the equations of the response variables ordered by `PriorMdl.SeriesNames`.

If `PriorMdl` is a `normalbvarm` object, all covariances in `SigmaDraws` are equal to `PriorMdl.Covariance`.

Imputations for missing response data, returned as a `numNaNs`-by-`NumDraws` numeric matrix, where `numNaNs = sum(isNaN(Y))`, the number of `NaN`s in the response data.

Each column is a successive draw (vector of response imputations) from the posterior predictive distribution.

Each row corresponds to a `NaN` in `Y`. `simsmooth` determines the order of the rows by using a column-wise search. For example, suppose `Y` is the following matrix.

```Y = [ 1 NaN;... NaN 2 ;... NaN 5 ;... 7 8];```
Here, `NaNDraws` is a 3-by-`NumDraws` matrix. For draw `j`, `NaNDraws(1,j)` contains the imputed value of the response variable `PriorMdl.SeriesNames(1)` at time 2, `NaNDraws(2,j)` contains the imputed value of the response variable `PriorMdl.SeriesNames(1)` at time 3, and `NaNDraws(3,j)` contains the imputed value of the response variable `PriorMdl.SeriesNames(2)` at time 1.

Augmented response data, returned as a `numobs`-by-`PriorMdl.NumSeries` numeric matrix. `YMean` is equal to the non-presample portion of the response data `Y`, but `simsmooth` replaces each `NaN` with the corresponding mean of the posterior predictive distribution (`YMean` contains zero missing values).

Standard deviations of the posterior predictive distribution of the imputed responses, returned as a `numobs`-by-`PriorMdl.NumSeries` numeric matrix. The dimensions of `YStd` and `YMean` correspond.

Missing values in the response data `Y`, which correspond to imputed values in `YMean`, have a nonzero standard deviation. Standard deviations corresponding to nonmissing response data are 0.

collapse all

### Bayesian Vector Autoregression (VAR) Model

A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.

ModelEquation
Reduced-form VAR(p) in difference-equation notation
`${y}_{t}={\Phi }_{1}{y}_{t-1}+...+{\Phi }_{p}{y}_{t-p}+c+\delta t+Β{x}_{t}+{\epsilon }_{t}.$`
Multivariate regression
`${y}_{t}={Z}_{t}\lambda +{\epsilon }_{t}.$`
Matrix regression
`${y}_{t}={\Lambda }^{\prime }{z}_{t}^{\prime }+{\epsilon }_{t}.$`

For each time t = 1,...,T:

• yt is the m-dimensional observed response vector, where m = `numseries`.

• Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = `numlags`.

• c is the m-by-1 vector of model constants if `IncludeConstant` is `true`.

• δ is the m-by-1 vector of linear time trend coefficients if `IncludeTrend` is `true`.

• Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = `NumPredictors`. All predictor variables appear in each equation.

• ${z}_{t}=\left[\begin{array}{ccccccc}{y}_{t-1}^{\prime }& {y}_{t-2}^{\prime }& \cdots & {y}_{t-p}^{\prime }& 1& t& {x}_{t}^{\prime }\end{array}\right],$ which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix

`$\left[\begin{array}{cccc}{z}_{t}& {0}_{z}& \cdots & {0}_{z}\\ {0}_{z}& {z}_{t}& \cdots & {0}_{z}\\ ⋮& ⋮& \ddots & ⋮\\ {0}_{z}& {0}_{z}& {0}_{z}& {z}_{t}\end{array}\right],$`

where 0z is a 1-by-(mp + r + 2) vector of zeros.

• $\Lambda ={\left[\begin{array}{ccccccc}{\Phi }_{1}& {\Phi }_{2}& \cdots & {\Phi }_{p}& c& \delta & Β\end{array}\right]}^{\prime }$, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).

• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is

`$\ell \left(\Lambda ,\Sigma |y,x\right)=\prod _{t=1}^{T}f\left({y}_{t};\Lambda ,\Sigma ,{z}_{t}\right),$`

where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt.

Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:

• Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T.

• X is a T-by-m matrix containing the entire exogenous series {xt}, t = 1,…,T.

• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.

### Posterior Predictive Distribution

A posterior predictive distribution of a posterior Bayesian VAR(p) model π(Yf|Y,X) is the distribution of the next τ future response variables after the final observation in the estimation sample Yf = [yT+1, yT+2,…,yT+τ] given the following, marginalized over Λ and Σ:

• Presample and estimation sample response data Y

• Coefficients Λ

• Innovations covariance matrix Σ

• Estimation and future sample exogenous data X

Symbolically,

`$\pi \left({Y}_{f}|Y,X\right)=\int \pi \left({Y}_{f}|Y,X,\Lambda ,\Sigma \right)\pi \left(\Lambda ,\Sigma |Y,X\right)d\Lambda d\Sigma .$`

### Gibbs Sampler with Bayesian Data Augmentation

The Gibbs sampler with Bayesian data augmentation is a Markov Chain Monte Carlo sampling method that draws from the posterior distribution of the parameters and unobserved response data given the observed data.

Consider these sets of indices:

• I = {(i,j) | i = 1,…,m +τ, j = 1,…,m +τ}, the universe containing all index pairs (i,j) of non-presample data and the forecast horizon. Y(I) represents the entire non-presample response data including any placeholders (`NaN` values) for missing observations and unobserved values in the forecast horizon.

• U = {(i,j) ∈ I | Y(i,j) is missing or unobserved}, the set of index pairs corresponding to missing or unobserved response data. Y(U) is a set of placeholders, and Y(Uc) is the set of observed data.

`simsmooth` simulates from the posterior distribution of the missing response data, coefficients, and innovations covariance matrix $\pi \left(Y\left(U\right),\Lambda ,\Sigma |Y\left({U}^{c}\right),X\right)$ by following this Gibbs sampler.

1. Create an observation-error-free state-space model by applying the specified VAR(p) structure to the state equations. Apply the initial parameter values Λ(0) and Σ(0) (`Coeff0` and `Sigma0`). The resulting model is the conditional posterior predictive distribution π(Y(U)|Λ(0)(0),Y(Uc),X).

2. During iteration j:

1. Simulate missing response data Y(U)(j) from π(Y(U) | Λ(j-1)(j-1),Y(Uc),X). To accomplish this action, `simsmooth` uses the state-space model simulation smoother (see the `simsmooth` function of the `ssm` model object). `simsmooth` stores Y(U)(j) in the output `NaNDraws`.

2. Compose the augmented data set Y(I)(j) = Y(U)(j)Y(Uc).

3. Draw coefficients Λ(j) and an innovations covariance matrix Σ(j) from the conditional posterior Bayesian VAR model with augmented data π(Λ,Σ | Y(I)(j),X). `simsmooth` stores Λ(j) and Σ(j) in the outputs `CoeffDraws` and `SigmaDraws`, respectively.

3. The mean and standard deviation of the posterior predictive distribution π(Y(U) | Λ,Σ,Y(Uc),X) is the mean and standard deviation of the series of draws {Y(U)(j)}. `simsmooth` stores the statistics in the outputs `YMean` and `YStd`.

After removing the burn-in period (`Burnin`) and thinning the sample (`Thin`), `simsmooth` stores the remaining set of draws and computes posterior statistics from it.

## Tips

• Monte Carlo simulation is subject to variation. If `simsmooth` uses Monte Carlo simulation, then estimates and inferences might vary when you call `simsmooth` multiple times under seemingly equivalent conditions. To reproduce estimation results, set a random number seed by using `rng` before calling `simsmooth`.

## Algorithms

• This figure shows how `simsmooth` reduces the sample by using the values of `NumDraws`, `Thin`, and `BurnIn`. Rectangles represent successive draws from the distribution. `simsmooth` removes the white rectangles from the sample. The remaining `NumDraws` black rectangles compose the sample.

• `simsmooth` does not return default starting values that it generates.

## References

[1] Litterman, Robert B. "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience." Journal of Business and Economic Statistics 4, no. 1 (January 1986): 25–38. https://doi.org/10.2307/1391384.

## Version History

Introduced in R2020a