# ecmmvnrmle

Multivariate normal regression with missing data

## Syntax

``[Param,Covar] = ecmmvnrmle(Data,Design)``
``[Param,Covar,Resid,Info] = ecmmvnrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)``

## Description

`[Param,Covar] = ecmmvnrmle(Data,Design)` estimates a multivariate normal regression model with missing data. The model has the form $\mathit{Data}_{k}\sim N\left(\mathit{Design}_{k}\times \mathit{Parameters},\ \mathit{Covariance}\right)$ for samples k = 1, ... , `NUMSAMPLES`.

`[Param,Covar,Resid,Info] = ecmmvnrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)` adds optional arguments for `MaxIterations`, `TolParam`, `TolObj`, `Param0`, `Covar0`, and `CovarFormat`.

## Examples


This example shows how to estimate a multivariate normal regression model with missing data.

First, load dates, total returns, and ticker symbols for the twelve stocks from the MAT-file.

```
load CAPMuniverse
whos Assets Data Dates
```
```
  Name           Size             Bytes  Class     Attributes

  Assets         1x14              1568  cell
  Data        1471x14            164752  double
  Dates       1471x1              11768  double
```
```
Dates = datetime(Dates,'ConvertFrom','datenum');
```

The assets in the model have the following symbols, where the last two series are proxies for the market and the riskless asset.

```
Assets(1:14)
```
```
ans = 1x14 cell
  Columns 1 through 6

    {'AAPL'}    {'AMZN'}    {'CSCO'}    {'DELL'}    {'EBAY'}    {'GOOG'}

  Columns 7 through 12

    {'HPQ'}    {'IBM'}    {'INTC'}    {'MSFT'}    {'ORCL'}    {'YHOO'}

  Columns 13 through 14

    {'MARKET'}    {'CASH'}
```

The data covers the period from January 1, 2000 to November 7, 2005 with daily total returns. Two stocks in this universe have missing values that are represented by `NaN`s. One of the two stocks had an IPO during this period and, consequently, has significantly less data than the other stocks.

Compute separate regressions for each stock, where the stocks with missing data have estimates that reflect their reduced observability.

```
[NumSamples, NumSeries] = size(Data);
NumAssets = NumSeries - 2;

StartDate = Dates(1);
EndDate = Dates(end);

% Preallocate one slot per asset.
Alpha = NaN(1, NumAssets);
Beta = NaN(1, NumAssets);
Sigma = NaN(1, NumAssets);
StdAlpha = NaN(1, NumAssets);
StdBeta = NaN(1, NumAssets);
StdSigma = NaN(1, NumAssets);

for i = 1:NumAssets
    % Set up separate asset data and design matrices
    TestData = zeros(NumSamples,1);
    TestDesign = zeros(NumSamples,2);

    % Excess return of asset i over the riskless asset (column 14, CASH)
    TestData(:) = Data(:,i) - Data(:,14);
    % Intercept column and excess return of the market (column 13, MARKET)
    TestDesign(:,1) = 1.0;
    TestDesign(:,2) = Data(:,13) - Data(:,14);

    % Estimate the multivariate normal regression for each asset separately.
    % The semicolon is omitted so that Param and Covar display for each asset.
    [Param, Covar] = ecmmvnrmle(TestData, TestDesign)
end
```
```
Param = 2×1

    0.0012
    1.2294

Covar = 0.0010

Param = 2×1

    0.0006
    1.3661

Covar = 0.0020

Param = 2×1

   -0.0002
    1.5653

Covar = 8.8911e-04

Param = 2×1

   -0.0000
    1.2594

Covar = 6.4996e-04

Param = 2×1

    0.0014
    1.3441

Covar = 0.0014

Param = 2×1

    0.0046
    0.3742

Covar = 6.3272e-04

Param = 2×1

    0.0001
    1.3745

Covar = 6.5040e-04

Param = 2×1

   -0.0000
    1.0807

Covar = 2.8562e-04

Param = 2×1

    0.0001
    1.6002

Covar = 6.9146e-04

Param = 2×1

   -0.0002
    1.1765

Covar = 3.7138e-04

Param = 2×1

    0.0000
    1.5010

Covar = 0.0010

Param = 2×1

    0.0001
    1.6543

Covar = 0.0015
```

## Input Arguments

`Data`: Sample data, specified as a `NUMSAMPLES`-by-`NUMSERIES` matrix with `NUMSAMPLES` samples of a `NUMSERIES`-dimensional random vector. Missing values are indicated by `NaN`s. Only samples that are entirely `NaN`s are ignored. (To ignore samples with at least one `NaN`, use `mvnrmle`.)

Data Types: `double`
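The difference between the two missing-data rules can be checked directly on a data matrix. The following is a minimal sketch in base MATLAB; the matrix values and the variable names `allMissing` and `anyMissing` are made up for illustration.

```matlab
% Hypothetical 4-by-2 data matrix with different missing-data patterns.
Data = [0.01, 0.02;
        NaN,  0.03;
        NaN,  NaN;
        0.04, NaN];

% Rows in which every series is missing: the only rows ecmmvnrmle ignores.
allMissing = all(isnan(Data), 2);

% Rows with at least one missing value: the rows mvnrmle would ignore.
anyMissing = any(isnan(Data), 2);

fprintf('Rows ignored by ecmmvnrmle: %d\n', nnz(allMissing));
fprintf('Rows ignored by mvnrmle:    %d\n', nnz(anyMissing));
```

Comparing the two counts on real data gives a rough idea of how many partially observed samples would be discarded by switching to `mvnrmle`.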

`Design`: Design model, specified as a matrix or a cell array that handles two model structures:

• If `NUMSERIES = 1`, `Design` is a `NUMSAMPLES`-by-`NUMPARAMS` matrix with known values. This structure is the standard form for regression on a single series.

• If `NUMSERIES ≥ 1`, `Design` is a cell array. The cell array contains either one or `NUMSAMPLES` cells. Each cell contains a `NUMSERIES`-by-`NUMPARAMS` matrix of known values.

If `Design` has a single cell, the same design matrix is used for every sample. If `Design` has more than one cell, each cell contains the design matrix for the corresponding sample.

Data Types: `double` | `cell`
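As an illustration of the cell-array form, the following sketch builds both variants for a two-series model. The sizes, the explanatory variable `x`, and the names `ConstDesign` and `NumParams` are hypothetical; only the required shape of each cell (`NUMSERIES`-by-`NUMPARAMS`) comes from the description above.

```matlab
% Hypothetical sizes chosen for illustration only.
NumSamples = 5;
NumSeries  = 2;
NumParams  = 4;          % an intercept and a slope for each series

x = (1:NumSamples)';     % made-up explanatory variable

% One cell per sample; each cell is NumSeries-by-NumParams.
Design = cell(NumSamples, 1);
for k = 1:NumSamples
    Design{k} = [1, x(k), 0, 0;
                 0, 0,    1, x(k)];
end

% If the design does not change across samples, a single cell suffices.
% Here each series has only its own mean as a parameter.
ConstDesign = {eye(NumSeries)};
```

The block layout in `Design{k}` gives each series its own intercept and slope; sharing a parameter across series would instead place nonzero entries in the same column of both rows.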

`MaxIterations`: (Optional) Maximum number of iterations for the estimation algorithm, specified as a numeric value.

Data Types: `double`

`TolParam`: (Optional) Convergence tolerance for the estimation algorithm based on changes in model parameter estimates, specified as a numeric value. The convergence test for changes in model parameters is

$\left\|\mathit{Param}_{k}-\mathit{Param}_{k-1}\right\| < \mathit{TolParam}\times\left(1+\left\|\mathit{Param}_{k}\right\|\right)$

where `Param` is the vector of parameter estimates (the output `Param`) and k = 2, 3, ... is the iteration index. Convergence is assumed when both the `TolParam` and `TolObj` conditions are satisfied. If both `TolParam` ≤ `0` and `TolObj` ≤ `0`, the algorithm runs the maximum number of iterations (`MaxIterations`), whatever the results of the convergence tests.

Data Types: `double`

`TolObj`: (Optional) Convergence tolerance for the estimation algorithm based on changes in the objective function, specified as a numeric value. The convergence test for changes in the objective function is

$\left|\mathit{Obj}_{k}-\mathit{Obj}_{k-1}\right| < \mathit{TolObj}\times\left(1+\left|\mathit{Obj}_{k}\right|\right)$

for iteration k = 2, 3, ... . Convergence is assumed when both the `TolParam` and `TolObj` conditions are satisfied. If both `TolParam` ≤ `0` and `TolObj` ≤ `0`, the algorithm runs the maximum number of iterations (`MaxIterations`), whatever the results of the convergence tests.

Data Types: `double`
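The combined stopping rule can be sketched in base MATLAB as follows. This is an illustration only, not the function's internal code: the iterate values and tolerances are invented, the parameter test is assumed to take the same relative form as the objective test, and the Euclidean norm is an assumption.

```matlab
% Hypothetical values from two successive iterations (not real output).
ParamPrev = [0.0010; 1.2000];
ParamCurr = [0.0012; 1.2294];
ObjPrev   = -1500.20;
ObjCurr   = -1500.19;

% Illustrative tolerances, not necessarily the function defaults.
TolParam = 1e-8;
TolObj   = 1e-8;

% Relative-change tests; the Euclidean norm is assumed for the parameters.
paramOK = norm(ParamCurr - ParamPrev) < TolParam * (1 + norm(ParamCurr));
objOK   = abs(ObjCurr - ObjPrev)      < TolObj   * (1 + abs(ObjCurr));

% Convergence requires both conditions to hold at the same iteration.
converged = paramOK && objOK;
```

Because each right-hand side scales with `1 + |value|`, the tests behave like absolute tolerances when the estimates are near zero and like relative tolerances when they are large.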

`Param0`: (Optional) Initial estimate for the parameters of the regression model, specified as a `NUMPARAMS`-by-`1` column vector.

Data Types: `double`

`Covar0`: (Optional) Initial estimate for the covariance matrix of the regression residuals, specified as a `NUMSERIES`-by-`NUMSERIES` matrix.

Data Types: `double`

`CovarFormat`: (Optional) Format for the covariance matrix, specified as a character vector. The choices are:

• `'full'` — Compute the full covariance matrix.

• `'diagonal'` — Force the covariance matrix to be a diagonal matrix.

Data Types: `char`
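Because the optional arguments are positional, reaching `CovarFormat` means supplying something for every argument before it. The sketch below uses synthetic data and assumes that passing `[]` leaves an optional argument at its default; that behavior is an assumption here and should be checked against your installed version.

```matlab
% Synthetic two-series data, made up for illustration.
rng(0);
NumSamples = 200;
Data = randn(NumSamples, 2);
Data(1:20, 2) = NaN;         % second series has no early observations

% Single-cell design: one mean parameter per series.
Design = {eye(2)};

% Assumption: [] keeps MaxIterations, TolParam, TolObj, Param0, and Covar0
% at their defaults so that only CovarFormat is changed.
[Param, Covar] = ecmmvnrmle(Data, Design, [], [], [], [], [], 'diagonal');
```

With `'diagonal'`, the off-diagonal entries of `Covar` are forced to zero, so the series are treated as uncorrelated in the estimate.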

## Output Arguments

`Param`: Estimates for the parameters of the regression model, returned as a `NUMPARAMS`-by-`1` column vector.

`Covar`: Estimates for the covariance of the regression model's residuals, returned as a `NUMSERIES`-by-`NUMSERIES` matrix.

`Resid`: Residuals from the regression, returned as a `NUMSAMPLES`-by-`NUMSERIES` matrix. For any missing values in `Data`, the corresponding residual is the difference between the conditionally imputed value for `Data` and the model, that is, the imputed residual.

Note

The covariance estimate `Covar` cannot be derived from the residuals.

`Info`: Additional information from the regression, returned as a structure. The structure has these fields:

Additional information from the regression, returned as a structure. The structure has these fields:

• `Info.Obj`: A variable-extent column vector, with no more than `MaxIterations` elements, that contains the value of the objective function at each iteration of the estimation algorithm. The last value in this vector, `Info.Obj(end)`, is the terminal estimate of the objective function. If you do maximum likelihood estimation, the objective function is the log-likelihood function.

• `Info.PrevParameters`: `NUMPARAMS`-by-`1` column vector of estimates for the model parameters from the iteration just prior to the terminal iteration.

• `Info.PrevCovariance`: `NUMSERIES`-by-`NUMSERIES` matrix of estimates for the covariance parameters from the iteration just prior to the terminal iteration.
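The `Info` fields can be used to inspect how the estimation ended. The following sketch uses synthetic data; the variable names `NumIter`, `FinalObj`, `ParamStep`, and `CovarStep` are hypothetical, and the field names are taken from the list above.

```matlab
% Synthetic single-series regression data, made up for illustration.
rng(1);
NumSamples = 100;
x = randn(NumSamples, 1);
Data = 0.5 + 2.0 * x + 0.1 * randn(NumSamples, 1);
Data(5:10) = NaN;                      % a few missing observations
Design = [ones(NumSamples, 1), x];     % intercept and slope

[Param, Covar, Resid, Info] = ecmmvnrmle(Data, Design);

% Number of recorded objective values and the terminal objective value.
NumIter  = numel(Info.Obj);
FinalObj = Info.Obj(end);

% Size of the last update, from the estimates one iteration earlier.
ParamStep = norm(Param - Info.PrevParameters);
CovarStep = norm(Covar - Info.PrevCovariance, 'fro');
```

A small `ParamStep` together with a flat tail in `Info.Obj` suggests the algorithm stopped because the convergence tests were met and not because the iteration limit was reached.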

## References

[1] Little, Roderick J. A. and Donald B. Rubin. Statistical Analysis with Missing Data. 2nd Edition. John Wiley & Sons, Inc., 2002.

[2] Meng, Xiao-Li and Donald B. Rubin. “Maximum Likelihood Estimation via the ECM Algorithm.” Biometrika. Vol. 80, No. 2, 1993, pp. 267–278.

[3] Sexton, Joe and Anders Rygh Swensen. “ECM Algorithms that Converge at the Rate of EM.” Biometrika. Vol. 87, No. 3, 2000, pp. 651–662.

[4] Dempster, A. P., N. M. Laird, and Donald B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B, Vol. 39, No. 1, 1977, pp. 1–37.

## Version History

Introduced in R2006a