
The delete-1 change in covariance (`covratio`) identifies the observations that are influential in the regression fit. An influential observation is one whose exclusion from the model might significantly alter the regression function. Values of covratio larger than 1 + 3*p*/*n* or smaller than 1 – 3*p*/*n* indicate influential points, where *p* is the number of regression coefficients, and *n* is the number of observations.

The covratio statistic is the ratio of the determinant of the
coefficient covariance matrix with observation *i* deleted
to the determinant of the covariance matrix for the full model:

$$\mathrm{cov}ratio=\frac{\mathrm{det}\left\{MSE\left(i\right){\left[{X}^{\prime}\left(i\right)X\left(i\right)\right]}^{-1}\right\}}{\mathrm{det}\left[MSE{\left({X}^{\prime}X\right)}^{-1}\right]}.$$

`CovRatio` is an *n*-by-1 vector in the `Diagnostics` table of the fitted `LinearModel` object. Each element is the ratio of the generalized variance of the estimated coefficients when the corresponding observation is deleted to the generalized variance of the coefficients using all the data.
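As a rough check of this definition, the sketch below refits the model once per observation and forms the determinant ratio directly. It assumes the `hospital` sample data set used in the examples on this page and that no rows are dropped for missing values; `covratioManual`, `detFull`, and `maxDiff` are illustrative names, not part of the toolbox.

```matlab
% Brute-force covratio: refit the model with each observation deleted.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

n = mdl.NumObservations;
detFull = det(mdl.CoefficientCovariance);   % det[MSE*(X'X)^-1] for the full model

covratioManual = zeros(n,1);                % illustrative name
for i = 1:n
    keep = true(n,1);
    keep(i) = false;                        % drop observation i
    mdl_i = fitlm(X(keep,:),y(keep));       % delete-1 fit
    covratioManual(i) = det(mdl_i.CoefficientCovariance)/detFull;
end

% Largest discrepancy from the stored diagnostic; expected to be near zero
maxDiff = max(abs(covratioManual - mdl.Diagnostics.CovRatio));
```

Refitting *n* times is only practical for small data sets. The stored `CovRatio` values are the ones to use in practice.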

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

- Display the `CovRatio` values by indexing into the property using dot notation: `mdl.Diagnostics.CovRatio`
- Plot the delete-1 change in covariance using `plotDiagnostics(mdl,'CovRatio')`. For details, see the `plotDiagnostics` method of the `LinearModel` class.

This example shows how to use the `CovRatio` statistics to determine the influential points in data. Load the sample data and define the response and predictor variables.

```
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
```

Fit a linear regression model.

`mdl = fitlm(X,y);`

Plot the `CovRatio` statistics.

`plotDiagnostics(mdl,'CovRatio')`

For this example, the threshold limits are `1 + 3*5/100 = 1.15` and `1 - 3*5/100 = 0.85`. A few points lie beyond these limits and might be influential.

Find the observations that are beyond the limits.

`find((mdl.Diagnostics.CovRatio)>1.15|(mdl.Diagnostics.CovRatio)<0.85)`

`ans = `*5×1*
2
14
84
93
96
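The limits do not have to be typed by hand. A minimal sketch, assuming the same `hospital` data, that derives *p* and *n* from the fitted model (the variable names `upper`, `lower`, `cr`, and `flagged` are illustrative):

```matlab
% Compute the covratio limits from the model instead of hard-coding them.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

p = mdl.NumCoefficients;      % regression coefficients, including the intercept
n = mdl.NumObservations;      % observations used in the fit
upper = 1 + 3*p/n;            % 1.15 for this model
lower = 1 - 3*p/n;            % 0.85 for this model

cr = mdl.Diagnostics.CovRatio;
flagged = find(cr > upper | cr < lower);   % indices of observations beyond the limits
```

Written this way, the same lines keep working if predictors are added or removed, because the limits follow the model.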

The sign of a delete-1 scaled difference in coefficient estimate
(Dfbetas) for coefficient *j* and observation *i* indicates
whether that observation causes an increase or decrease in the estimate
of the regression coefficient. The absolute value of a Dfbetas indicates
the magnitude of the difference relative to the estimated standard
deviation of the regression coefficient. A Dfbetas value larger than
3/sqrt(*n*) in absolute value indicates that the
observation has a large influence on the corresponding coefficient.

Dfbetas for coefficient *j* and observation *i* is
the ratio of the difference in the estimate of coefficient *j* using
all observations and the one obtained by removing observation *i*,
and the standard error of the coefficient estimate obtained by removing
observation *i*. The Dfbetas for coefficient *j* and
observation *i* is

$$Dfbeta{s}_{ij}=\frac{{b}_{j}-{b}_{j\left(i\right)}}{\sqrt{MS{E}_{\left(i\right)}}\left(1-{h}_{ii}\right)},$$

where $b_j$ is the estimate for coefficient *j*, $b_{j(i)}$ is the estimate for coefficient *j* obtained by removing observation *i*, $MSE_{(i)}$ is the mean squared error of the regression fit obtained by removing observation *i*, and $h_{ii}$ is the leverage value for observation *i*. `Dfbetas` is an *n*-by-*p* matrix in the `Diagnostics` table of the fitted `LinearModel` object. Each element of `Dfbetas` is the Dfbetas value for the corresponding coefficient (column) obtained by removing the corresponding observation (row).

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can obtain the Dfbetas values as an *n*-by-*p* matrix by indexing into the property using dot notation:

`mdl.Diagnostics.Dfbetas`

This example shows how to determine the observations that have large influence on coefficients using `Dfbetas`. Load the sample data and define the response and independent variables.

```
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
```

Fit a linear regression model.

`mdl = fitlm(X,y);`

Find the `Dfbetas` values that are high in absolute value.

`[row,col] = find(abs(mdl.Diagnostics.Dfbetas)>3/sqrt(100)); disp([row col])`

    2 1
    28 1
    84 1
    93 1
    2 2
    13 3
    84 3
    2 4
    84 4
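The row and column pairs are easier to read with coefficient names and the actual values attached. The following is a sketch only, assuming the same `hospital` data; `cutoff`, `coefName`, `value`, and `flaggedTbl` are illustrative names.

```matlab
% List flagged (observation, coefficient) pairs with names and Dfbetas values.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

D = mdl.Diagnostics.Dfbetas;               % n-by-p matrix
cutoff = 3/sqrt(mdl.NumObservations);      % 3/sqrt(n)
[row,col] = find(abs(D) > cutoff);         % row = observation, col = coefficient

coefName = reshape(mdl.CoefficientNames(col),[],1);   % name of each flagged coefficient
value = D(sub2ind(size(D),row,col));                  % the Dfbetas values themselves
flaggedTbl = table(row,coefName,value, ...
    'VariableNames',{'Observation','Coefficient','Dfbetas'});
disp(flaggedTbl)
```

The sign of each entry in the `Dfbetas` column shows the direction in which that observation moves the coefficient estimate.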

The delete-1 scaled change in fitted values (Dffits) shows the influence of each observation on the fitted response values. Observations whose Dffits values have an absolute value larger than $2\sqrt{p/n}$ might be influential.

Dffits for observation *i* is

$$Dffit{s}_{i}=s{r}_{i}\sqrt{\frac{{h}_{ii}}{1-{h}_{ii}}},$$

where $sr_i$ is the studentized residual, and $h_{ii}$ is the leverage value of the fitted `LinearModel` object. `Dffits` is an *n*-by-1 column vector in the `Diagnostics` table of the fitted `LinearModel` object. Each element in `Dffits` is the change in the fitted value caused by deleting the corresponding observation and scaling by the standard error.
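The formula can be evaluated directly from quantities stored in the model. This sketch assumes the `hospital` data used in the examples on this page and that `mdl.Residuals.Studentized` holds the studentized residuals the formula refers to; `dffitsManual` and `maxDiff` are illustrative names.

```matlab
% Recompute Dffits from studentized residuals and leverage.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

h  = mdl.Diagnostics.Leverage;       % h_ii, diagonal of the hat matrix
sr = mdl.Residuals.Studentized;      % sr_i, studentized residuals

dffitsManual = sr .* sqrt(h./(1 - h));

% Largest discrepancy from the stored diagnostic; expected to be near zero
maxDiff = max(abs(dffitsManual - mdl.Diagnostics.Dffits));
```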

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

- Display the `Dffits` values by indexing into the property using dot notation: `mdl.Diagnostics.Dffits`
- Plot the delete-1 scaled change in fitted values using `plotDiagnostics(mdl,'Dffits')`. For details, see the `plotDiagnostics` method of the `LinearModel` class.

This example shows how to determine the observations that are influential on the fitted response values using `Dffits` values. Load the sample data and define the response and independent variables.

```
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
```

Fit a linear regression model.

`mdl = fitlm(X,y);`

Plot the `Dffits` values.

`plotDiagnostics(mdl,'Dffits')`

The influential threshold limit for the absolute value of `Dffits` in this example is `2*sqrt(5/100) = 0.45`. Again, there are some observations with `Dffits` values beyond the recommended limits.

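Find the `Dffits` values that are large in absolute value. Note that this command uses the cutoff `2*sqrt(4/100) = 0.4`, slightly lower than the 0.45 limit computed above, so it can flag more observations.

`find(abs(mdl.Diagnostics.Dffits)>2*sqrt(4/100))`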

`ans = `*10×1*
2
13
28
44
58
70
71
84
93
95
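To apply the $2\sqrt{p/n}$ rule with *p* taken from the model itself, a sketch along the following lines may help. It assumes the same `hospital` data; `cutoff`, `flagged`, and `ranked` are illustrative names. Because this cutoff is about 0.45, the result can be a subset of the list above.

```matlab
% Flag observations with |Dffits| > 2*sqrt(p/n), with p and n read from the model.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

p = mdl.NumCoefficients;             % includes the intercept
n = mdl.NumObservations;
cutoff = 2*sqrt(p/n);

dff = mdl.Diagnostics.Dffits;
flagged = find(abs(dff) > cutoff);   % observation indices beyond the cutoff

% Order the flagged observations from most to least influential
[~,order] = sort(abs(dff(flagged)),'descend');
ranked = flagged(order);
```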

The delete-1 variance (`S2_i`) shows how the mean squared error changes when an observation is removed from the data set. You can compare the `S2_i` values with the value of the mean squared error.

`S2_i` is a set of residual variance estimates obtained by deleting each observation in turn. The `S2_i` value for observation *i* is

$$S2\_i=MS{E}_{\left(i\right)}=\frac{{\displaystyle \sum _{j\ne i}^{n}{\left[{y}_{j}-{\widehat{y}}_{j\left(i\right)}\right]}^{2}}}{n-p-1},$$

where $y_j$ is the *j*th observed response value and $\widehat{y}_{j(i)}$ is the fitted value for observation *j* from the regression with observation *i* deleted. `S2_i` is an *n*-by-1 vector in the `Diagnostics` table of the fitted `LinearModel` object. Each element in `S2_i` is the mean squared error of the regression obtained by deleting that observation.
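The delete-1 variances can be obtained without refitting. The sketch below relies on the standard leave-one-out identity $SSE_{(i)} = SSE - e_i^2/(1-h_{ii})$, where $e_i$ is the raw residual. That identity is an assumption brought in here and is not stated in the text above. The example uses the `hospital` data from this page, and `s2iManual` and `maxDiff` are illustrative names.

```matlab
% Delete-1 variance from the full fit, using the leave-one-out identity.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

e = mdl.Residuals.Raw;               % raw residuals e_i
h = mdl.Diagnostics.Leverage;        % leverage values h_ii

% SSE with observation i removed, divided by n - p - 1 (that is, DFE - 1)
s2iManual = (mdl.SSE - e.^2./(1 - h))/(mdl.DFE - 1);

% Largest discrepancy from the stored diagnostic; expected to be near zero
maxDiff = max(abs(s2iManual - mdl.Diagnostics.S2_i));
```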

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

- Display the `S2_i` vector by indexing into the property using dot notation: `mdl.Diagnostics.S2_i`
- Plot the delete-1 variance values using `plotDiagnostics(mdl,'S2_i')`. For details, see the `plotDiagnostics` method of the `LinearModel` class.

This example shows how to compute and plot S2_i values to examine the change in the mean squared error when an observation is removed from the data. Load the sample data and define the response and independent variables.

```
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
```

Fit a linear regression model.

`mdl = fitlm(X,y);`

Display the MSE value for the model.

`mdl.MSE`

`ans = 23.1140`

Plot the S2_i values.

`plotDiagnostics(mdl,'S2_i')`

This plot makes it easy to compare the S2_i values to the MSE value of 23.114, indicated by the horizontal dashed lines. You can see how deleting one observation changes the error variance.
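The same comparison can be made numerically. This is a minimal sketch, assuming the `hospital` data used above, that reports which single deletion moves the error variance the most; `relChange` and `idx` are illustrative names.

```matlab
% Find the observation whose deletion changes the MSE the most.
load hospital
y = hospital.BloodPressure(:,1);
X = double(hospital(:,2:5));
mdl = fitlm(X,y);

s2i = mdl.Diagnostics.S2_i;
relChange = (s2i - mdl.MSE)/mdl.MSE;     % relative change in MSE per deletion

[~,idx] = max(abs(relChange));           % observation with the largest effect
fprintf('Deleting observation %d changes the MSE by %.1f%%\n', ...
    idx,100*relChange(idx));
```

A negative change means the fit improves when that observation is removed, which is typical of a point with a large residual.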

See also: `fitlm` | `LinearModel` | `plotDiagnostics` | `plotResiduals` | `stepwiselm`