plotDiagnostics

Plot observation diagnostics of linear regression model


Syntax

plotDiagnostics(mdl)

plotDiagnostics(mdl,plottype)

plotDiagnostics(___,Name,Value)

plotDiagnostics(ax,___)

h = plotDiagnostics(___)

Description

plotDiagnostics creates a plot of observation diagnostics such as leverage, Cook's distance, and delete-1 statistics to identify outliers and influential observations.

plotDiagnostics(mdl) creates a leverage plot of the linear regression model (mdl) observations. A dotted line in the plot represents the recommended threshold values.


plotDiagnostics(mdl,plottype) creates the type of observation diagnostics plot specified by plottype.

plotDiagnostics(___,Name,Value) specifies additional options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify the marker symbol and size for the data points.

plotDiagnostics(ax,___) plots into the axes specified by ax instead of the current axes (gca). (since R2024a)

h = plotDiagnostics(___) returns graphics objects for the lines or contour in the plot. Use h to modify the properties of a specific line or contour after you create the plot. For a list of properties, see Line Properties and Contour Properties.

Examples


Find Outliers Using Leverage and Cook's Distance


Plot the leverage values and Cook's distances of observations and find the outliers.

Load the carsmall data set and fit a linear regression model of the mileage as a function of model year, weight, and weight squared.

```matlab
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');
```

Plot the leverage values.

```matlab
plotDiagnostics(mdl)
legend('show') % Show the legend
```

Figure: Case order plot of leverage (x-axis: Row number, y-axis: Leverage), showing the leverage values and a dotted reference line.

The dotted line represents the recommended threshold value 2*p/n, where p is the number of coefficients, and n is the number of observations. Find the threshold value using the NumCoefficients and NumObservations properties.

```matlab
t_leverage = 2*mdl.NumCoefficients/mdl.NumObservations
```

```
t_leverage = 0.1064
```

Find the observations with leverage values that exceed the threshold value.

```matlab
find(mdl.Diagnostics.Leverage > t_leverage)
```

You can also find an observation number by using a data tip. Select the data points above the threshold line to display their data tips. The data tip includes the x-axis and y-axis values for the selected point, along with the observation number.

Plot the Cook's distance values.

```matlab
plotDiagnostics(mdl,'cookd')
```

Figure: Case order plot of Cook's distance (x-axis: Row number, y-axis: Cook's distance), showing the Cook's distance values and a dotted reference line.

The dotted line represents the recommended threshold value. Compute the threshold value t_cookd.

```matlab
t_cookd = 3*mean(mdl.Diagnostics.CooksDistance,'omitnan')
```

```
t_cookd = 0.0320
```

Find the observations with the Cook's distance values that exceed the threshold value.

```matlab
find(mdl.Diagnostics.CooksDistance > t_cookd)
```

Two observations (26 and 35) are outliers by both measures, but some points (32, 80, 90, 92, and 97) are outliers by only one measure.
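The comparison in the previous paragraph can also be done programmatically. The following sketch reuses the thresholds from this example and splits the flagged rows with the base MATLAB set functions intersect and setxor; the variable names idxLeverage, idxCookd, bothMeasures, and oneMeasure are illustrative choices, not part of the plotDiagnostics interface.

```matlab
% Refit the example model so the block runs on its own
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');

% Thresholds used by the leverage and Cook's distance plots
t_leverage = 2*mdl.NumCoefficients/mdl.NumObservations;
t_cookd = 3*mean(mdl.Diagnostics.CooksDistance,'omitnan');

% Rows above each threshold (NaN rows compare as false and are skipped)
idxLeverage = find(mdl.Diagnostics.Leverage > t_leverage);
idxCookd = find(mdl.Diagnostics.CooksDistance > t_cookd);

bothMeasures = intersect(idxLeverage,idxCookd)   % flagged by both measures
oneMeasure = setxor(idxLeverage,idxCookd)        % flagged by exactly one measure
```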

Input Arguments


`mdl` — Linear regression model
`LinearModel` object

Linear regression model, specified as a LinearModel object created using fitlm or stepwiselm.

`plottype` — Type of plot
`'leverage'` (default) | `'contour'` | `'cookd'` | `'covratio'` | `'dfbetas'` | `'dffits'` | `'s2_i'`

Type of plot, specified as one of the values in this table.

Value	Plot Type	Dotted Reference Line in Plot	Purpose
`'contour'`	Residual vs. leverage with overlaid contours of Cook's distance	Contours of Cook's distance	Identify observations with large residual values, high leverage, and large Cook's distance values.
`'cookd'`	Cook's distance	Recommended threshold, computed by `3*mean(mdl.Diagnostics.CooksDistance,'omitnan')`	Identify observations with large Cook's distance values.
`'covratio'`	Delete-1 ratio of determinant of covariance	Recommended thresholds, computed by `1±3*p/n`, where `p` is the number of coefficients (`mdl.NumCoefficients`) and `n` is the number of observations (`mdl.NumObservations`)	Identify observations where the delete-1 statistic value is not in the range of the recommended thresholds.
`'dfbetas'`	Delete-1 scaled differences in coefficient estimates	Recommended threshold, computed by `3/sqrt(n)`	Identify observations with large delete-1 statistic values.
`'dffits'`	Delete-1 scaled differences in fitted values	Recommended threshold for the absolute value of the statistic, computed by `2*sqrt(p/n)`	Identify observations with large absolute delete-1 statistic values.
`'leverage'`	Leverage	Recommended threshold, computed by `2*p/n`	Identify high leverage observations.
`'s2_i'`	Delete-1 variance	Mean squared error (`mdl.MSE`)	Compare the delete-1 variance with the mean squared error.

For all plot types except 'contour', the x-axis is the row number (case order) of observations.

The Diagnostics property of mdl contains the diagnostic values used by plotDiagnostics to create plots.

For more information about observation diagnostics, see Cook’s Distance, Delete-1 Statistics, and Leverage.
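The thresholds in the table can be applied directly to the Diagnostics property. The sketch below flags rows for each plot type using the carsmall model from the example. It assumes the Diagnostics table contains the variables Leverage, CooksDistance, CovRatio, Dffits, and Dfbetas, and it compares the absolute value of each dfbetas coefficient column with 3/sqrt(n); that absolute-value comparison is an assumption of this sketch rather than a documented rule.

```matlab
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');

p = mdl.NumCoefficients;    % number of coefficients
n = mdl.NumObservations;    % number of observations used in the fit
d = mdl.Diagnostics;        % one row per input row; NaN for rows not used

% Comparisons with NaN return false, so rows excluded from the fit are never flagged.
flags = table( ...
    d.Leverage > 2*p/n, ...
    d.CooksDistance > 3*mean(d.CooksDistance,'omitnan'), ...
    d.CovRatio < 1 - 3*p/n | d.CovRatio > 1 + 3*p/n, ...
    abs(d.Dffits) > 2*sqrt(p/n), ...
    any(abs(d.Dfbetas) > 3/sqrt(n),2), ...   % assumption: |dfbetas| vs. 3/sqrt(n)
    'VariableNames',{'Leverage','CooksD','CovRatio','Dffits','Dfbetas'});

% Row numbers flagged by at least one diagnostic
flaggedRows = find(any(flags{:,:},2))
```

The 's2_i' plot has no threshold of its own; it compares the delete-1 variance with mdl.MSE, so it is left out of the flag table.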

`ax` — Target axes
`Axes` object

Since R2024a

Target axes, specified as an Axes object. If you do not specify the axes, then plotDiagnostics uses the current axes (gca).
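Passing target axes makes it straightforward to place several diagnostics in one figure. This sketch uses tiledlayout and nexttile to draw the leverage and Cook's distance plots side by side; it assumes R2024a or later, because the ax syntax was added in that release.

```matlab
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');

tiledlayout(1,2)
ax1 = nexttile;
plotDiagnostics(ax1,mdl)            % leverage plot in the left tile
ax2 = nexttile;
plotDiagnostics(ax2,mdl,'cookd')    % Cook's distance plot in the right tile
```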

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Color','blue','Marker','o'

Note

The graphical properties listed here are only a subset. For a complete list, see Line Properties. The specified properties determine the appearance of diagnostic data points.

`Color` — Line color
RGB triplet | hexadecimal color code | color name | short name

Line color, specified as an RGB triplet, hexadecimal color code, color name, or short name for one of the color options listed in the following table.

The Color name-value argument also determines the marker outline color when MarkerEdgeColor is "auto" (the default), and the marker fill color when MarkerFaceColor is "auto".
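For example, the following sketch sets Color to a hexadecimal code and MarkerFaceColor to "auto", so the filled markers should take their color from Color. The specific color and marker choices are arbitrary, and the Name=Value syntax assumes R2021a or later.

```matlab
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');

% Marker edges and fills both follow Color because they are "auto"
h = plotDiagnostics(mdl,'leverage',Color="#D95319", ...
    Marker="o",MarkerFaceColor="auto",MarkerSize=5);
```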

For a custom color, specify an RGB triplet or a hexadecimal color code.

An RGB triplet is a three-element row vector whose elements specify the intensities of the red, green, and blue components of the color. The intensities must be in the range [0,1], for example, [0.4 0.6 0.7].
A hexadecimal color code is a string scalar or character vector that starts with a hash symbol (#) followed by three or six hexadecimal digits, which can range from 0 to F. The values are not case sensitive. Therefore, the color codes "#FF8800", "#ff8800", "#F80", and "#f80" are equivalent.
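To see why the short and long forms are equivalent, this base MATLAB sketch expands each three-digit code by repeating every digit, then converts each two-digit pair to an intensity in [0,1]. It is an illustration only; the hexDigits variable is a hypothetical helper name.

```matlab
codes = {'#FF8800','#ff8800','#F80','#f80'};
rgb = zeros(numel(codes),3);
for k = 1:numel(codes)
    hexDigits = upper(codes{k}(2:end));          % drop '#' and ignore case
    if numel(hexDigits) == 3
        hexDigits = hexDigits([1 1 2 2 3 3]);    % '#F80' expands to 'FF8800'
    end
    % Each row of the 3-by-2 char matrix is one channel, e.g. 'FF' -> 255
    rgb(k,:) = hex2dec(reshape(hexDigits,2,3)')' / 255;
end
rgb    % all four rows are the same RGB triplet
```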

Alternatively, you can specify some common colors by name. This table lists the named color options, the equivalent RGB triplets, and the hexadecimal color codes.

Color Name	Short Name	RGB Triplet	Hexadecimal Color Code	Appearance
`"red"`	`"r"`	`[1 0 0]`	`"#FF0000"`
`"green"`	`"g"`	`[0 1 0]`	`"#00FF00"`
`"blue"`	`"b"`	`[0 0 1]`	`"#0000FF"`
`"cyan"`	`"c"`	`[0 1 1]`	`"#00FFFF"`
`"magenta"`	`"m"`	`[1 0 1]`	`"#FF00FF"`
`"yellow"`	`"y"`	`[1 1 0]`	`"#FFFF00"`
`"black"`	`"k"`	`[0 0 0]`	`"#000000"`
`"white"`	`"w"`	`[1 1 1]`	`"#FFFFFF"`
`"none"`	Not applicable	Not applicable	Not applicable	No color

This table lists the default color palettes for plots in the light and dark themes.

Palette	Palette Colors
`"gem"` — Light theme default (before R2025a, most plots use these colors by default)
`"glow"` — Dark theme default

You can get the RGB triplets and hexadecimal color codes for these palettes using the orderedcolors and rgb2hex functions. For example, get the RGB triplets for the "gem" palette and convert them to hexadecimal color codes.

```matlab
RGB = orderedcolors("gem");
H = rgb2hex(RGB);
```

Before R2023b: Get the RGB triplets using RGB = get(groot,"FactoryAxesColorOrder").

Before R2024a: Get the hexadecimal color codes using H = compose("#%02X%02X%02X",round(RGB*255)).

Example: Color="blue"

Data Types: single | double | string | char

`LineWidth` — Line width
positive value

Line width, specified as a positive value in points. If the line has markers, then the line width also affects the marker edges.

Example: LineWidth=0.75

Data Types: single | double

`Marker` — Marker symbol
`"o"` | `"+"` | `"*"` | `"."` | `"x"` | ...

Marker symbol, specified as one of the values in this table.

Marker	Description	Resulting Marker
`"o"`	Circle
`"+"`	Plus sign
`"*"`	Asterisk
`"."`	Point
`"x"`	Cross
`"_"`	Horizontal line
`"|"`	Vertical line
`"square"`	Square
`"diamond"`	Diamond
`"^"`	Upward-pointing triangle
`"v"`	Downward-pointing triangle
`">"`	Right-pointing triangle
`"<"`	Left-pointing triangle
`"pentagram"`	Pentagram
`"hexagram"`	Hexagram
`"none"`	No markers	Not applicable

Example: Marker="+"

Data Types: string | char

`MarkerEdgeColor` — Marker outline color
`"auto"` (default) | `"none"` | RGB triplet | hexadecimal color code | color name | short name

Marker outline color, specified as an RGB triplet, hexadecimal color code, color name, or short name for one of the color options listed in the Color name-value argument.

The default value "auto" uses the same color specified by using the Color name-value argument. You can also specify "none" for no color.

Example: MarkerEdgeColor="blue"

Data Types: single | double | string | char

`MarkerFaceColor` — Marker fill color
`"none"` (default) | `"auto"` | RGB triplet | hexadecimal color code | color name | short name

Marker fill color, specified as an RGB triplet, hexadecimal color code, color name, or short name for one of the color options listed in the Color name-value argument. The default value "none" specifies no color.

The "auto" value uses the same color specified by using the Color name-value argument.

Example: MarkerFaceColor="blue"

Data Types: single | double | string | char

`MarkerSize` — Marker size
`6` (default) | positive value

Marker size, specified as a positive value in points.

Example: MarkerSize=2

Data Types: single | double

Output Arguments


`h` — Graphics objects
graphics array

Graphics objects corresponding to the lines or contour in the plot, returned as a graphics array. Use dot notation to query and set properties of the graphics objects. For details, see Line Properties and Contour Properties.

You can use name-value arguments to specify the appearance of the diagnostic data points, which correspond to the first graphics object h(1). If plottype is 'dfbetas', the plot includes a line object for each coefficient, and the name-value arguments set the properties of the line objects for all coefficients. To modify the properties of an individual coefficient's line, use the corresponding graphics object.
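As a sketch of the dot-notation approach, the block below restyles the diagnostic data points h(1) after the Cook's distance plot is created. The property values are arbitrary; only h(1) is touched because its role (the diagnostic data points) is the one stated above.

```matlab
load carsmall
tbl = table(MPG,Weight);
tbl.Year = categorical(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');

h = plotDiagnostics(mdl,'cookd');

% Restyle the diagnostic data points after plotting
h(1).Marker = 'd';
h(1).MarkerSize = 8;
h(1).MarkerFaceColor = [0.8500 0.3250 0.0980];
```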

More About


Cook’s Distance

Cook’s distance is the scaled change in fitted values, which is useful for identifying outliers in the X values (observations for predictor variables). Cook’s distance shows the influence of each observation on the fitted response values. An observation with Cook’s distance larger than three times the mean Cook’s distance might be an outlier.

Each element in the Cook's distance D is the normalized change in the fitted response values due to the deletion of an observation. The Cook’s distance of observation i is

$D_{i} = \frac{\sum_{j=1}^{n} \left(\hat{y}_{j} - \hat{y}_{j(i)}\right)^{2}}{p\,\mathrm{MSE}},$

where

${\hat{y}}_{j}$ is the jth fitted response value.
${\hat{y}}_{j (i)}$ is the jth fitted response value, where the fit does not include observation i.
MSE is the mean squared error.
p is the number of coefficients in the regression model.

Cook’s distance is algebraically equivalent to the following expression:

$D_{i} = \frac{r_{i}^{2}}{p\,\mathrm{MSE}} \left(\frac{h_{ii}}{(1 - h_{ii})^{2}}\right),$

where $r_{i}$ is the ith residual, and $h_{ii}$ is the ith leverage value.
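The equivalence of the two expressions can be checked numerically. This base MATLAB sketch uses synthetic data (an assumption made so the block does not need a toolbox), computes Cook's distance once by refitting without each observation and once with the residual-and-leverage formula, and reports the largest difference.

```matlab
rng(0)                                   % reproducible synthetic data
n = 20;
X = [ones(n,1) randn(n,2)];              % intercept plus two predictors, so p = 3
y = X*[1; 2; -1] + randn(n,1);
p = size(X,2);

b = X\y;                                 % least-squares fit on all observations
yhat = X*b;
r = y - yhat;                            % residuals r_i
mse = sum(r.^2)/(n-p);                   % mean squared error
lev = sum((X/(X'*X)).*X,2);              % leverage h_ii = diag(X*inv(X'*X)*X')

% Definition: refit without observation i and compare all fitted values
Ddef = zeros(n,1);
for i = 1:n
    keep = true(n,1);
    keep(i) = false;
    bi = X(keep,:)\y(keep);              % fit that excludes observation i
    Ddef(i) = sum((yhat - X*bi).^2)/(p*mse);
end

% Closed form from residuals and leverage
Dclosed = r.^2./(p*mse) .* lev./(1-lev).^2;

maxDiff = max(abs(Ddef - Dclosed))       % rounding error only
```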

For more details, see Cook’s Distance.

Delete-1 Statistics

Delete-1 statistics are useful for finding the influence of each observation. These statistics capture the changes that would result from excluding each observation in turn from the fit. If a delete-1 statistic differs substantially from the corresponding value for the model fit to all observations, then the observation is influential.

See Delete-1 Statistics for the definitions and usages of the delete-1 statistics.
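As an illustration, the sketch below computes two delete-1 statistics on synthetic data, the delete-1 variance (s2_i) and the scaled change in fitted value (dffits), both by refitting without each observation and by closed-form expressions based on the full fit. The closed forms are the standard textbook ones; that they match the values stored in mdl.Diagnostics is an assumption this sketch does not check.

```matlab
rng(1)                                   % reproducible synthetic data
n = 25;
X = [ones(n,1) randn(n,1)];              % intercept plus one predictor, so p = 2
y = X*[0.5; 1.5] + randn(n,1);
p = size(X,2);

b = X\y;
r = y - X*b;                             % residuals
mse = sum(r.^2)/(n-p);
lev = sum((X/(X'*X)).*X,2);              % leverage values

% Closed forms that use only the full fit
s2i = ((n-p)*mse - r.^2./(1-lev))/(n-p-1);
dffits = r./sqrt(s2i.*(1-lev)) .* sqrt(lev./(1-lev));

% Direct computation: refit without each observation in turn
s2iDirect = zeros(n,1);
dffitsDirect = zeros(n,1);
for i = 1:n
    keep = true(n,1);
    keep(i) = false;
    bi = X(keep,:)\y(keep);
    ri = y(keep) - X(keep,:)*bi;
    s2iDirect(i) = sum(ri.^2)/(n-1-p);   % n-1 observations, p coefficients
    dffitsDirect(i) = X(i,:)*(b - bi)/sqrt(s2iDirect(i)*lev(i));
end

maxDiff = [max(abs(s2i - s2iDirect)) max(abs(dffits - dffitsDirect))]
```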

Leverage

Leverage is a measure of the effect of a particular observation on the regression predictions due to the position of that observation in the space of the inputs.

The leverage of observation i is the value of the ith diagonal term h_ii of the hat matrix H. The hat matrix H is defined in terms of the data matrix X:

$H = X\left(X^{T}X\right)^{-1}X^{T}.$

The hat matrix is also known as the projection matrix because it projects the vector of observations y onto the vector of predictions $\hat{y}$, thus putting the "hat" on y.

Because the leverage values sum to p (the number of coefficients in the regression model), the average leverage is p/n, where n is the number of observations. An observation i can therefore be considered an outlier if its leverage substantially exceeds p/n.
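The following base MATLAB sketch illustrates these properties on synthetic data: it builds the hat matrix, confirms that H*y reproduces the least-squares fitted values, checks that the leverage values sum to p, and applies the 2*p/n guideline used by the leverage plot. Moving the last observation far out in predictor space is an arbitrary choice made so that at least one point has high leverage.

```matlab
rng(2)                               % reproducible synthetic data
n = 30;
X = [ones(n,1) randn(n,2)];          % intercept column plus two predictors
X(n,2) = 6;                          % place the last observation far from the others
y = X*[1; 0.5; -2] + randn(n,1);
p = size(X,2);                       % number of coefficients

H = X*((X'*X)\X');                   % hat matrix H = X*inv(X'*X)*X'
lev = diag(H);                       % leverage values h_ii

yhat = X*(X\y);                      % least-squares fitted values
projErr = norm(H*y - yhat)           % H maps y to yhat (rounding error only)
sumLev = sum(lev)                    % equals p up to rounding
meanLev = mean(lev)                  % equals p/n
highLev = find(lev > 2*p/n)          % rows above the 2*p/n guideline
```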

For more details, see Hat Matrix and Leverage.

Tips

The data cursor displays the values of the selected plot point in a data tip (small text box located next to the data point). The data tip includes the x-axis and y-axis values for the selected point, along with the observation name or number.
Use legend('show') to show the pre-populated legend.

Alternative Functionality

A LinearModel object provides multiple plotting functions.
- When creating a model, use plotAdded to understand the effect of adding or removing a predictor variable.
- When verifying a model, use plotDiagnostics to find questionable data and to understand the effect of each observation. Also, use plotResiduals to analyze the residuals of the model.
- After fitting a model, use plotAdjustedResponse, plotPartialDependence, and plotEffects to understand the effect of a particular predictor. Use plotInteraction to understand the interaction effect between two predictors. Also, use plotSlice to plot slices through the prediction surface.

Extended Capabilities


GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2012a


R2024a: Specify target axes

Specify the target axes for the plot by using the ax input argument.