Clean Outlier Data
Find, fill, or remove outliers in the Live Editor
Description
The Clean Outlier Data task lets you interactively handle outliers in data. The task automatically generates MATLAB® code for your live script.
Using this task, you can:
Find, fill, or remove outliers from data in a workspace variable.
Customize the methods for finding and filling outliers.
Visualize the outlier data and cleaned data.
Open the Task
To add the Clean Outlier Data task to a live script in the MATLAB Live Editor:
On the Live Editor tab, click Task and select the Clean Outlier Data icon.
In a code block in the live script, type a relevant keyword, such as outlier or clean. Select Clean Outlier Data from the suggested command completions.
Parameters
Input data
— Valid input data from workspace
vector | table | timetable
This task operates on input data contained in a vector, table, or timetable. The data can be of type single or double.
For table or timetable input data, select All supported variables to clean all variables of type single or double. To choose which single or double variables to clean, select Specified variables.
Cleaning method
— Cleaning method for filling outliers
Linear interpolation (default) | Constant value | Convert to missing | ...
Specify the method for filling outliers as one of these options.
Fill Method | Description |
---|---|
Linear interpolation | Linear interpolation of neighboring, nonoutlier values |
Constant value | Specified scalar value, which is 0 by default |
Convert to missing | Standard missing value for the data type (NaN for single or double data) |
Center value | Center value determined by the detection method |
Clip to threshold value | Lower threshold value for elements smaller than the lower threshold determined by the detection method; upper threshold value for elements larger than the upper threshold determined by the detection method |
Previous value | Previous nonoutlier value |
Next value | Next nonoutlier value |
Nearest value | Nearest nonoutlier value |
Spline interpolation | Piecewise cubic spline interpolation |
Shape-preserving cubic interpolation (PCHIP) | Shape-preserving piecewise cubic spline interpolation |
Modified Akima cubic interpolation | Modified Akima cubic Hermite interpolation |
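To illustrate how several of these fill methods treat a flagged element, the following Python sketch replaces each outlier in a list, given a logical mask of outlier locations. It is not the code that the task generates and not MATLAB's implementation. The helpers fill_outliers and remove_outliers are hypothetical. The sketch assumes that sample points are the indices 0 to n-1, that ties for Nearest value go to the previous value, and that outliers at the ends of the data without a nonoutlier neighbor on the needed side are left unchanged. MATLAB's own end-point and tie handling may differ.

```python
def fill_outliers(a, mask, method="linear", constant=0.0,
                  center=None, lower=None, upper=None):
    """Return a copy of `a` with the elements flagged in `mask` replaced.

    Hypothetical helper. Sample points are the indices 0..n-1. Outliers
    that lack a nonoutlier neighbor on the side a method needs are left
    unchanged (an assumption of this sketch).
    """
    good = [i for i, flagged in enumerate(mask) if not flagged]
    out = list(a)
    for i, flagged in enumerate(mask):
        if not flagged:
            continue
        # Indices of the closest nonoutlier values before and after i.
        prev = max((j for j in good if j < i), default=None)
        nxt = min((j for j in good if j > i), default=None)
        if method == "constant":
            out[i] = constant
        elif method == "center":
            out[i] = center  # e.g. the median found by the detection method
        elif method == "clip":
            # Clamp to the detection thresholds [lower, upper].
            out[i] = min(max(a[i], lower), upper)
        elif method == "previous":
            if prev is not None:
                out[i] = a[prev]
        elif method == "next":
            if nxt is not None:
                out[i] = a[nxt]
        elif method == "nearest":
            options = [j for j in (prev, nxt) if j is not None]
            if options:
                # min() keeps the first of equal distances, so ties go
                # to the previous value (assumption).
                out[i] = a[min(options, key=lambda j: abs(j - i))]
        elif method == "linear":
            if prev is not None and nxt is not None:
                t = (i - prev) / (nxt - prev)
                out[i] = a[prev] + t * (a[nxt] - a[prev])
        else:
            raise ValueError(f"unsupported fill method: {method}")
    return out


def remove_outliers(a, mask):
    """Hypothetical helper: drop the flagged elements instead of filling them."""
    return [x for x, flagged in zip(a, mask) if not flagged]
```

For example, with a = [1.0, 2.0, 50.0, 4.0, 5.0] and only 50.0 flagged, Linear interpolation gives 3.0, Previous value gives 2.0, Next value gives 4.0, and removing the outlier leaves four elements.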
Detection method
— Method for detecting outliers
Moving median (default) | Median | Mean | ...
Specify the detection method for finding outliers as one of these options.
Method | Description |
---|---|
Moving median | Outliers are defined as elements more than the specified threshold of local scaled median absolute deviations (MAD) from the local median over a specified window. The default threshold is 3. |
Median | Outliers are defined as elements more than the specified threshold of scaled MAD from the median. The default threshold is 3. For input data A, the scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)), approximately 1.4826. |
Mean | Outliers are defined as elements more than the specified threshold of standard deviations from the mean. The default threshold is 3. This method is faster but less robust than Median. |
Quartiles | Outliers are defined as elements more than the specified threshold of interquartile ranges above the upper quartile (75th percentile) or below the lower quartile (25th percentile). The default threshold is 1.5. This method is useful when the input data is not normally distributed. |
Grubbs | Outliers are detected using Grubbs’ test, which removes one outlier per iteration based on hypothesis testing. This method assumes that the input data is normally distributed. |
Generalized extreme studentized deviate (GESD) | Outliers are detected using the generalized extreme studentized deviate test for outliers. This iterative method is similar to Grubbs but can perform better when multiple outliers are masking each other. |
Moving mean | Outliers are defined as elements more than the specified threshold of local standard deviations from the local mean over a specified window. The default threshold is 3. |
Percentiles | Outliers are defined as elements outside the percentile range specified by a lower and an upper threshold. The default lower percentile threshold is 10, and the default upper percentile threshold is 90. Valid threshold values are in the interval [0, 100]. |
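To make the Median, Mean, Quartiles, and Percentiles definitions concrete, the following Python sketch implements them with the standard library. It is an illustration under stated assumptions, not MATLAB's implementation. The function names are hypothetical, the Mean criterion uses the sample standard deviation, and percentiles use linear interpolation between sorted values, which may not match how MATLAB computes quartiles and percentiles. The scaling constant 1/NormalDist().inv_cdf(0.75) is the same c given in the Median row.

```python
import statistics
from statistics import NormalDist

# 1 / (inverse normal CDF at 0.75), about 1.4826; equals
# c = -1/(sqrt(2)*erfcinv(3/2)) from the Median row.
MAD_SCALE = 1.0 / NormalDist().inv_cdf(0.75)


def percentile(a, p):
    """Percentile p (0..100) by linear interpolation of sorted values (assumed)."""
    s = sorted(a)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def outliers_median(a, threshold=3.0):
    """Flag elements more than `threshold` scaled MADs from the median."""
    med = statistics.median(a)
    scaled_mad = MAD_SCALE * statistics.median([abs(x - med) for x in a])
    return [abs(x - med) > threshold * scaled_mad for x in a]


def outliers_mean(a, threshold=3.0):
    """Flag elements more than `threshold` standard deviations from the mean."""
    mu = statistics.mean(a)
    sd = statistics.stdev(a)  # sample standard deviation, n - 1 (assumed)
    return [abs(x - mu) > threshold * sd for x in a]


def outliers_quartiles(a, threshold=1.5):
    """Flag elements more than `threshold` IQRs outside the quartiles."""
    q1, q3 = percentile(a, 25), percentile(a, 75)
    iqr = q3 - q1
    return [x < q1 - threshold * iqr or x > q3 + threshold * iqr for x in a]


def outliers_percentiles(a, lower=10, upper=90):
    """Flag elements outside the [lower, upper] percentile range."""
    lo, hi = percentile(a, lower), percentile(a, upper)
    return [x < lo or x > hi for x in a]
```

With the data [1, 2, 3, 4, 5, 6, 7, 8, 9, 100], the Median and Quartiles criteria flag only 100, while the Mean criterion flags nothing: the outlier inflates the standard deviation enough to hide itself, which is one reason Mean is less robust than Median. Percentiles with the default thresholds flags both 1 and 100, because it always marks the tails of the data.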
Moving window
— Window for moving methods
Centered (default) | Asymmetric
Specify the window type and size when the method for detecting outliers is Moving median or Moving mean.
Window | Description |
---|---|
Centered | Specified window length centered about the current point |
Asymmetric | Window containing a specified number of elements before the current point and a specified number of elements after the current point |
Window sizes are relative to the X-axis variable units.
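The following Python sketch shows how a Moving median detector can use either window type. It is illustrative only. It assumes unit-spaced sample points, so window sizes are element counts. It also assumes that windows shrink at the ends of the data and that a centered window of even length k takes k/2 elements before the current point and k/2 - 1 after. These conventions, and the names centered and moving_median_outliers, are assumptions of the sketch rather than documented behavior of the task.

```python
import statistics
from statistics import NormalDist

# Scaling constant for the MAD (about 1.4826), as in the Median method.
MAD_SCALE = 1.0 / NormalDist().inv_cdf(0.75)


def centered(k):
    """Split a centered window of length k into (before, after) counts.

    Odd k: (k - 1) / 2 on each side. Even k: k / 2 before and
    k / 2 - 1 after (assumed convention).
    """
    return k // 2, (k - 1) // 2


def moving_median_outliers(a, before, after, threshold=3.0):
    """Flag elements more than `threshold` local scaled MADs from the local median.

    The window for element i spans indices i - before .. i + after and is
    truncated (shrunk) at the ends of the data (assumption).
    """
    n = len(a)
    mask = []
    for i, x in enumerate(a):
        window = a[max(0, i - before):min(n, i + after + 1)]
        local_median = statistics.median(window)
        local_mad = MAD_SCALE * statistics.median(
            [abs(v - local_median) for v in window]
        )
        mask.append(abs(x - local_median) > threshold * local_mad)
    return mask


data = [1, 2, 3, 4, 30, 6, 7, 8, 9, 10]
print(moving_median_outliers(data, *centered(5)))  # Centered, length 5
print(moving_median_outliers(data, 4, 0))          # Asymmetric: 4 before, 0 after
```

On this trending series, both window types flag only the spike 30 at index 4. Because the median and MAD are computed locally, the steady upward trend itself is not treated as outlying.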
Version History
Introduced in R2019b
See Also
Functions
Live Editor Tasks
- Clean Missing Data | Find Change Points | Find Local Extrema | Smooth Data | Remove Trends | Normalize Data | Compute by Group