Data Preprocessing

Clean, normalize, aggregate, and analyze data

Data preprocessing is the process of transforming raw data into a format that is easier to analyze. This process can include cleaning steps, such as handling missing values or smoothing noisy data. By cleaning, organizing, and summarizing the data, you can identify patterns, make predictions, and inform decision-making.

Apps

Apply Preprocessing Steps

Data Cleaner

Preprocess and organize column-oriented data (Since R2022a)

Live Editor Tasks

Apply Single Preprocessing Step

Clean Missing Data	Find, fill, or remove missing data in the Live Editor
Clean Outlier Data	Find, fill, or remove outliers in the Live Editor
Smooth Data	Smooth noisy data in the Live Editor
Find Local Extrema	Find local maxima and minima in the Live Editor
Find Change Points	Find abrupt changes in data in the Live Editor
Stack Table Variables	Combine values from multiple table variables into one table variable in the Live Editor (Since R2020a)
Unstack Table Variables	Distribute values from one table variable to multiple table variables in the Live Editor (Since R2020a)
Retime Timetable	Resample or aggregate timetable data in the Live Editor (Since R2020a)
Normalize Data	Center and scale data in the Live Editor (Since R2021b)
Find and Remove Trends	Find and remove polynomial or periodic trends from data in the Live Editor
Pivot Table	Summarize tabular data in pivoted table in the Live Editor (Since R2023b)
Compute by Group	Summarize, transform, or filter by group in the Live Editor (Since R2021b)

Functions

Clean and Inspect Data

Missing Values

`fillmissing`	Fill missing entries
`fillmissing2`	Fill missing entries in 2-D data (Since R2023a)
`standardizeMissing`	Insert standard missing values
`rmmissing`	Remove missing entries
`anymissing`	Determine if any array element is missing (Since R2022a)
`ismissing`	Find missing values
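
As a minimal sketch of how these functions can be combined, the following uses a made-up vector in which -99 is assumed to be a sentinel for a missing reading. The data and variable names are illustrative only.

```matlab
% Hypothetical readings; -99 is assumed to be a sentinel for "no data".
raw = [1 -99 3 NaN 5];

x = standardizeMissing(raw, -99);    % convert the sentinel to NaN
tf = ismissing(x);                   % logical mask of missing entries
xFilled = fillmissing(x, 'linear');  % fill gaps by linear interpolation
xKept = rmmissing(x);                % or drop the missing entries instead
```

Whether to fill or remove depends on the analysis; filling keeps the sample spacing intact, while removing avoids introducing interpolated values.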

Outliers

`filloutliers`	Detect and replace outliers in data
`rmoutliers`	Detect and remove outliers in data
`clip`	Clip data to range (Since R2024a)
`isoutlier`	Find outliers in data
`isbetween`	Determine which elements are within specified range
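
The sketch below shows one possible way to detect, remove, or replace an outlier. The vector is invented for illustration, and the example relies on the default detection method (values more than three scaled median absolute deviations from the median).

```matlab
% Hypothetical data with one obvious spike at position 6.
x = [1 2 3 2 1 100 2 3 1 2];

tf = isoutlier(x);                    % default method: scaled MAD from the median
xRemoved = rmoutliers(x);             % drop the flagged elements
xFilled = filloutliers(x, 'linear');  % replace them by interpolating neighbors
```

Other detection methods (for example `'mean'` or `'quartiles'`) can be passed as an extra argument if the default is too strict or too lenient for the data.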

Noise Reduction

`smoothdata`	Smooth noisy data
`smoothdata2`	Smooth noisy data in two dimensions (Since R2023b)
`movmean`	Moving mean
`movmedian`	Moving median
`movsum`	Moving sum
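
For illustration, the moving-window functions can be compared on a short made-up vector with a window of three elements. The window length and the choice of `'movmedian'` for `smoothdata` are assumptions made for this sketch, not recommendations.

```matlab
% Hypothetical noisy samples; the value 10 is a spike.
x = [1 2 10 4 5];

m = movmean(x, 3);      % centered 3-point mean; the window shrinks at the ends
md = movmedian(x, 3);   % centered 3-point median, less sensitive to the spike
s = movsum(x, 3);       % centered 3-point sum

% smoothdata wraps several smoothing methods behind one interface.
xs = smoothdata(x, 'movmedian', 3);
```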

Local Extrema and Change Points

`islocalmin`	Find local minima
`islocalmin2`	Find local minima in 2-D data (Since R2024a)
`islocalmax`	Find local maxima
`islocalmax2`	Find local maxima in 2-D data (Since R2024a)
`ischange`	Find abrupt changes in data
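
A small sketch of these functions on invented data follows. Each function returns a logical mask the same size as its input, which can then be used to index the original data.

```matlab
% Hypothetical signal with two interior peaks and one interior valley.
x = [1 3 2 5 4];

isMax = islocalmax(x);   % true at the interior peaks (values 3 and 5)
isMin = islocalmin(x);   % true at the interior valley (value 2)
peaks = x(isMax);        % extract the peak values

% Assumed step signal; by default ischange looks for abrupt shifts in the mean.
step = [zeros(1, 10) 5*ones(1, 10)];
isStep = ischange(step);
```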

Sampling

`isuniform`	Determine if vector is uniformly spaced (Since R2022b)
`isregular`	Determine if input times are regular with respect to time or calendar unit
`retime`	Resample or aggregate data in timetable, and resolve duplicate or irregular times
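
As a hedged example, the following builds a small timetable with an assumed gap at hour 2, checks its regularity, and resamples it to an hourly grid with linear interpolation. The dates, values, and variable name `Value` are made up.

```matlab
% Hypothetical timetable with a missing sample at hour 2.
t = datetime(2024, 1, 1) + hours([0; 1; 3; 4]);
TT = timetable(t, [10; 20; 40; 50], 'VariableNames', {'Value'});

wasRegular = isregular(TT);              % false: the time steps are not equal

% Resample onto an hourly grid, interpolating linearly across the gap.
TT2 = retime(TT, 'hourly', 'linear');
nowRegular = isregular(TT2);
```

Replacing `'linear'` with an aggregation method such as `'mean'` would instead combine samples that fall into the same time bin.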

Reshape, Sort, and Resize

Reshape Tables

`rows2vars`	Reorient table or timetable so that rows become variables
`stack`	Stack data from input table or timetable into one variable in output table or timetable
`unstack`	Unstack data from one variable into multiple variables
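
The sketch below converts a small wide table to long form with `stack` and back again with `unstack`. The table contents and the variable names `ID`, `Q1`, `Q2`, `Sales`, and `Quarter` are hypothetical.

```matlab
% Hypothetical wide table: one row per ID, one variable per quarter.
T = table([1; 2], [10; 20], [30; 40], 'VariableNames', {'ID', 'Q1', 'Q2'});

% Long form: one row per (ID, Quarter) pair.
S = stack(T, {'Q1', 'Q2'}, ...
    'NewDataVariableName', 'Sales', ...
    'IndexVariableName', 'Quarter');

% Back to wide form: values in Quarter become the new variable names.
U = unstack(S, 'Sales', 'Quarter');
```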

Sort and Compare Elements

`sort`	Sort array elements
`sortrows`	Sort rows of matrix or table
`issorted`	Determine if array is sorted
`issortedrows`	Determine if matrix or table rows are sorted
`unique`	Unique values
`uniquetol`	Unique values within tolerance
`ismember`	Find set members of data
`ismembertol`	Find set members of data within tolerance
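
A brief sketch of these functions on made-up arrays:

```matlab
% Hypothetical vector with repeated values.
x = [3 1 2 3 1];

[s, idx] = sort(x);            % sorted values and the original positions
ok = issorted(s);              % true after sorting
u = unique(x);                 % distinct values in ascending order
tf = ismember([2 4], x);       % which of 2 and 4 occur in x

% Sort the rows of a matrix by its second column.
A = [1 9; 2 7; 3 8];
B = sortrows(A, 2);

% Values that differ only by floating-point noise collapse with uniquetol.
v = uniquetol([1, 1 + 1e-14, 2]);
```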

Resize

`paddata`	Pad data by adding elements (Since R2023b)
`trimdata`	Trim data by removing elements (Since R2023b)
`resize`	Resize data by adding or removing elements (Since R2023b)
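
The following sketch assumes R2023b or later and uses the default behavior of these functions, which is assumed here to add or remove elements at the trailing end and to pad numeric data with zeros. The column vectors are invented.

```matlab
% Hypothetical column vectors.
a = [1; 2; 3];
b = [1; 2; 3; 4; 5];

aPadded = paddata(a, 5);    % grow to length 5 by appending zeros
bTrimmed = trimdata(b, 3);  % shrink to length 3 by dropping trailing elements

% resize pads or trims as needed to reach the target length.
aResized = resize(a, 5);
bResized = resize(b, 3);
```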

Normalize and Remove Trends

Normalize

`normalize`	Normalize data
`rescale`	Scale range of array elements
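
To illustrate the difference between the two functions, this sketch applies both to a made-up vector. By default, `normalize` computes z-scores, while `rescale` maps the data onto the interval [0, 1].

```matlab
% Hypothetical data.
x = [1 2 3 4 5];

z = normalize(x);        % z-scores: zero mean, unit standard deviation
r = rescale(x);          % linear map onto [0, 1]
r2 = rescale(x, -1, 1);  % linear map onto [-1, 1]
```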

Find and Remove Trends

`detrend`	Remove polynomial trend
`trenddecomp`	Find trends in data (Since R2021b)
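
As a rough sketch, `detrend` can be applied to an invented signal made of a straight line plus an alternating component. With no extra arguments it removes the best-fit line; a second argument selects the polynomial degree.

```matlab
% Hypothetical signal: linear trend plus an alternating component.
t = 0:9;
trend = 2*t + 1;
wiggle = [1 -1 1 -1 1 -1 1 -1 1 -1];
x = trend + wiggle;

residual = detrend(x);        % subtract the least-squares line
flat = detrend(trend);        % a pure line detrends to (numerically) zero
quadFlat = detrend(t.^2, 2);  % degree 2 removes a quadratic trend
```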

Bin, Group, and Summarize

Bin

`discretize`	Group data into bins or categories
`histcounts`	Histogram bin counts
`histcounts2`	Bivariate histogram bin counts
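
A minimal sketch of binning with explicit edges, using made-up values. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```matlab
% Hypothetical data and bin edges: bins are [0,10) and [10,20].
x = [1 5 10 15 20];
edges = [0 10 20];

bin = discretize(x, edges);  % bin index for each element
n = histcounts(x, edges);    % number of elements in each bin
```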

Pivot

`pivot`	Summarize tabular data in pivoted table (Since R2023a)

Summarize

`summary`	Data summary
`groupsummary`	Group summary computations
`groupcounts`	Number of group elements
`groupfilter`	Filter by group
`grouptransform`	Transform by group
`findgroups`	Find groups and return group numbers
`splitapply`	Split data into groups and apply function
`accumarray`	Accumulate vector elements
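
The sketch below computes per-group results in three ways on a small invented table. The variable names `Group` and `Value` are hypothetical.

```matlab
% Hypothetical table with a categorical grouping variable.
T = table(categorical({'a'; 'b'; 'a'; 'b'; 'a'}), [1; 2; 3; 4; 8], ...
    'VariableNames', {'Group', 'Value'});

% One call: group counts and the mean of Value for each group.
G = groupsummary(T, 'Group', 'mean', 'Value');

% Two steps: assign group numbers, then apply any function per group.
[g, names] = findgroups(T.Group);
groupMeans = splitapply(@mean, T.Value, g);

% accumarray sums the values that share a group number by default.
groupSums = accumarray(g, T.Value);
```

`groupsummary` is convenient for standard statistics on tables, while `findgroups` with `splitapply` accepts an arbitrary function handle.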

Topics

Clean Data

Missing Data in MATLAB
Handle missing values in data sets.
Clean Messy and Missing Data in Tables
Standardize, fill, or remove missing values in tables, and reorganize tables by sorting rows and moving variables.
Data Smoothing and Outlier Detection
Eliminate unwanted noise or behavior in data, and find, fill, and remove outliers.
Clean Messy Data and Locate Extrema Using Live Editor Tasks
Interactively preprocess data with Live Editor Tasks.

Remove Trends

Remove Linear Trends from Timetable Data
Remove polynomial trend from data using detrend.

Summarize

Summarize or Pivot Data in Tables Using Groups
Interpret data based on common characteristics by creating and visualizing a grouped summary table or pivoted table.
Perform Calculations by Group in Table
Specify groups of data in tables and timetables, and perform calculations by group. Choose a function for group calculations using these recommendations.

Related Information

Matrices and Arrays
Tables
Timetables
How to Clean Your Data in MATLAB
How to Create Pivot Tables in MATLAB

Featured Examples

Data Cleaning and Calculations in Tables

Clean data stored in a table or timetable. Perform computations by using the numeric and categorical data that the table contains.

Grouped Calculations in Tables and Timetables

Perform in-place calculations on groups of data in tables and timetables.
