Data with Missing Values

Many data sets have one or more missing values. It is convenient to code missing values as `NaN` (Not a Number) to preserve the structure of data sets across multiple variables and observations.

Normal MATLAB® arithmetic operations yield `NaN` values when operands are `NaN`. Removing the `NaN` values would destroy the matrix structure. Removing the rows containing the `NaN` values would discard data. Statistics and Machine Learning Toolbox™ functions in the following table remove `NaN` values only for the purposes of computation.
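
The trade-offs described above can be seen in a short sketch. The matrix `A` and the variable names are hypothetical and not part of the original example; only base MATLAB functions are used.

```matlab
% Hypothetical 2-by-3 matrix with one missing value
A = [1 NaN 3; 4 5 6];

% Arithmetic propagates NaN: element (1,2) of B is still NaN
B = A + 1;

% Removing the NaN values with logical indexing returns a 5-by-1
% column vector, so the 2-by-3 matrix structure is lost
v = A(~isnan(A));

% Removing every row that contains a NaN keeps a matrix, but
% discards the valid values 1 and 3 from the first row
C = A(~any(isnan(A), 2), :);
```

Here `C` contains only the row `4 5 6`, which shows how much valid data row deletion can discard.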

| Function | Description |
| --- | --- |
| `nancov` | Covariance matrix, ignoring `NaN` values |
| `nanmax` | Maximum, ignoring `NaN` values |
| `nanmean` | Mean, ignoring `NaN` values |
| `nanmedian` | Median, ignoring `NaN` values |
| `nanmin` | Minimum, ignoring `NaN` values |
| `nanstd` | Standard deviation, ignoring `NaN` values |
| `nansum` | Sum, ignoring `NaN` values |
| `nanvar` | Variance, ignoring `NaN` values |
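
As a brief illustration of several of these functions, the sketch below applies them column-wise to a small matrix with two missing values. It assumes Statistics and Machine Learning Toolbox is installed; the variable names are illustrative only.

```matlab
% Sample data with two missing values
X = [NaN 1 6; 3 NaN 7; 4 9 2];

m    = nanmean(X)     % column means:   3.5  5  5
md   = nanmedian(X)   % column medians: 3.5  5  6
xmax = nanmax(X)      % column maxima:  4    9  7
xmin = nanmin(X)      % column minima:  3    1  2
```

Each result is computed from the non-`NaN` values in the column, so the first two columns use two values and the third column uses three.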

Other Statistics and Machine Learning Toolbox functions also ignore `NaN` values. These include `iqr`, `kurtosis`, `mad`, `prctile`, `range`, `skewness`, and `trimmean`.
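
A minimal sketch of this behavior, using a hypothetical vector `x` and assuming Statistics and Machine Learning Toolbox is available:

```matlab
% Hypothetical vector with one missing value
x = [2 NaN 10 4];

r = range(x)   % 10 - 2 = 8; the NaN is ignored
q = iqr(x);    % interquartile range of the non-NaN values 2, 4, 10
```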

Working with Data with Missing Values

Create a 3-by-3 matrix of sample data. Remove two data values by replacing them with `NaN`.

```
X = magic(3);
X([1 5]) = [NaN NaN]
```
```
X = 3×3

   NaN     1     6
     3   NaN     7
     4     9     2
```
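
The expression `X([1 5])` uses linear indexing, which counts elements down the columns. The following sketch, which uses only base MATLAB functions, shows which elements of the 3-by-3 matrix the indices 1 and 5 refer to. The variable names `row`, `col`, and `vals` are illustrative.

```matlab
X = magic(3);

% Convert linear indices 1 and 5 to row and column subscripts
[row, col] = ind2sub(size(X), [1 5])   % row = [1 2], col = [1 2]

% The values that the example replaces with NaN
vals = X([1 5])                        % 8 and 5
```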

Compute the sum for each column of the sample data matrix using the `sum` function.

`s1 = sum(X)`
```
s1 = 1×3

   NaN   NaN    15
```

If a column contains a `NaN` value, then the `sum` function returns `NaN` as the sum of the data in that column.

For comparison, compute the sum for each column of the sample data matrix using the `nansum` function.

`s2 = nansum(X)`
```
s2 = 1×3

     7    10    15
```

If a column contains a `NaN` value, then the `nansum` function ignores the `NaN` value and returns the sum of the remaining values in the column.
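
The same column sums can be reproduced with base MATLAB functions alone. The sketch below is one way to obtain them and is not necessarily how `nansum` is implemented. The variable names are illustrative, and the last line assumes a MATLAB release in which `sum` accepts the `'omitnan'` flag.

```matlab
X = magic(3);
X([1 5]) = NaN;

% Replace missing values with zero in a copy, then sum as usual
Y = X;
Y(isnan(Y)) = 0;
s3 = sum(Y)                % 7 10 15

% Count the missing values in each column
nMissing = sum(isnan(X))   % 1 1 0

% Assumption: this release supports the 'omitnan' flag of sum
s4 = sum(X, 'omitnan')     % 7 10 15
```

Counting the missing values alongside the sum helps show how many observations contributed to each column total.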