Main Content

# groupsummary

Group summary computations

## Syntax

``G = groupsummary(T,groupvars)``
``G = groupsummary(T,groupvars,method)``
``G = groupsummary(T,groupvars,method,datavars)``
``G = groupsummary(T,groupvars,groupbins)``
``G = groupsummary(T,groupvars,groupbins,method)``
``G = groupsummary(T,groupvars,groupbins,method,datavars)``
``G = groupsummary(___,Name,Value)``
``B = groupsummary(A,groupvars,method)``
``B = groupsummary(A,groupvars,groupbins,method)``
``B = groupsummary(___,Name,Value)``
``[B,BG,BC] = groupsummary(A,___)``

## Description

### Table Data

example

````G = groupsummary(T,groupvars)` returns the unique grouping variable combinations and the number of members in each group for table or timetable `T`. Groups are defined by rows in the variables in `groupvars` that have the same unique combination of values. Each row of the output table corresponds to one group. For example, `G = groupsummary(T,"HealthStatus")` returns a table with the count of each group in the variable `HealthStatus`.```
````G = groupsummary(T,groupvars,method)` applies the group-wise computations specified in `method` and appends the computation results to the output table as additional variables. For example, `G = groupsummary(T,"Location","median")` returns the median value of every nongrouping variable in `T` for each location, in addition to the number of members in each location group.```

example

````G = groupsummary(T,groupvars,method,datavars)` specifies the table variables to apply the method to.```
````G = groupsummary(T,groupvars,groupbins)` specifies to bin rows in `groupvars` according to binning scheme `groupbins` prior to grouping. For example, ```G = groupsummary(T,"SaleDate","year")``` returns the group counts for all sales in `T` within each year according to the grouping variable `SaleDate`.```

example

````G = groupsummary(T,groupvars,groupbins,method)` specifies how to bin the data in `groupvars` prior to grouping and applying the group-wise computations specified in `method`.```
````G = groupsummary(T,groupvars,groupbins,method,datavars)` specifies the table variables to apply the method to.```

example

````G = groupsummary(___,Name,Value)` specifies additional grouping properties using one or more name-value arguments for any of the previous syntaxes. For example, ```G = groupsummary(T,"Category1","IncludeMissingGroups",false)``` excludes the group made from missing data of type `categorical` indicated by `<undefined>` in `Category1`.```

### Array Data

````B = groupsummary(A,groupvars,method)` returns the concatenated results of applying the group-wise computations in `method` to unique groups in vector, matrix, or cell array `A`. Groups are defined by rows in the column vectors in `groupvars` that have the same unique combination of values. Each row of the output array contains the computation results for one group.```

example

````B = groupsummary(A,groupvars,groupbins,method)` specifies to bin rows in `groupvars` according to binning scheme `groupbins` prior to grouping.```

example

````B = groupsummary(___,Name,Value)` specifies additional grouping properties using one or more name-value arguments for either of the previous syntaxes for an input array.```

example

````[B,BG,BC] = groupsummary(A,___)` returns additional group information. `BG` is the unique grouping vector combinations corresponding to the rows in `B`. `BC` is the number of members in each group in `BG`.```

## Examples

collapse all

Compute summary statistics on table variables.

Create a table `T` that contains information about eight individuals.

```HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Fair"; "Poor"; "Excellent"; "Good"; "Excellent"]); Age = [38; 43; 38; 40; 49; 51; 52; 35]; Height = [71; 68; 64; 67; 64; 62; 65; 55]; Weight = [176; 153; 131; 133; 119; 120; 140; 129]; T = table(HealthStatus,Age,Height,Weight)```
```T=8×4 table HealthStatus Age Height Weight ____________ ___ ______ ______ Poor 38 71 176 Good 43 68 153 Fair 38 64 131 Fair 40 67 133 Poor 49 64 119 Excellent 51 62 120 Good 52 65 140 Excellent 35 55 129 ```

Compute the counts of the health status groups by specifying `HealthStatus` as the grouping variable.

`G = groupsummary(T,"HealthStatus")`
```G=4×2 table HealthStatus GroupCount ____________ __________ Excellent 2 Fair 2 Good 2 Poor 2 ```

Compute the mean age, height, and weight of each health status group.

`G = groupsummary(T,"HealthStatus","mean")`
```G=4×5 table HealthStatus GroupCount mean_Age mean_Height mean_Weight ____________ __________ ________ ___________ ___________ Excellent 2 43 58.5 124.5 Fair 2 39 65.5 132 Good 2 47.5 66.5 146.5 Poor 2 43.5 67.5 147.5 ```

Still grouping by health status, compute only the median height.

`G = groupsummary(T,"HealthStatus","median","Height")`
```G=4×3 table HealthStatus GroupCount median_Height ____________ __________ _____________ Excellent 2 58.5 Fair 2 65.5 Good 2 66.5 Poor 2 67.5 ```

Group table data using two grouping variables.

Create a table `T` that contains information about eight individuals.

```HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Fair"; "Poor"; "Excellent"; "Good"; "Excellent"]); Smoker = logical([1; 0; 0; 1; 1; 0; 0; 1]); Weight = [176; 153; 131; 133; 119; 120; 140; 129]; T = table(HealthStatus,Smoker,Weight)```
```T=8×3 table HealthStatus Smoker Weight ____________ ______ ______ Poor true 176 Good false 153 Fair false 131 Fair true 133 Poor true 119 Excellent false 120 Good false 140 Excellent true 129 ```

Compute the mean weight, grouped by health status and smoking status. By default, some combinations of health status and smoking status are not represented in the output because they are empty groups.

`G = groupsummary(T,["HealthStatus","Smoker"],"mean","Weight")`
```G=6×4 table HealthStatus Smoker GroupCount mean_Weight ____________ ______ __________ ___________ Excellent false 1 120 Excellent true 1 129 Fair false 1 131 Fair true 1 133 Good false 2 146.5 Poor true 2 147.5 ```

Set the value of `IncludeEmptyGroups` to `true` to see all group combinations, including the empty ones.

`G = groupsummary(T,["HealthStatus","Smoker"],"mean","Weight","IncludeEmptyGroups",true)`
```G=8×4 table HealthStatus Smoker GroupCount mean_Weight ____________ ______ __________ ___________ Excellent false 1 120 Excellent true 1 129 Fair false 1 131 Fair true 1 133 Good false 2 146.5 Good true 0 NaN Poor false 0 NaN Poor true 2 147.5 ```

Group data according to specified bins.

Create a timetable that contains sales information for days within a single month.

```TimeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); Profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]'; ItemsSold = [14 13 8 5 10 16 8 6 7 11]'; TT = timetable(TimeStamps,Profit,ItemsSold)```
```TT=10×2 timetable TimeStamps Profit ItemsSold ___________ ______ _________ 04-Mar-2017 2032 14 02-Mar-2017 3071 13 15-Mar-2017 1185 8 10-Mar-2017 2587 5 14-Mar-2017 1998 10 31-Mar-2017 2899 16 25-Mar-2017 3112 8 29-Mar-2017 909 6 21-Mar-2017 2619 7 18-Mar-2017 3085 11 ```

Compute the mean and the mode of profit binned by the items sold, binning the groups into intervals of item numbers.

```format shorte G = groupsummary(TT,"ItemsSold",[0 4 8 12 16],{"mean","mode"},"Profit")```
```G=3×4 table disc_ItemsSold GroupCount mean_Profit mode_Profit ______________ __________ ___________ ___________ [4, 8) 3.0000e+00 2.0383e+03 9.0900e+02 [8, 12) 4.0000e+00 2.3450e+03 1.1850e+03 [12, 16] 3.0000e+00 2.6673e+03 2.0320e+03 ```

Compute the mean profit grouped by day of the week.

`G = groupsummary(TT,"TimeStamps","dayname","mean","Profit")`
```G=5×3 table dayname_TimeStamps GroupCount mean_Profit __________________ __________ ___________ Tuesday 2.0000e+00 2.3085e+03 Wednesday 2.0000e+00 1.0470e+03 Thursday 1.0000e+00 3.0710e+03 Friday 2.0000e+00 2.7430e+03 Saturday 3.0000e+00 2.7430e+03 ```

Create a vector of dates and a vector of corresponding profit values.

```TimeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ... 2017 3 14; 2017 3 31; 2017 3 25; ... 2017 3 29; 2017 3 21; 2017 3 18]); Profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';```

Compute the mean profit for each day of the week. Display the mean, the group names, and the number of members in each group.

```format shorte [meanDailyProfit,dayOfWeek,dailyCounts] = groupsummary(Profit,TimeStamps,"dayname","mean")```
```meanDailyProfit = 5×1 2.3085e+03 1.0470e+03 3.0710e+03 2.7430e+03 2.7430e+03 ```
```dayOfWeek = 5x1 categorical Tuesday Wednesday Thursday Friday Saturday ```
```dailyCounts = 5×1 2 2 1 2 3 ```

Compute the mean weights for a set of people grouped by their health status and smoker status.

Store information about the individuals as three vectors of different types.

```HealthStatus = categorical(["Poor"; "Good"; "Fair"; "Fair"; "Poor"; "Excellent"; "Good"; "Excellent"]); Smoker = logical([1; 0; 0; 1; 1; 0; 0; 1]); Weight = [176; 153; 131; 133; 119; 120; 140; 129];```

Grouping by health status and smoker status, compute the mean weights.

`B` contains the mean for each group (`NaN` for empty groups). `BG` is a cell array containing two vectors that describe the groups as you look at their elements row-wise. For instance, the first row of `BG{1}` indicates that the patients in the first group have a health status `Excellent`, and the first row of `BG{2}` indicates that they are nonsmokers. Finally, `BC` contains the number of members in each group for the corresponding groups in `BG`.

```[B,BG,BC] = groupsummary(Weight,{HealthStatus,Smoker},"mean","IncludeEmptyGroups",true); B```
```B = 8×1 120.0000 129.0000 131.0000 133.0000 146.5000 NaN NaN 147.5000 ```
`BG{1}`
```ans = 8x1 categorical Excellent Excellent Fair Fair Good Good Poor Poor ```
`BG{2}`
```ans = 8x1 logical array 0 1 0 1 0 1 0 1 ```
`BC`
```BC = 8×1 1 1 1 1 2 0 0 2 ```

Load data containing patient information and create a table describing each patient's location, systolic and diastolic blood pressure, height, and weight.

```load patients Location = categorical(Location); T = table(Location,Systolic,Diastolic,Height,Weight)```
```T=100×5 table Location Systolic Diastolic Height Weight _________________________ ________ _________ ______ ______ County General Hospital 124 93 71 176 VA Hospital 109 77 69 163 St. Mary's Medical Center 125 83 64 131 VA Hospital 117 75 67 133 County General Hospital 122 80 64 119 St. Mary's Medical Center 121 70 68 142 VA Hospital 130 88 64 142 VA Hospital 115 82 68 180 St. Mary's Medical Center 115 78 68 183 County General Hospital 118 86 66 132 County General Hospital 114 77 68 128 St. Mary's Medical Center 115 68 66 137 VA Hospital 127 74 71 174 VA Hospital 130 95 72 202 St. Mary's Medical Center 114 79 65 129 VA Hospital 130 92 71 181 ⋮ ```

Grouping by location, compute the correlation between patient height and weight and the correlation between systolic and diastolic blood pressure. Use the `xcov` function as the method to compute the correlation. The first two input arguments to `xcov` describe the data to correlate, the third argument describes the lag size, and the fourth argument describes the type of normalization. For each group computation, the `x` and `y` arguments passed to `xcov` are specified pairwise by variable from the two cell elements `["Height","Systolic"]` and `["Weight","Diastolic"]`.

`G = groupsummary(T,"Location",@(x,y) xcov(x,y,0,"coeff"),{["Height","Systolic"],["Weight","Diastolic"]})`
```G=3×4 table Location GroupCount fun1_Height_Weight fun1_Systolic_Diastolic _________________________ __________ __________________ _______________________ County General Hospital 39 0.65483 0.44187 St. Mary's Medical Center 24 0.62047 0.44466 VA Hospital 37 0.78438 0.62256 ```

Alternatively, if your data is in vector or matrix form instead of in a table, you can provide the data to correlate as the first input argument of `groupsummary`.

`[B,BG,BC] = groupsummary({[Height,Systolic],[Weight,Diastolic]},Location,@(x,y) xcov(x,y,0,"coeff"))`
```B = 3×2 0.6548 0.4419 0.6205 0.4447 0.7844 0.6226 ```
```BG = 3x1 categorical County General Hospital St. Mary's Medical Center VA Hospital ```
```BC = 3×1 39 24 37 ```

## Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector, group of column vectors stored as a matrix, or cell array of column vectors, character row vectors, or matrices.

When you specify a function handle for `method` that takes more than one input argument, the input array `A` must be a cell array of column vectors, character row vectors, or matrices. In each call to the function by group, the input arguments are the corresponding columns of each element in the cell array. For example:

• ```groupsummary({x1,y1},groupvars,@(x,y) myFun(x,y))``` calculates `myFun(x1,y1)` for each group.

• ```groupsummary({[x1 x2],[y1 y2]},groupvars,@(x,y) myFun(x,y))``` first calculates `myFun(x1,y1)` for each group and then calculates `myFun(x2,y2)` for each group.

Grouping variables or vectors, specified as one of these options:

• For array input data, `groupvars` can be either a column vector with the same number of rows as `A` or a group of column vectors arranged in a matrix or cell array.

• For table or timetable input data, `groupvars` indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

OptionDescriptionExamples
Variable name

A character vector or string scalar specifying a single table variable name

`'Var1'`

`"Var1"`

Vector of variable names

A cell array of character vectors or string array, where each element is a table variable name

`{'Var1' 'Var2'}`

`["Var1" "Var2"]`

Scalar or vector of variable indices

A scalar or vector of table variable indices

`1`

`[1 3 5]`

Logical vector

A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

`[true false true]`

Function handle

A function handle that takes a table variable as input and returns a logical scalar

`@isnumeric`

`vartype` subscript

A table subscript generated by the `vartype` function

`vartype("numeric")`

Example: `groupsummary(T,"Var3")`

Computation method, specified as one of these values:

Method

Description
`"sum"`Sum
`"mean"`Mean
`"median"`Median
`"mode"`Mode
`"var"`Variance
`"std"`Standard deviation
`"min"`Minimum
`"max"`Maximum
`"range"`Maximum minus minimum
`"nummissing"`Number of missing elements
`"nnz"`

Number of nonzero and non-`NaN` elements

`"all"`All computations previously listed

You also can specify `method` as a function handle that returns one output per group whose first dimension has length 1. For table input data, the function operates on each table variable separately.

When the input data is a table `T` and you specify a function handle for `method` that takes more than one input argument, you must specify `datavars`. The `datavars` argument must be a cell array whose elements indicate the table variables to use for each input into the method. In each call to the function by group, the input arguments are the corresponding table variables of the cell array elements. For example:

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{"x1","y1"})``` calculates `myFun(T.x1,T.y1)` for each group.

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{["x1" "x2"],["y1" "y2"]})``` first calculates `myFun(T.x1,T.y1)` for each group and then calculates `myFun(T.x2,T.y2)` for each group.

When the input data is a vector or matrix and you specify a function handle for `method` that takes more than one input argument, the input data `A` must be a cell array of vectors or matrices. In each call to the function, the input arguments are the corresponding columns of each element in the cell array. For example:

• ```groupsummary({x1,y1},groupvars,@(x,y) myFun(x,y))``` calculates `myFun(x1,y1)` for each group.

• ```groupsummary({[x1 x2],[y1 y2]},groupvars,@(x,y) myFun(x,y))``` first calculates `myFun(x1,y1)` for each group and then calculates `myFun(x2,y2)` for each group.

To specify multiple computations at a time, list the options in a cell array, such as `{"mean","median"}` or `{myFun1,myFun2}`.

`NaN` values in the input data are automatically omitted when using the method names described here, with the exception of `"nummissing"`. To include `NaN` values, use a function handle for the method, such as `@sum` instead of `"sum"`.

Data Types: `char` | `string` | `cell` | `function_handle`

Table variables to operate on, specified as one of the options in this table. `datavars` indicates which variables of the input table or timetable to apply the methods to. Other variables not specified by `datavars` are not operated on and do not pass through to the output. When `datavars` is not specified, `groupsummary` operates on each nongrouping variable.

OptionDescriptionExamples
Variable name

A character vector or string scalar specifying a single table variable name

`'Var1'`

`"Var1"`

Vector of variable names

A cell array of character vectors or string array, where each element is a table variable name

`{'Var1' 'Var2'}`

`["Var1" "Var2"]`

Scalar or vector of variable indices

A scalar or vector of table variable indices

`1`

`[1 3 5]`

Logical vector

A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

`[true false true]`

Function handle

A function handle that takes a table variable as input and returns a logical scalar

`@isnumeric`

`vartype` subscript

A table subscript generated by the `vartype` function

`vartype("numeric")`

When the input data is a table `T` and you specify a function handle for `method` that takes more than one input argument, you must specify `datavars`. The `datavars` argument must be a cell array whose elements are any of the options in the table. The cell array elements indicate the table variables to use for each input into the method. In each call to the function by group, the input arguments are the corresponding table variables of the cell array elements. For example:

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{"x1","y1"})``` calculates `myFun(T.x1,T.y1)` for each group.

• ```groupsummary(T,groupvars,@(x,y) myFun(x,y),{["x1" "x2"],["y1" "y2"]})``` first calculates `myFun(T.x1,T.y1)` for each group and then calculates `myFun(T.x2,T.y2)` for each group.

Example: ```groupsummary(T,groupvars,method,["Var1" "Var2" "Var4"])```

Binning scheme, specified as one of these options:

• `"none"`, indicating no binning

• A list of bin edges, specified as a numeric vector, or a `datetime` vector for `datetime` grouping variables or vectors

• A number of bins, specified as a positive integer scalar

• A time duration, specified as a scalar of type `duration` or `calendarDuration` indicating bin widths (for `datetime` or `duration` grouping variables or vectors only)

• A cell array listing binning methods for each grouping variable or vector

• A time bin for `datetime` and `duration` grouping variables or vectors only, specified as one of these strings.

ValueDescriptionData Type
`"second"`

Each bin is 1 second.

`datetime` and `duration`
`"minute"`

Each bin is 1 minute.

`datetime` and `duration`
`"hour"`

Each bin is 1 hour.

`datetime` and `duration`
`"day"`

Each bin is 1 calendar day. This value accounts for daylight saving time shifts.

`datetime` and `duration`
`"week"`Each bin is 1 calendar week.`datetime` only
`"month"`Each bin is 1 calendar month.`datetime` only
`"quarter"`Each bin is 1 calendar quarter.`datetime` only
`"year"`

Each bin is 1 calendar year. This value accounts for leap days.

`datetime` and `duration`
`"decade"`Each bin is 1 decade (10 calendar years).`datetime` only
`"century"`Each bin is 1 century (100 calendar years).`datetime` only
`"secondofminute"`

Bins are seconds from 0 to 59.

`datetime` only
`"minuteofhour"`

Bins are minutes from 0 to 59.

`datetime` only
`"hourofday"`

Bins are hours from 0 to 23.

`datetime` only
`"dayofweek"`

Bins are days from 1 to 7. The first day of the week is Sunday.

`datetime` only
`"dayname"`Bins are full day names such as `"Sunday"`.`datetime` only
`"dayofmonth"`Bins are days from 1 to 31.`datetime` only
`"dayofyear"`Bins are days from 1 to 366.`datetime` only
`"weekofmonth"`Bins are weeks from 1 to 6.`datetime` only
`"weekofyear"`Bins are weeks from 1 to 54.`datetime` only
`"monthname"`Bins are full month names such as `"January"`.`datetime` only
`"monthofyear"`

Bins are months from 1 to 12.

`datetime` only
`"quarterofyear"`Bins are quarters from 1 to 4.`datetime` only

When multiple grouping variables or vectors are specified, you can provide a single binning method that is applied to all grouping variables or vectors, or a cell array containing a binning method for each grouping variable or vector such as `{"none",[0 2 4 Inf]}`.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: ```G = groupsummary(T,groupvars,groupbins,IncludedEdge="right")```

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: ```G = groupsummary(T,groupvars,groupbins,"IncludedEdge","right")```

Included bin edge, specified as either `"left"` or `"right"`, indicating which end of the bin interval is inclusive.

This name-value argument can be specified only when `groupbins` is specified, and the value applies to all binning schemes for all grouping variables or vectors.

Missing groups indicator, specified as a numeric or logical `1` (`true`) or `0` (`false`). When the value of `IncludeMissingGroups` is `true`, `groupsummary` displays groups made up of missing values, such as `NaN`. When the value of `IncludeMissingGroups` is `false`, `groupcounts` does not display the missing value groups.

Empty groups indicator, specified as a numeric or logical `1` (`true`) or `0` (`false`). When the value of `IncludeEmptyGroups` is `false`, `groupsummary` does not display groups with zero elements. When the value of `IncludeEmptyGroups` is `true`, `groupsummary` displays the empty groups.

## Output Arguments

collapse all

Output table for table or timetable input data, returned as a table. `G` contains the computed groups, the number of elements in each group, and if `method` is provided, the result of the specified computations.

Output array for array input data, returned as a vector or matrix. `B` contains the specified computations for each group. When multiple methods are specified, `groupsummary` horizontally concatenates the computations in the order that they were listed.

Groups for array input data, returned as a column vector or cell array of column vectors. For a single grouping vector, the output groups are sorted according to the order returned by the `unique` function with the `"sorted"` option.

For more than one input vector, `BG` is a cell array containing column vectors of equal length. Information for each group is contained in the elements of a row across all vectors in `BG`. Each group maps to the corresponding row of the output array `B`.

Group counts for array input data, returned as a column vector. `BC` contains the number of elements in each group. The length of `BC` is the same as the length of the group column vectors returned in `BG`.

## Tips

• When making many calls to `groupsummary`, consider converting grouping variables to type `categorical` or `logical` when possible for improved performance. For example, if you have a string array grouping variable (such as `HealthStatus` with elements `"Poor"`, `"Fair"`, `"Good"`, and `"Excellent"`), you can convert it to a categorical variable using the command `categorical(HealthStatus)`.

## Version History

Introduced in R2018a

expand all