Main Content

grouptransform

Transform by group

Description

Table Data

G = grouptransform(T,groupvars,method) returns transformed data in the place of the nongrouping variables in table or timetable T. The group-wise computations in method are applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = grouptransform(T,"HealthStatus","norm") normalizes the data in T by health status using the 2-norm.

example

G = grouptransform(T,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping and appends the bins to the output table as additional variables. For example, G = grouptransform(T,"SaleDate","year","rescale") bins by sale year and scales the data in T to the range [0, 1].

G = grouptransform(___,datavars) specifies the table variables to apply the method to for either of the previous syntaxes.

example

G = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for any of the previous syntaxes. For example, G = grouptransform(T,"Temp","linearfill","ReplaceValues",false) appends the filled data as an additional variable of T instead of replacing the nongrouping variables.

Array Data

B = grouptransform(A,groupvars,method) returns transformed data in the place of column vectors in the input vector or matrix A. The group-wise computations in method are applied to all column vectors in A. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

example

B = grouptransform(A,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

B = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for either of the previous syntaxes for an input array.

example

[B,BG] = grouptransform(A,___) also returns the values of the grouping vectors or binned grouping vectors corresponding to the rows in B.

Examples

collapse all

Create a timetable that contains a progress status for three teams.

timeStamp = days([1 1 1 2 2 2 3 3 3]');
teamNumber = [1 2 3 1 2 3 1 2 3]';
percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]';
T = timetable(timeStamp,teamNumber,percentComplete)
T=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________

    1 day            1              14.2      
    1 day            2              28.1      
    1 day            3              11.5      
    2 days           1               NaN      
    2 days           2               NaN      
    2 days           3              19.3      
    3 days           1              46.1      
    3 days           2              51.2      
    3 days           3              30.3      

Fill missing status percentages, indicated with NaN, for each team using linear interpolation.

G = grouptransform(T,"teamNumber","linearfill","percentComplete")
G=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________

    1 day            1               14.2     
    1 day            2               28.1     
    1 day            3               11.5     
    2 days           1              30.15     
    2 days           2              39.65     
    2 days           3               19.3     
    3 days           1               46.1     
    3 days           2               51.2     
    3 days           3               30.3     

To append the filled data to the original table instead of replacing the percentComplete variable, use ReplaceValues.

Gappend = grouptransform(T,"teamNumber","linearfill","percentComplete","ReplaceValues",false)
Gappend=9×3 timetable
    timeStamp    teamNumber    percentComplete    linearfill_percentComplete
    _________    __________    _______________    __________________________

    1 day            1              14.2                     14.2           
    1 day            2              28.1                     28.1           
    1 day            3              11.5                     11.5           
    2 days           1               NaN                    30.15           
    2 days           2               NaN                    39.65           
    2 days           3              19.3                     19.3           
    3 days           1              46.1                     46.1           
    3 days           2              51.2                     51.2           
    3 days           3              30.3                     30.3           

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
T = table(timeStamps,profit)
T=10×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Mar-2017     2587 
    14-Mar-2017     1998 
    31-Mar-2017     2899 
    25-Mar-2017     3112 
    29-Mar-2017      909 
    21-Mar-2017     2619 
    18-Mar-2017     3085 

Binning by day name, normalize the profits using the 2-norm.

G = grouptransform(T,"timeStamps","dayname","norm")
G=10×3 table
    timeStamps     profit     dayname_timeStamps
    ___________    _______    __________________

    04-Mar-2017    0.42069        Saturday      
    02-Mar-2017          1        Thursday      
    15-Mar-2017    0.79344        Wednesday     
    10-Mar-2017    0.66582        Friday        
    14-Mar-2017    0.60654        Tuesday       
    31-Mar-2017    0.74612        Friday        
    25-Mar-2017    0.64428        Saturday      
    29-Mar-2017    0.60864        Wednesday     
    21-Mar-2017    0.79506        Tuesday       
    18-Mar-2017    0.63869        Saturday      

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Binning by day name, normalize the profits using the 2-norm. Display the transformed data and which group it corresponds to.

[normDailyProfit,dayName] = grouptransform(profit,timeStamps,"dayname","norm")
normDailyProfit = 10×1

    0.4207
    1.0000
    0.7934
    0.6658
    0.6065
    0.7461
    0.6443
    0.6086
    0.7951
    0.6387

dayName = 10x1 categorical
     Saturday 
     Thursday 
     Wednesday 
     Friday 
     Tuesday 
     Friday 
     Saturday 
     Wednesday 
     Tuesday 
     Saturday 

Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector or a group of column vectors stored as a matrix.

Grouping variables or vectors, specified as one of these options:

  • For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or cell array.

  • For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

    OptionDescriptionExamples
    Variable name

    A character vector or string scalar specifying a single table variable name

    'Var1'

    "Var1"

    Vector of variable names

    A cell array of character vectors or string array, where each element is a table variable name

    {'Var1' 'Var2'}

    ["Var1" "Var2"]

    Scalar or vector of variable indices

    A scalar or vector of table variable indices

    1

    [1 3 5]

    Logical vector

    A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

    [true false true]

    Function handle

    A function handle that takes a table variable as input and returns a logical scalar

    @isnumeric

    vartype subscript

    A table subscript generated by the vartype function

    vartype("numeric")

Example: grouptransform(T,"Var3",method)

Transformation method, specified as one of these values:

Method

Description

"zscore"

Normalize data to have mean 0 and standard deviation 1

"norm"

Normalize data by 2-norm

"meancenter"

Normalize data to have mean 0

"rescale"

Rescale range to [0,1]

"meanfill"

Fill missing values with the mean of the group data

"linearfill"

Fill missing values by linear interpolation of nonmissing group data

You can also specify a function handle that returns one array whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an array with first dimension length equal to 1, then grouptransform repeats that value so that the output has the same number of rows as the input.

Data Types: char | string | function_handle

Binning scheme, specified as one of these options:

  • "none", indicating no binning

  • A list of bin edges, specified as a numeric vector, or a datetime vector for datetime grouping variables or vectors

  • A number of bins, specified as a positive integer scalar

  • A time duration, specified as a scalar of type duration or calendarDuration indicating bin widths (for datetime or duration grouping variables or vectors only)

  • A cell array listing binning methods for each grouping variable or vector

  • A time bin for datetime and duration grouping variables or vectors only, specified as one of these strings.

    ValueDescriptionData Type
    "second"

    Each bin is 1 second.

    datetime and duration
    "minute"

    Each bin is 1 minute.

    datetime and duration
    "hour"

    Each bin is 1 hour.

    datetime and duration
    "day"

    Each bin is 1 calendar day. This value accounts for daylight saving time shifts.

    datetime and duration
    "week"Each bin is 1 calendar week.datetime only
    "month"Each bin is 1 calendar month.datetime only
    "quarter"Each bin is 1 calendar quarter.datetime only
    "year"

    Each bin is 1 calendar year. This value accounts for leap days.

    datetime and duration
    "decade"Each bin is 1 decade (10 calendar years).datetime only
    "century"Each bin is 1 century (100 calendar years).datetime only
    "secondofminute"

    Bins are seconds from 0 to 59.

    datetime only
    "minuteofhour"

    Bins are minutes from 0 to 59.

    datetime only
    "hourofday"

    Bins are hours from 0 to 23.

    datetime only
    "dayofweek"

    Bins are days from 1 to 7. The first day of the week is Sunday.

    datetime only
    "dayname"Bins are full day names such as "Sunday".datetime only
    "dayofmonth"Bins are days from 1 to 31.datetime only
    "dayofyear"Bins are days from 1 to 366.datetime only
    "weekofmonth"Bins are weeks from 1 to 6.datetime only
    "weekofyear"Bins are weeks from 1 to 54.datetime only
    "monthname"Bins are full month names such as "January".datetime only
    "monthofyear"

    Bins are months from 1 to 12.

    datetime only
    "quarterofyear"Bins are quarters from 1 to 4.datetime only

When multiple grouping variables or vectors are specified, you can provide a single binning method that is applied to all grouping variables or vectors, or a cell array containing a binning method for each grouping variable or vector such as {"none",[0 2 4 Inf]}.

Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the methods to. Other variables not specified by datavars pass through to the output without being operated on. When datavars is not specified, grouptransform operates on each nongrouping variable.

OptionDescriptionExamples
Variable name

A character vector or string scalar specifying a single table variable name

'Var1'

"Var1"

Vector of variable names

A cell array of character vectors or string array, where each element is a table variable name

{'Var1' 'Var2'}

["Var1" "Var2"]

Scalar or vector of variable indices

A scalar or vector of table variable indices

1

[1 3 5]

Logical vector

A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

[true false true]

Function handle

A function handle that takes a table variable as input and returns a logical scalar

@isnumeric

vartype subscript

A table subscript generated by the vartype function

vartype("numeric")

Example: grouptransform(T,groupvars,method,["Var1" "Var2" "Var4"])

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: G = grouptransform(T,groupvars,groupbins,"zscore",IncludedEdge="right")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: G = grouptransform(T,groupvars,groupbins,"zscore","IncludedEdge","right")

Included bin edge, specified as either "left" or "right", indicating which end of the bin interval is inclusive.

This name-value argument can be specified only when groupbins is specified, and the value applies to all binning schemes for all grouping variables or vectors.

Replace values indicator, specified as one of these values:

  • true or 1 — Replace nongrouping table variables or column vectors in the input data with table variables or column vectors containing transformed data.

  • false or 0 — Append the input data with the table variables or column vectors containing transformed data.

Output Arguments

collapse all

Output table for table or timetable input data, returned as a table or timetable. G contains the transformed data for each group.

Output array for array input data, returned as a vector or matrix. B contains the transformed data in the place of the nongrouping vectors.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Tips

  • When making many calls to grouptransform, consider converting grouping variables to type categorical or logical when possible for improved performance. For example, if you have a string array grouping variable (such as HealthStatus with elements "Poor", "Fair", "Good", and "Excellent"), you can convert it to a categorical variable using the command categorical(HealthStatus).

Extended Capabilities

Version History

Introduced in R2018b

expand all