Main Content

grouptransform

Transform by group

Description

Table Data

G = grouptransform(T,groupvars,method) returns transformed data in the place of the nongrouping variables in table or timetable T. The group-wise computations in method are applied to each nongrouping variable. Groups are defined by rows in the variables in groupvars that have the same unique combination of values. For example, G = grouptransform(T,"HealthStatus","norm") normalizes the data in T by health status using the 2-norm.

You can use grouptransform functionality interactively by adding the Compute by Group task to a live script.

example

G = grouptransform(T,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping and appends the bins to the output table as additional variables. For example, G = grouptransform(T,"SaleDate","year","rescale") bins by sale year and scales the data in T to the range [0, 1].

G = grouptransform(___,datavars) specifies the table variables to apply the method to for either of the previous syntaxes.

example

G = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for any of the previous syntaxes. For example, G = grouptransform(T,"Temp","linearfill","ReplaceValues",false) appends the filled data as an additional variable of T instead of replacing the nongrouping variables.

Array Data

B = grouptransform(A,groupvars,method) returns transformed data in the place of column vectors in the input vector or matrix A. The group-wise computations in method are applied to all column vectors in A. Groups are defined by rows in the column vectors in groupvars that have the same unique combination of values.

You can use grouptransform functionality interactively by adding the Compute by Group task to a live script.

example

B = grouptransform(A,groupvars,groupbins,method) specifies to bin rows in groupvars according to binning scheme groupbins prior to grouping.

B = grouptransform(___,Name,Value) specifies additional properties using one or more name-value arguments for either of the previous syntaxes for an input array.

example

[B,BG] = grouptransform(A,___) also returns the values of the grouping vectors or binned grouping vectors corresponding to the rows in B.

Examples

collapse all

Create a timetable that contains a progress status for three teams.

timeStamp = days([1 1 1 2 2 2 3 3 3]');
teamNumber = [1 2 3 1 2 3 1 2 3]';
percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]';
T = timetable(timeStamp,teamNumber,percentComplete)
T=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________

    1 day            1              14.2      
    1 day            2              28.1      
    1 day            3              11.5      
    2 days           1               NaN      
    2 days           2               NaN      
    2 days           3              19.3      
    3 days           1              46.1      
    3 days           2              51.2      
    3 days           3              30.3      

Fill missing status percentages, indicated with NaN, for each team using linear interpolation.

G = grouptransform(T,"teamNumber","linearfill","percentComplete")
G=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________

    1 day            1               14.2     
    1 day            2               28.1     
    1 day            3               11.5     
    2 days           1              30.15     
    2 days           2              39.65     
    2 days           3               19.3     
    3 days           1               46.1     
    3 days           2               51.2     
    3 days           3               30.3     

To append the filled data to the original table instead of replacing the percentComplete variable, use ReplaceValues.

Gappend = grouptransform(T,"teamNumber","linearfill","percentComplete","ReplaceValues",false)
Gappend=9×3 timetable
    timeStamp    teamNumber    percentComplete    linearfill_percentComplete
    _________    __________    _______________    __________________________

    1 day            1              14.2                     14.2           
    1 day            2              28.1                     28.1           
    1 day            3              11.5                     11.5           
    2 days           1               NaN                    30.15           
    2 days           2               NaN                    39.65           
    2 days           3              19.3                     19.3           
    3 days           1              46.1                     46.1           
    3 days           2              51.2                     51.2           
    3 days           3              30.3                     30.3           

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
T = table(timeStamps,profit)
T=10×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Mar-2017     2587 
    14-Mar-2017     1998 
    31-Mar-2017     2899 
    25-Mar-2017     3112 
    29-Mar-2017      909 
    21-Mar-2017     2619 
    18-Mar-2017     3085 

Binning by day name, normalize the profits using the 2-norm.

G = grouptransform(T,"timeStamps","dayname","norm")
G=10×3 table
    timeStamps     profit     dayname_timeStamps
    ___________    _______    __________________

    04-Mar-2017    0.42069        Saturday      
    02-Mar-2017          1        Thursday      
    15-Mar-2017    0.79344        Wednesday     
    10-Mar-2017    0.66582        Friday        
    14-Mar-2017    0.60654        Tuesday       
    31-Mar-2017    0.74612        Friday        
    25-Mar-2017    0.64428        Saturday      
    29-Mar-2017    0.60864        Wednesday     
    21-Mar-2017    0.79506        Tuesday       
    18-Mar-2017    0.63869        Saturday      

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Binning by day name, normalize the profits using the 2-norm. Display the transformed data and which group it corresponds to.

[normDailyProfit,dayName] = grouptransform(profit,timeStamps,"dayname","norm")
normDailyProfit = 10×1

    0.4207
    1.0000
    0.7934
    0.6658
    0.6065
    0.7461
    0.6443
    0.6086
    0.7951
    0.6387

dayName = 10x1 categorical
     Saturday 
     Thursday 
     Wednesday 
     Friday 
     Tuesday 
     Friday 
     Saturday 
     Wednesday 
     Tuesday 
     Saturday 

Input Arguments

collapse all

Input table, specified as a table or timetable.

Input array, specified as a column vector or a group of column vectors stored as a matrix.

Grouping variables or vectors, specified as one of these options:

  • For array input data, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or cell array.

  • For table or timetable input data, groupvars indicates which variables to use to compute groups in the data. You can specify the grouping variables with any of the options in this table.

    Indexing SchemeExamples

    Variable names:

    • A string or character vector

    • A string array or cell array of character vectors

    • A pattern object

    • "A" or 'A' — A variable named A

    • ["A" "B"] or {'A','B'} — Two variables named A and B

    • "Var"+digitsPattern(1) — Variables named "Var" followed by a single digit

    Variable index:

    • An index number that refers to the location of a variable in the table

    • A vector of numbers

    • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values.

    • 3 — The third variable from the table

    • [2 3] — The second and third variables from the table

    • [false false true] — The third variable

    Function handle:

    • A function handle that takes a table variable as input and returns a logical scalar

    • @isnumeric — All the variables containing numeric values

    Variable type:

    • A vartype subscript that selects variables of a specified type

    • vartype("numeric") — All the variables containing numeric values

Example: grouptransform(T,"Var3",method)

Transformation method, specified as one of these values:

Method

Description

"zscore"

Normalize data to have mean 0 and standard deviation 1

"norm"

Normalize data by 2-norm

"meancenter"

Normalize data to have mean 0

"rescale"

Rescale range to [0,1]

"meanfill"

Fill missing values with the mean of the group data

"linearfill"

Fill missing values by linear interpolation of nonmissing group data

You can also specify a function handle that returns one array whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an array with first dimension length equal to 1, then grouptransform repeats that value so that the output has the same number of rows as the input.

Data Types: char | string | function_handle

Binning scheme for grouping variables or vectors, specified as one or more of the following binning methods. Grouping variables or vectors and binning scheme arguments must be the same size, or one of them can be scalar.

  • "none" — No binning.

  • Vector of bin edges — The bin edges define the bins. You can specify the edges as numeric values or as datetime values for datetime grouping variables or vectors.

  • Number of bins — The number determines how many equally spaced bins to create. You can specify the number of bins as a positive integer scalar.

  • Length of time (bin width) — The length of time determines the width of each bin. You can specify the bin width as a duration or calendarDuration scalar for datetime or duration grouping variables or vectors.

  • Name of time unit (bin width) — The name of the time unit determines the width of each bin. You can specify the bin width as one of the options in this table for datetime or duration grouping variables or vectors.

    ValueDescriptionData Type
    "second"

    Each bin is 1 second.

    datetime and duration
    "minute"

    Each bin is 1 minute.

    datetime and duration
    "hour"

    Each bin is 1 hour.

    datetime and duration
    "day"

    Each bin is 1 calendar day. This value accounts for daylight saving time shifts.

    datetime and duration
    "week"Each bin is 1 calendar week.datetime only
    "month"Each bin is 1 calendar month.datetime only
    "quarter"Each bin is 1 calendar quarter.datetime only
    "year"

    Each bin is 1 calendar year. This value accounts for leap days.

    datetime and duration
    "decade"Each bin is 1 decade (10 calendar years).datetime only
    "century"Each bin is 1 century (100 calendar years).datetime only
    "secondofminute"

    Bins are seconds from 0 to 59.

    datetime only
    "minuteofhour"

    Bins are minutes from 0 to 59.

    datetime only
    "hourofday"

    Bins are hours from 0 to 23.

    datetime only
    "dayofweek"

    Bins are days from 1 to 7. The first day of the week is Sunday.

    datetime only
    "dayname"Bins are full day names, such as "Sunday".datetime only
    "dayofmonth"Bins are days from 1 to 31.datetime only
    "dayofyear"Bins are days from 1 to 366.datetime only
    "weekofmonth"Bins are weeks from 1 to 6.datetime only
    "weekofyear"Bins are weeks from 1 to 54.datetime only
    "monthname"Bins are full month names, such as "January".datetime only
    "monthofyear"

    Bins are months from 1 to 12.

    datetime only
    "quarterofyear"Bins are quarters from 1 to 4.datetime only

Table variables to operate on, specified as one of the options in this table. datavars indicates which variables of the input table or timetable to apply the methods to. Other variables not specified by datavars pass through to the output without being operated on. When datavars is not specified, grouptransform operates on each nongrouping variable.

Indexing SchemeExamples

Variable names:

  • A string or character vector

  • A string array or cell array of character vectors

  • A pattern object

  • "A" or 'A' — A variable named A

  • ["A" "B"] or {'A','B'} — Two variables named A and B

  • "Var"+digitsPattern(1) — Variables named "Var" followed by a single digit

Variable index:

  • An index number that refers to the location of a variable in the table

  • A vector of numbers

  • A logical vector. Typically, this vector is the same length as the number of variables, but you can omit trailing 0 or false values.

  • 3 — The third variable from the table

  • [2 3] — The second and third variables from the table

  • [false false true] — The third variable

Function handle:

  • A function handle that takes a table variable as input and returns a logical scalar

  • @isnumeric — All the variables containing numeric values

Variable type:

  • A vartype subscript that selects variables of a specified type

  • vartype("numeric") — All the variables containing numeric values

Example: grouptransform(T,groupvars,method,["Var1" "Var2" "Var4"])

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: G = grouptransform(T,groupvars,groupbins,"zscore",IncludedEdge="right")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: G = grouptransform(T,groupvars,groupbins,"zscore","IncludedEdge","right")

Included bin edge for binning scheme, specified as either "left" or "right", indicating which end of the bin interval is inclusive.

You can specify IncludedEdge only if you also specify groupbins, and the value applies to all binning methods for all grouping variables or vectors.

Option to replace values, specified as one of these values:

  • true or 1 — Replace nongrouping table variables or column vectors in the input data with table variables or column vectors containing transformed data.

  • false or 0 — Append the input data with the table variables or column vectors containing transformed data.

Output Arguments

collapse all

Output table for table or timetable input data, returned as a table or timetable. G contains the transformed data for each group.

Output array for array input data, returned as a vector or matrix. B contains the transformed data in the place of the nongrouping vectors.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the unique grouping vector or binned grouping vector combinations that correspond to the rows in B.

Tips

  • When making many calls to grouptransform, consider converting grouping variables to type categorical or logical when possible for improved performance. For example, if you have a string array grouping variable (such as HealthStatus with elements "Poor", "Fair", "Good", and "Excellent"), you can convert it to a categorical variable using the command categorical(HealthStatus).

Alternative Functionality

Live Editor Task

You can use grouptransform functionality interactively by adding the Compute by Group task to a live script.

Compute by Group task in the Live Editor

Extended Capabilities

Version History

Introduced in R2018b

expand all